Help writing a SQL

  • My problem with the quirky update solution is the requirement to add a column to the original table, which may not be practical in reality.

    I've got an alternative that does without this (still needs the clustered index though) and returns the results in a varchar formatted as xml.

    This could be returned to the client as is, or used to join back to the original data, depending on SQL version or what mood you're in.

    create clustered index ix_acct_ndt on multrecs (acct, ndt)

    declare @lastDate smalldatetime, @xml varchar(1000)

    select @lastDate = '1900-01-01', @xml = ''

    select

    @lastDate = case when datediff(day, @lastDate, ndt) >= 38 or @lastDate = '1900-01-01' then ndt else @lastDate end,

    @xml = @xml + case when @lastDate = ndt then '' else '' end

    from multrecs

    select @xml
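
    (The element literal in the CASE above comes through empty here; assuming each qualifying row appends a small fragment such as <r a="..." t="..."/> carrying the account and date, the finished string can be shredded back to rows on SQL 2005+ roughly as below. This is only a sketch: the attribute names, sizes and the join are assumptions, not part of the original script.)

    -- Sketch (SQL 2005+), assuming @xml holds fragments like
    -- <r a="123" t="2009-09-18T00:00:00"/> for each qualifying row.
    DECLARE @x xml
    SELECT @x = '<d>' + @xml + '</d>'

    SELECT m.acct, m.ndt
    FROM @x.nodes('/d/r') AS T(c)
    JOIN multrecs m
      ON m.acct = T.c.value('@a[1]', 'int')
     AND m.ndt = T.c.value('@t[1]', 'datetime')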

  • Richard Fryar (9/18/2009)


    My problem with the quirky update solution is the requirement to add a column to the original table, which may not be practical in reality.

    I've got an alternative that does without this (still needs the clustered index though) and returns the results in a varchar formatted as xml.

    This could be returned to the client as is, or used to join back to the original data, depending on SQL version or what mood you're in.

    create clustered index ix_acct_ndt on multrecs (acct, ndt)

    declare @lastDate smalldatetime, @xml varchar(1000)

    select @lastDate = '1900-01-01', @xml = ''

    select

    @lastDate = case when datediff(day, @lastDate, ndt) >= 38 or @lastDate = '1900-01-01' then ndt else @lastDate end,

    @xml = @xml + case when @lastDate = ndt then '' else '' end

    from multrecs

    select @xml

    In SQL 2005+, that is an option, assuming you modify the code to meet the original requirements and increase to nvarchar(max), but on large datasets it is going to be phenomenally slow. Beyond that, you are still left with the possibility of having to do something with that huge string you generated afterwards.

    IIRC, this method performs better if you use size checks to update smaller variables and then write them to larger variables as they pass a certain size (I think I saw Barry do this once), but it still doesn't even come close to the speed of the update. Even selecting that entire 2 million row dataset into a temp table and running the update on the temp table only takes 19 seconds. I stopped your query (running on the 2 million row dataset and altered to also use acct) after 3 minutes on my workstation. (As I'm headed out the door, I unfortunately don't have time to see how long it would actually have taken.)
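
    (For reference, a minimal sketch of that temp-table-plus-quirky-update combination, using the multrecs(acct, ndt) table and the 38-day gap used elsewhere in this thread, with a display bit flag like the one in the test setup. Object and variable names are illustrative; this is not the exact script that was timed.)

    -- Sketch only: copy the data into a temp table in clustered order,
    -- then flag the rows to display with a quirky update (needs the
    -- clustered index and MAXDOP 1).
    SELECT acct, ndt, CAST(0 AS bit) AS display
    INTO #multrecs
    FROM multrecs

    CREATE CLUSTERED INDEX IX_multrecs_temp ON #multrecs (acct, ndt)

    DECLARE @acct int, @lastShown datetime
    SELECT @acct = -1, @lastShown = '19000101'

    UPDATE t
    SET display = CASE WHEN acct <> @acct
                         OR DATEDIFF(day, @lastShown, ndt) >= 38
                       THEN 1 ELSE 0 END,
        @lastShown = CASE WHEN acct <> @acct
                            OR DATEDIFF(day, @lastShown, ndt) >= 38
                          THEN ndt ELSE @lastShown END,
        @acct = acct
    FROM #multrecs t WITH (TABLOCKX)
    OPTION (MAXDOP 1)

    SELECT acct, ndt FROM #multrecs WHERE display = 1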

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • Garadin (9/18/2009)


    Jeff Moden (9/18/2009)


    Turn off the execution plan and see what you get.

    Without the execution plan, 12 seconds for quirky with MAXDOP 1, 151 seconds for the RCTE. I ran them a few times each with similar results. I also ran them without stopping/starting the server, which actually made almost no difference, probably because this is a workstation with only about 2 gigs of memory.

    Heh... I was trying to give the CTE a chance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Garadin (9/18/2009)


    ...but it still doesn't even come close to the speed of the update. Even selecting that entire 2 million row dataset into a temp table and running the update on the temp table only takes 19 seconds. I stopped your query (running on the 2 million row dataset and altered to also use acct) after 3 minutes on my workstation.

    Heh... You beat me to it...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Interesting. Thanks for testing it...

    I only tested against a few rows and noticed that logical I/Os were lower than the other two options. I suspect the performance hit is due to the repeated string concatenations.

    This is only designed for small data sets anyway; as this is a SQL 2000 forum, varchar(max) isn't available.

    I'll think about it over the weekend. I'm sure there must be a way of speeding this up.

  • Richard Fryar (9/19/2009)


    This is only designed for small data sets anyway...

    I'm not sure how to say this without someone taking offense, so please take no offense. I know of dozens of instances where people have adopted a methodology based on the "fact" that there would only be "small data sets" and have been absolutely burned at the stake because of that poor assumption and business requirements that change in the future. You can certainly do what you want based on your supposed business requirements, but my recommendation is to always make scalable code. When I say "scalable", I mean to a million rows or more.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Garadin (9/18/2009)


    And the results are in. It probably wasn't a fair contest to begin with. You were going up against a gut accustomed to eating 2 day old Taco Bell burritos without flinching.

    Quirky Method No DOP set, Messages:

    ...

    Notes:

    1. I simplified your test data script a bit. I created 2 Million rows of test data.

    ...

    Test Data Setup

    IF OBJECT_ID('multrecs','u') IS NOT NULL DROP TABLE multrecs

    CREATE TABLE multrecs(acct int, ndt datetime)

    ...

    ALTER TABLE multrecs ADD display bit

    CREATE CLUSTERED INDEX Quirky ON multrecs (acct, ndt)

    ...

    Thanks for testing this, Seth, and for all your work on it. However, I am concerned that your test setup is very slanted to the benefit of the Quirky Update and to the detriment of every other possible solution.

    In particular, it's not really cricket for comparative testing purposes to alter the source table's definition to A) add exactly the Clustered Index that the Quirky Update needs to work correctly, and B) add in the housekeeping columns that it also needs to work its magic.

    Typically, the source data table definition is not in-play for these kinds of problems as they are usually "post reporting" kinds of tasks, and things like the primary key, clustered index, etc. are set for its normal production uses. Thus generally, the only messing about with the source table that is fair play is (maybe) adding a non-clustered index, since you can have more than one of them. But for comparative tests, anything else that a particular method needs done to the source data, is on it to do itself.

    So unless the Quirky Update gets lucky, it should still be making its own temp table, as Greg originally wrote it.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (9/19/2009)


    Garadin (9/18/2009)


    And the results are in. It probably wasn't a fair contest to begin with. You were going up against a gut accustomed to eating 2 day old Taco Bell burritos without flinching.

    Quirky Method No DOP set, Messages:

    ...

    Notes:

    1. I simplified your test data script a bit. I created 2 Million rows of test data.

    ...

    Test Data Setup

    IF OBJECT_ID('multrecs','u') IS NOT NULL DROP TABLE multrecs

    CREATE TABLE multrecs(acct int, ndt datetime)

    ...

    ALTER TABLE multrecs ADD display bit

    CREATE CLUSTERED INDEX Quirky ON multrecs (acct, ndt)

    ...

    Thanks for testing this, Seth, and for all your work on it. However, I am concerned that your test setup is very slanted to the benefit of the Quirky Update and to the detriment of every other possible solution.

    In particular, it's not really cricket for comparative testing purposes to alter the source table's definition to A) add exactly the Clustered Index that the Quirky Update needs to work correctly, and B) add in the housekeeping columns that it also needs to work its magic.

    Typically, the source data table definition is not in-play for these kinds of problems as they are usually "post reporting" kinds of tasks, and things like the primary key, clustered index, etc. are set for its normal production uses. Thus generally, the only messing about with the source table that is fair play is (maybe) adding a non-clustered index, since you can have more than one of them. But for comparative tests, anything else that a particular method needs done to the source data, is on it to do itself.

    So unless the Quirky Update gets lucky, it should still be making its own temp table, as Greg originally wrote it.

    Thanks for the feedback, Barry. While I agree that not using a temp table might be biased towards the quirky update, I don't believe that it has unfairly put any other solution behind. The index was made with a bad name, which I stated in my original testing post. The name of the index is all that was biased though, as that same index has been required for every one of the solutions presented thus far. Had it not been, I definitely would have changed it in the interest of an even playing field. Therefore, each solution has used the index on the original table that it has wanted.

    The quirky update *did* add a field to the original table, but I also supplied a time for the selection of all of the data into a temp table. Even using that time (19 seconds), no other solution has come remotely close. The only thing I can think of that might further slow down the quirky update is to remove the clustered index altogether on the main table and only create it on the temp table. I still don't think this will slow it down enough to put it behind any of the others, but I will test it.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • Man, this is fun to watch... it's a real pleasure to see someone else willing to take on all comers in favor of the "Quirky Update".

    Seth, keep it up man... you're doing great. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

    Barry, I briefly tested dropping the index altogether before inserting data into a temp table, but I'm not sure if that's fair since the index did in fact exist at some point. I'll completely re-run the creation script on Monday and never create the index on that table, to be fair.

    To be a non-biased test though, do you agree that re-running all of the methods with completely unindexed data is what the next step should be?

    Also, since you're in this thread now, can you remind me if my memory was faulty earlier when I tentatively cited you as the person who created a similar script to Richard's that employed some advanced data manipulation techniques with the dumping into increasingly larger variables and (if memory serves) some sort of varbinary conversion? Sorry to be so vague, I "kinda" remember it, but not well enough to re-create it to apply here.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • Garadin (9/20/2009)


    Barry, I briefly tested dropping the index altogether before inserting data into a temp table, but I'm not sure if that's fair since the index did in fact exist at some point. I'll completely re-run the creation script on Monday and never create the index on that table, to be fair.

    To be a non-biased test though, do you agree that re-running all of the methods with completely unindexed data is what the next step should be?

    Yes, that seems fair. My point is less about Quirky Updates or even this particular challenge, and more about test protocols in general. Simplifying conditions for a test is fine, but it's important to ensure that the assumptions made are either reasonably likely or else not unfairly beneficial to any particular approach. As I indicated before, non-clustered indexes are usually reasonable, and perhaps even the addition of a (small) column, so long as all possible solutions get to specify them also (though this gets to be a problem with test management).

    Also, since you're in this thread now, can you remind me if my memory was faulty earlier when I tentatively cited you as the person who created a similar script to Richard's that employed some advanced data manipulation techniques with the dumping into increasingly larger variables and (if memory serves) some sort of varbinary conversion? Sorry to be so vague, I "kinda" remember it, but not well enough to re-create it to apply here.

    Yep, that was me, but it's an extraordinarily complex form of string concatenation that still loses to FOR XML on the high side. Still, I cannot see how FOR XML could be easily applied here, so maybe I will give it a shot. 🙂
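
    (For anyone reading along who hasn't seen it, the FOR XML concatenation being referred to is the SQL 2005+ idiom of building one long string with a correlated FOR XML PATH('') subquery. A generic sketch follows; it is illustrative only and not itself a solution to the 38-day problem.)

    -- Generic string concatenation via FOR XML PATH('') (SQL 2005+).
    DECLARE @list varchar(MAX)

    SELECT @list = STUFF(
        (SELECT ',' + CAST(acct AS varchar(10))
         FROM multrecs
         ORDER BY acct
         FOR XML PATH('')), 1, 1, '')

    SELECT @list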

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Ok, so I had to give it a go. I went ahead and tried to use what little I remembered (basically only that varbinary was involved and you used 'pools') to try to recreate this. I'm stuck at this point because my varbinary isn't converting back correctly, but I got closer than I thought I would.

    I just went back and found the original thread (the search on this site is super powerful; I found it in about 10 seconds searching for a post by you and the keyword varbinary). The rownumber seems like a better idea than my identity temp table, and I'm sure this isn't as efficient as yours, but I figured I'd throw it out here anyways. A disclaimer lest someone misunderstand: this is just me attempting to recreate something Barry did (and failing :hehe:). It actually goes through all the data pretty quickly (about the same speed as the quirky update, not counting the temp table), but since it doesn't actually *work*, I can't say that it's a challenging "solution".

    CREATE TABLE MR(
        MRID int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        acct int,
        ndt datetime)

    INSERT INTO MR(acct, ndt)
    SELECT acct, ndt
    FROM multrecs
    ORDER BY acct, ndt

    DECLARE @vb128 varbinary(128),
        @vb1024 varbinary(1024),
        @vb8000 varbinary(8000),
        @vbmax varbinary(MAX),
        @LDNDT datetime,
        @Found int,
        @Acct int,
        @RR int,
        @vcm varchar(MAX),
        @vbcomma varbinary(1),
        @vbblank varbinary(1)

    SELECT @vb128 = CAST('' AS varbinary(128)),
        @vb1024 = CAST('' AS varbinary(1024)),
        @vb8000 = CAST('' AS varbinary(8000)),
        @vbmax = CAST('' AS varbinary(max)),
        @vbblank = CAST('' AS varbinary(1)),
        @vbcomma = CAST(',' AS varbinary(1)),
        @Found = 0,
        @LDNDT = '1/1/1900',
        @Acct = 0,
        @RR = 0

    SELECT @Found = CASE WHEN Acct = @Acct AND DATEDIFF(d,@LDNDT,NDT) < 150 THEN @Found ELSE @Found + 1 END,
        @vbmax = CASE WHEN Acct = @Acct AND DATEDIFF(d,@LDNDT,NDT) < 150 THEN
            CASE WHEN @Found%999 = 0 THEN @vbmax + @vb8000 + CAST(MRID AS varbinary(7)) + @vbcomma
            END ELSE @vbmax END,
        @vb8000 = CASE WHEN Acct = @Acct AND DATEDIFF(d,@LDNDT,NDT) < 150 THEN
            CASE WHEN @Found%127 = 0 THEN @vb8000 + @vb1024 + CAST(MRID AS varbinary(7)) + @vbcomma
            WHEN @Found%999 = 0 THEN @vbblank
            END ELSE @vb8000 END,
        @vb1024 = CASE WHEN Acct = @acct AND DATEDIFF(d,@LDNDT,NDT) < 150 THEN
            CASE WHEN @Found%31 = 0 THEN @vb1024 + @vb128 + CAST(MRID AS varbinary(7)) + @vbcomma
            WHEN @Found%127 = 0 THEN @vbblank
            END ELSE @vb1024 END,
        @vb128 = CASE WHEN Acct = @Acct AND DATEDIFF(d,@LDNDT,NDT) < 150 THEN
            -- assumed: < 150 and @Found%31 <> 0 here, by analogy with the tiers above
            CASE WHEN @Found%31 <> 0 THEN @vb128 + CAST(MRID AS varbinary(7)) + @vbcomma
            ELSE @vbblank
            END ELSE @vb128 END,
        @LDNDT = CASE WHEN Acct = @Acct AND DATEDIFF(d,@LDNDT,NDT) < 150 THEN @LDNDT ELSE NDT END,
        @Acct = Acct,
        @RR = @RR + 1
    FROM MR

    -- This statement runs in ~12 seconds, which is comparable to the quirky update.
    --SELECT @VBMax VBMax, @VB8000 VB8000, @VB1024 VB1024, @VB128 VB128

    SET @VBMax = ISNULL(@VBMax,@vbblank) + ISNULL(@VB8000,@vbblank) + ISNULL(@vb1024,@vbblank) + ISNULL(@vb128,@vbblank)

    --SELECT @vbmax

    SET @vcm = CAST(@vbmax AS varchar(MAX))
    SET @vcm = LEFT(@vcm,LEN(@vcm)-1) -- remove trailing comma

    --SELECT @VCM

    SET @vcm = 'SELECT acct, ndt FROM MR WHERE MRID IN (' + @vcm + ')'
    PRINT @VCM
    EXEC(@vcm) -- This fails with a syntax error near , because the varbinary doesn't convert back to integers properly

    [Edit] I know there are a lot of errors in this atm, including 31 being too many repetitions to fit into vb128; that was the product of me changing the format of this a dozen times and not going back to fix it. I'm missing something simpler about varbinary though, as even much simpler conversions done in this manner don't convert back properly. I ran out of time to mess with this, so figured I'd throw it out there until I have time to mess with it again.
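
    (For what it's worth, the likely cause of the conversion problem is that CAST(MRID AS varbinary(7)) stores the integer's raw bytes rather than its decimal digits, so casting the concatenated value back to varchar(MAX) yields unreadable byte values instead of an ID list; casting the int to varchar first, e.g. CAST(CAST(MRID AS varchar(10)) AS varbinary(10)), keeps the digits. A tiny illustration, separate from the script above:)

    -- Why the cast back fails: int-to-varbinary keeps the binary
    -- representation, not the digit characters.
    DECLARE @from_int varbinary(10), @from_char varbinary(10)

    SET @from_int = CAST(123 AS varbinary(4))                        -- 0x0000007B
    SET @from_char = CAST(CAST(123 AS varchar(10)) AS varbinary(10)) -- 0x313233

    SELECT CAST(@from_int AS varchar(10)) AS int_bytes,   -- control characters, not '123'
           CAST(@from_char AS varchar(10)) AS char_bytes  -- '123'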

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • Heh. Ok, Seth enough, I give. You have succeeded in shaming me into trying to do this. :blush:

    Gimme an hour or so ...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (9/22/2009)


    Heh. Ok, Seth enough, I give. You have succeeded in shaming me into trying to do this. :blush:

    Gimme an hour or so ...

    No shame intended sir. The solution had been mentioned numerous times, and it's a method I wanted to grow more familiar with anyways (Looking back at the original thread, I was very puzzled about it then as well). My intention in posting it here was merely to toss it out there in case I had stumbled across anything new that you could apply to it if/when you decided to give it a shot. I also kinda wanted to try to conceptualize it myself before looking back at the original post or seeing your upcoming solution to see if I could figure it out, as once I've figured something out "on my own", I tend to understand it/remember it a lot better.

    That said, I look forward to seeing it done right ;-).

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

    *whew* OK, now I remember why I quit fooling around with this tricky technique: 1) it's really tricky and hard to work with. Every time I try to adjust its "scaling" up or down, there are about a dozen different places where I can overlook something or make a mistake and produce the wrong results, and 2) clever as it is, it still just cannot beat true linear methods like pseudo-cursors or (God forbid) even Cursors in a linear algorithm.

    Anyhow, here it is. Note that it probably still needs that Clustered Key thing for optimal performance (don't say it, Seth :-P), but it is not setting it up itself, so if you took that off, you may need to edit it back in.

    --create clustered index ix_acct_ndt on multrecs (acct, ndt)

    declare @xml varchar(MAX), @acct int, @nextdate DATETIME, @datstr as varchar(80)

    select @xml = '', @acct = -1, @nextdate = '1900-01-01', @datstr = ''

    Declare @i int, @z int, @rowstate int

    Declare @S16 Varchar(MAX), @S14 Varchar(MAX), @S12 Varchar(MAX), @s10 Varchar(MAX)

    Select @i=0, @S16='', @S14='', @S12='', @s10=''

    select

    @z = ROW_NUMBER() OVER(ORDER BY acct,ndt),

    @rowstate = CASE WHEN @acct <> acct THEN 2 WHEN ndt >= @nextdate THEN 1 ELSE 0 END,

    @nextdate = CASE @rowstate WHEN 0 THEN @nextdate ELSE DATEADD(day, 38, ndt) END,

    @i = CASE @rowstate WHEN 0 THEN @i ELSE @i+1 END,

    @datstr = CASE @rowstate WHEN 0 THEN '' ELSE '<r a="' + CAST(acct AS varchar(10)) + '" t="' + CONVERT(char(23), ndt, 121) + CAST(@rowstate AS char(1)) + '"/>' END, -- element format assumed, to match the @a / @t attributes and 24-character t value shredded below

    @S16 = CASE WHEN @rowstate=0 THEN @S16 WHEN @i%65536=0 THEN @S16+@S14+@S12+@S10+@xml+@datstr Else @S16 END,

    @S14 = CASE WHEN @rowstate=0 THEN @S14 WHEN @i%65536=0 THEN '' WHEN @i%16384=0 THEN @S14+@S12+@S10+@xml+@datstr Else @S14 END,

    @S12 = CASE WHEN @rowstate=0 THEN @S12 WHEN @i%16384=0 Then '' WHEN @i%4096=0 THEN @S12+@S10+@xml+@datstr Else @S12 END,

    @s10 = CASE WHEN @rowstate=0 THEN @s10 WHEN @i%4096=0 Then '' WHEN @i%1024=0 THEN @s10+@xml+@datstr Else @s10 END,

    @xml = CASE WHEN @rowstate=0 THEN @xml WHEN @i%1024=0 THEN '' ELSE @xml+@datstr END,

    @acct = acct

    from multrecs

    ORDER BY acct,ndt

    SELECT @xml = @S16+@S14+@S12+@S10+@xml

    Declare @x as XML

    select @x = '<d>'+@xml+'</d>'

    SELECT acct, ndt38

    FROM (

    SELECT

    T.c.value('@a[1]', 'int') AS [acct],

    T.c.value('@t[1]', 'varchar(24)') AS [ndt38]

    FROM @x.nodes('/d/r') AS T(c)

    ) T

    where RIGHT(ndt38,1)='2'

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
