Log Table - Determining Old Value vs New Value

Question

Post reply

Log Table - Determining Old Value vs New Value

mitzyturbo

Hall of Fame

Points: 3452
More actions
November 1, 2016 at 6:12 am

#322522

Hi guys
Wondering if someone can help me with performance tuning a query I've written.
My query
1. pulls a dataset from a log table into a temp table
2. applies a row number for each record set to determine the order that the data arrived
3. uses a while loop to iterate through each field that is different from the next row (this includes a nested while loop as there could be more than one difference within each row)
4. using dynamic sql, gets the value of each field that was different from the previous and stores these values in another temp table
I'm just wondering if I'm missing a much simpler way to run this using windowing functions as opposed to the dynamic sql. This code was working fine for small datasets but it's currently crawling when it's hitting anything over 30k records.
The only other option I can think of is to change the format of the log table to store the data as Old Value and New Value for each of the fields we want to check on (5 fields altogether).

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148786 More actions · Answer 1

Are you familiar with the LAG and LEAD functions? I think you could use them to avoid the loops.

John

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 2

First I've heard of them john, any good articles you'd recommend or would a google suffice?

John Mitchell-245523 SSC Guru Points: 148786 More actions · Answer 3

The examples in Books Online are a good place to start. Here's an illustration (not tested) that I cobbled together. You might also want to bear in mind that storing the old values will double your disk requirements and probably affect performance as well. You can get the old value from the previous row in the table. Write a view if you need to display old and new values side by side.

WITH OldandNew AS (

SELECT

RowNo AS RowNumber

,LAG(RowNo,1) OVER (ORDER BY RowNo) AS PreviousRowNumber

,Col1 AS Newcol1

,LAG(col1,1) OVER (ORDER BY RowNo) AS OldCol1

,Col2 AS Newcol2

,LAG(col2,1) OVER (ORDER BY RowNo) AS OldCol2

FROM MyTable

)

SELECT

RowNumber

,PreviousRowNumber

,Newcol1

,OldCol1

,Newcol2

,OldCol2

FROM OldandNew

WHERE NewCol1 <> OldCol1

OR NewCol2 <> OldCol2

John

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

1) Get a copy of Itzik Ben-Gan's Windowing Function book.

2) Fabiano Amorim did a great blog series on Simpletalk.com on WFs

3) You shouldn't need to put the data into a temp table. Huge waste of effort. Just do your work on the actual table, which has to have either an identity or a datetime added (or both) that you can use to process the records in proper order.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 5

Thanks guys for the replies

John - I got it working using the method you outlined, I used LEAD instead of LAG

Massive improvement on the performance too

Now I just need to check the data....oh the joys

SQLGuru - Great minds think alike, you were the second person to suggest Itzik Ben-Gan's stuff on this, I hadn't done my research but he looks like the authority on all such things

Phil Parkin SSC Guru Points: 245813 More actions · Answer 6

mitzyturbo (11/1/2016)
Thanks guys for the replies
John - I got it working using the method you outlined, I used LEAD instead of LAG
Massive improvement on the performance too
Now I just need to check the data....oh the joys
SQLGuru - Great minds think alike, you were the second person to suggest Itzik Ben-Gan's stuff on this, I hadn't done my research but he looks like the authority on all such things

This part of John's suggested code does not handle NULLs:

WHERE NewCol1 <> OldCol1

OR NewCol2 <> OldCol2

If your data may contain NULLs, I suggest that you consider using this construct instead, which does:

WHERE EXISTS

(

select NewCol1, NewCol2

except

select OldCol1, OldCol2

)

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

ScottPletcher SSC Guru Points: 100262 More actions · Answer 7

Cluster the log table on the relevant arrival datetime that you are comparing by (rather than, presumably, a meaningless identity value). Then query directly from the table, joining to itself using the clustering key to do a seek to find the row to join to.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 8

Thanks Phil

I've tweaked the code slightly so that there are no NULL values as I was thinking the exact same thing.

I'm currently working on getting rid of the temp tables altogether and wrapping it all into one select.

Unfortunately the original developers of the database, in their wisdom, split one entity into two different log tables so I need to join / merge my log data set and apply a row number per modified date

For now, using the temp tables isn't too bad, 60k rows being returned in less than 20 seconds which is a huge improvement

ScottPletcher SSC Guru Points: 100262 More actions · Answer 9

Properly clustering should still allow a merge join, which should be much faster than anything you're having to do now.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 10

As I have 2 log tables to pull data from, how would this work?

do you mean by just having a clustered index by the modified date on both log tables that the select will still run quicker?

ScottPletcher SSC Guru Points: 100262 More actions · Answer 11

mitzyturbo (11/1/2016)
As I have 2 log tables to pull data from, how would this work?
do you mean by just having a clustered index by the modified date on both log tables that the select will still run quicker?

Yes, exactly. I would think the "date added" would be sequential on each table, so fragmentation would not be an issue. [But some minor fragmentation is not that big a deal anyway.] Be sure to review the fillfactor on the table also, and make sure it is at least 98 or 99, assuming since this is a history table that you don't modify rows.

But by being clustered on the date, SQL would "know" the rows were in order, and it could do a very efficient join on the two tables, including to a prior row, if needed.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!