Identifying Current Record

  • I have a snapshot table of about 15 million records in the form of:

    InvoiceIDLineItemIDSnapshotDateAmount

    1 1 20140101 12

    1 2 20140102 14

    1 3 20140103 17

    2 1 20140101 10

    2 2 20140102 5

    1 2 20140105 15

    1 3 20140105 20

    I want to create an additional column called Current as shown below:

    InvoiceIDLineItemIDSnapshotDateAmount Current

    1 1 20140101 12 1

    1 2 20140102 14 0

    1 3 20140103 17 0

    2 1 20140101 10 1

    2 2 20140102 5 1

    1 2 20140105 15 1

    1 3 20140105 20 1

    How can we write a query to achieve this while keeping in mind:

    - We do not want to do unnecessary record lookups and Updates

    - We only update records that corresponds to new entries. For example, we should not touch the record for InvoiceID = 2 in the above example

    Thanks

  • don't try to do it in the table. create a view or query that uses row_number() OVER (Partition by...order by..)

    if you try and do it in the table itself, you will run into a huge performance hit in your trigger, which has to rearrange data with any insert update or delete.

    so just don't do that.

    use that as a sub query, and whatever has the lowest/highest number gets the 'Current'd description as a case statement.

    wild guess:

    SELECT CASE WHEN RW = 1 THEN 'Current' ELSE '' END AS isCurrentIndicator,

    *

    FROM (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY InvoiceID ORDER BY InvoiceID,LineItemID) AS RW,

    InvoiceID,

    LineItemID,

    SnapshotDate,

    Amount

    FROM YourTable) SubQueryAlias

    ORDER BY InvoiceID,LineItemID

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I'd definitely add a bit column to the table. Your data and your statements seem contradictory, but I think when a new row is added that becomes current, you could just update the old row to turn off the current flag. Edit: the appropriate index should make this update relatively painless.

    No way I personally would use ROW_NUMBER() or any other such mechanism, since I wouldn't want to risk sorting 15M+ rows, or any significant part thereof, every time I went to read them.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • I also don't like to use Partitions and Ranks etc due to the sorting overhead

    The default value for the new column will be set to 1 thus will allow new records to become active

    But what is the best way to detect the old records. I believe I still have to go through all the records via Snapshot_Date. Still can't figure out the best way to extract older records

  • rka (3/6/2014)


    But what is the best way to detect the old records. I believe I still have to go through all the records via Snapshot_Date. Still can't figure out the best way to extract older records

    I have no idea. You haven't given any specific rules on how to determine what is current and what no longer is.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • I'd take this as a two phase approach.

    First, you need to get your current data into the format you want. No matter what you do, you're going to do a massive table update at least once. I recommend you port it to a new table and swap the table objects if you have the space.

    To find the correct current values, you need to use the ROW_NUMBER() technique described by Lowell to activate the most recent invoice/line item combination.

    Then, you need to include an INSERT trigger for further updates. Said trigger needs to make sure the same invoice/lineitem is distinct in each inclusion. UPDATE the table to turn off any existing invoiceID/lineitem items with current = 1 (as Scott mentioned, use an index to easily locate these) that exist before allowing the trigger to finish the insert.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • You might want to take a look at this article I wrote in SimpleTalk. It goes through how to retrieve versioned data[/url] from the table. Assuming you can identify the latest record based on that date column, the techniques outlined here should work for you, and should perform well (we had millions and millions of records and performance was great) as long as you build out your indexes appropriately.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply