Need help with max value in date column

  • pmadhavapeddi22 (1/7/2014)


    jeff,

    I tried the below solution, will it effect the peformance ?

    select

    employee.employee_id ,date_joined ,aa.date_ , ROUND((sal+due_amount),2)

    from employee

    inner join (select employee_id,date_joined,MAX(date_) 'date_' from employee group by employee_id, date_joined)aa

    on aa.date_=employee.date_ and aa.employee_id =employee.employee_id

    Correct me if this approach is not right

    The truth be told, I don't know where my head was at when I wrote the ROW_NUMBER() code for this. :blush: Although your code will cause two scans with or without indexes, I believe that your code will be quite a bit faster because your code won't have to do a sort.

    Lemme do what I should have done in the first place and build a million row test table and we'll find out if I'm right about that or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Another way (similar to Christian's):

    WITH PreAggregate AS

    (

    SELECT employee_id, date_joined, date_=MAX(date_)

    FROM #employee

    GROUP BY employee_id, date_joined

    )

    SELECT employee_id, date_joined, date_, SomeCalculatedColumn

    FROM PreAggregate a

    CROSS APPLY

    (

    SELECT ROUND(sal+due_amount, 2)

    FROM #employee b

    WHERE a.employee_id = b.employee_id AND a.date_joined = b.date_joined

    AND a.date_ = b.date_

    ) b (SomeCalculatedColumn);


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Ya just gotta love the "old ways" sometimes. 🙂 Before we get into that, let's build a bit more test data. This first time around, we'll do it without indexes because I love solutions that work well without them. DO notice that I added some NOT NULLs in anticipation of adding the expected Clustered PK.

    -- DROP TABLE #employee

    GO

    CREATE TABLE #employee

    (

    employee_id INT NOT NULL

    ,date_joined DATETIME NOT NULL

    ,date_ DATETIME NOT NULL

    ,sal FLOAT NOT NULL

    ,due_amount FLOAT NOT NULL

    )

    ;

    INSERT INTO #employee

    (employee_id, date_joined, date_, sal, due_amount )

    SELECT TOP 1000000

    employee_id = ABS(CHECKSUM(NEWID()))%1000+1

    ,date_joined = ABS(CHECKSUM(NEWID()))%DATEDIFF(dd,'2013','2014')+CONVERT(DATETIME,'2013')

    ,date_ = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2013','2014')+CONVERT(DATETIME,'2013')

    ,sal = RAND(CHECKSUM(NEWID()))*10

    ,due_amount = RAND(CHECKSUM(NEWID()))*10

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    ;

    Here's the test code of the current 3 solutions that return the correct answer. Notice that I modified pmadhavapeddi22's solution a tiny bit to make it much less likely that it will return duplicated rows.

    PRINT '===== pmadhavapeddi22''s "Old Fashioned" Derived Table (subquery) ====';

    SET STATISTICS TIME,IO ON;

    SELECT e.employee_id, e.date_joined, e.date_, ROUND((e.sal+e.due_amount),2)

    FROM #employee e

    INNER JOIN (SELECT employee_id,date_joined,MAX(date_) 'date_'

    FROM #employee

    GROUP BY employee_id,date_joined) d

    ON d.employee_id = e.employee_id

    AND d.date_joined = e.date_joined

    AND d.date_ = e.date_;

    SET STATISTICS TIME,IO OFF;

    GO

    PRINT '===== ROW_NUMBER/CTE =================================================';

    SET STATISTICS TIME,IO ON;

    WITH

    ctePreSort AS

    (

    SELECT RowNum = ROW_NUMBER()OVER(PARTITION BY employee_id, date_joined ORDER BY date_ DESC)

    ,employee_id

    ,date_joined

    ,date_

    ,SomeCalculatedColumn = ROUND((sal+due_amount),2)

    FROM #employee

    )

    SELECT employee_id

    ,date_joined

    ,date_

    ,SomeCalculatedColumn

    FROM ctePreSort

    WHERE RowNum = 1;

    SET STATISTICS TIME,IO OFF;

    GO

    PRINT '===== Dwain''s PreAgg/Cross Apply ====================================';

    SET STATISTICS TIME,IO ON;

    WITH PreAggregate AS

    (

    SELECT employee_id, date_joined, date_=MAX(date_)

    FROM #employee

    GROUP BY employee_id, date_joined

    )

    SELECT employee_id, date_joined, date_, SomeCalculatedColumn

    FROM PreAggregate a

    CROSS APPLY

    (

    SELECT ROUND(sal+due_amount, 2)

    FROM #employee b

    WHERE a.employee_id = b.employee_id AND a.date_joined = b.date_joined

    AND a.date_ = b.date_

    ) b (SomeCalculatedColumn);

    SET STATISTICS TIME,IO OFF;

    Here are the run results without any indexes.

    ===== pmadhavapeddi22's "Old Fashioned" Derived Table (subquery) ====

    (341313 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#employee___________________________________________________________________________________________________________000000000005'.

    Scan count 2, logical reads 11174, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 6265 ms, elapsed time = 28035 ms.

    ===== ROW_NUMBER/CTE =================================================

    (341313 row(s) affected)

    Table '#employee___________________________________________________________________________________________________________000000000005'.

    Scan count 1, logical reads 5587, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 9235 ms, elapsed time = 55370 ms.

    ===== Dwain's PreAgg/Cross Apply ====================================

    (341313 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#employee___________________________________________________________________________________________________________000000000005'.

    Scan count 2, logical reads 11174, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 6344 ms, elapsed time = 30021 ms.

    pmadhavapeddi22's "Old Fashioned" Derived Table (subquery) beat the other two for CPU usage and duration. It did use twice as many reads as the ROW_NUMBER() method (as did Dwain's) because of the double-dip on the table, but it used 32% less CPU and returned the result set in half the time (as did Dwain's). The place where it really won big (IMHO) is its simplicity compared to the other methods.

    If we add the following Clustered index in the form of a PK (I used the IGNORE_DUP_KEY = ON setting just to make testing with bulk random data easier. It shouldn't be there in real life)...

    ALTER TABLE #employee

    ADD CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED

    (

    employee_id ASC,

    date_joined ASC,

    date_ ASC

    ) WITH (IGNORE_DUP_KEY = ON)

    ;

    ... we get the following run results...

    ===== pmadhavapeddi22's "Old Fashioned" Derived Table (subquery) ====

    (341228 row(s) affected)

    Table '#employee___________________________________________________________________________________________________________000000000006'.

    Scan count 1, logical reads 5610, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2062 ms, elapsed time = 22976 ms.

    ===== ROW_NUMBER/CTE =================================================

    (341228 row(s) affected)

    Table '#employee___________________________________________________________________________________________________________000000000006'.

    Scan count 1, logical reads 5609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5797 ms, elapsed time = 51287 ms.

    ===== Dwain's PreAgg/Cross Apply ====================================

    (341228 row(s) affected)

    Table '#employee___________________________________________________________________________________________________________000000000006'.

    Scan count 1, logical reads 5610, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2109 ms, elapsed time = 21993 ms.

    The index helped all 3 CPU wise but didn't do much for duration. That's because returning stuff to the screen is the "great equalizer". The BIG thing to notice is that it cut the number of reads for the "Old Fashioned" method and Dwain's method in half making them the same as the ROW_NUMBER() method. Considering that the other two both worked quite well without indexes and use less than half the CPU time with indexes, I'm going to step out and say the ROW_NUMBER() method is the loser on all fronts.

    Considering that the "Old Fashioned" method is a bit simpler to understand than the others as well as beating the others for performance, I'm also going to step out and say that's the winning method. Heh... "old stuff" is good. 😛

    [font="Arial Black"]Well done, pmadhavapeddi22 [/font]and thanks for the reminder that "new" isn't always better.

    As a bit of a sidebar, if the real "Employee" table is built as suggested by the originally posted code, it contains an SCD Type 1 (Slowly Changing Dimension Type 1) and that requires the composite PK, which is a bummer. I rather store the "history" in another table so that I could based the PK only on the Employee_ID column (no matter how it is originally formed) especially since the composite PK is FK resistant.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Always the bridesmaid and never the bride.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you Jeff 🙂

  • dwain.c (1/8/2014)


    Always the bridesmaid and never the bride.

    Heh... Like Granny used to say, "Kissin' don't last... cookin' do". 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply