ROW_NUMBER() OVER (PARTITION BY *** ORDER BY ***) excluding NULL values

  • yubo1 (9/29/2011)


    yubo1 (9/29/2011)


    SELECT * ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test where colC is not null

    union

    SELECT * ,

    null as RoeNum

    FROM #test where colC is null

    Is there anything wrong by use '*' instead of 'ID,colc'?

    table #test have not any other columns ...

    Well, maybe not with a temporary test table that the OP has given as an example, but when this is translated back to production code, it then becomes very bad practice indeed.

    Similarly with UNION vs UNION ALL - there was no requirement for a de-duplication exercise to be done so it's forcing an expensive distinct sort when we can see logically we have two sets that cannot overlap.

    Set "Include Actual Execution Plan" to on in SSMS (under the Query menu) and run all of this, comparing the execution plans for the first and the second query to see the effect of this:

    --Forces a Distinct Sort - more expensive

    SELECT ID ,

    colC ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test

    WHERE colC IS NOT NULL

    UNION

    SELECT ID ,

    colC ,

    NULL RowNum

    FROM #test

    WHERE colC IS NULL

    --Doesn't force a distinct sort - less expensive

    SELECT ID ,

    colC ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test

    WHERE colC IS NOT NULL

    UNION ALL

    SELECT ID ,

    colC ,

    NULL RowNum

    FROM #test

    WHERE colC IS NULL

  • HowardW (9/29/2011)


    yubo1 (9/29/2011)


    yubo1 (9/29/2011)


    SELECT * ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test where colC is not null

    union

    SELECT * ,

    null as RoeNum

    FROM #test where colC is null

    Is there anything wrong by use '*' instead of 'ID,colc'?

    table #test have not any other columns ...

    Well, maybe not with a temporary test table that the OP has given as an example, but when this is translated back to production code, it then becomes very bad practice indeed.

    Similarly with UNION vs UNION ALL - there was no requirement for a de-duplication exercise to be done so it's forcing an expensive distinct sort when we can see logically we have two sets that cannot overlap.

    Set "Include Actual Execution Plan" to on in SSMS (under the Query menu) and run all of this, comparing the execution plans for the first and the second query to see the effect of this:

    --Forces a Distinct Sort - more expensive

    SELECT ID ,

    colC ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test

    WHERE colC IS NOT NULL

    UNION

    SELECT ID ,

    colC ,

    NULL RowNum

    FROM #test

    WHERE colC IS NULL

    --Doesn't force a distinct sort - less expensive

    SELECT ID ,

    colC ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test

    WHERE colC IS NOT NULL

    UNION ALL

    SELECT ID ,

    colC ,

    NULL RowNum

    FROM #test

    WHERE colC IS NULL

    Thanks for your Comprehensive Explanation.

  • The problem with a UNION (ALL) is that you will have to scan the table twice. Here is an approach that only scans the table once.

    SELECT [ID]

    , ColC

    , CASE

    WHEN ColC IS NOT NULL

    THEN Row_Number() OVER( PARTITION BY CASE WHEN ColC IS NOT NULL THEN 1 END ORDER BY [ID] )

    END AS RowNum

    FROM #test

    ORDER BY ID

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Nice solution Drew. I guess "it depends" on quite a lot of factors as to which would perform better (2 x Table scan + partial sort vs 1 x Table scan + Full Sort).

    Especially once you get into which columns are indexed as the UNION ALL solution can be satisfied without a sort if for example ID is a clustered PK in real life, whereas the case cannot (without getting into the realms of indexed computed columns) and whether the WHERE clause can be satisfied with a seek rather than a scan.

  • Thanks Drew ... that was another "classy" solution 🙂

  • this should do it:

    IF OBJECT_ID('tempdb..#test') IS NOT NULL

    DROP TABLE #test

    GO

    CREATE TABLE #test (ID INT IDENTITY, colC INT ) ;

    INSERT INTO #test VALUES ( 111 ) ;

    INSERT INTO #test VALUES ( 222 ) ;

    INSERT INTO #test VALUES ( NULL ) ;

    INSERT INTO #test VALUES ( NULL ) ;

    INSERT INTO #test VALUES ( 555 ) ;

    INSERT INTO #test VALUES ( 666 ) ;

    INSERT INTO #test VALUES ( NULL ) ;

    INSERT INTO #test VALUES ( '777' ) ;

    SELECT * ,

    ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test

    where colC is not null

    union

    SELECT * ,

    null RowNum

    FROM #test

    where colC is null

  • drew.allen (9/29/2011)


    The problem with a UNION (ALL) is that you will have to scan the table twice. Here is an approach that only scans the table once.

    SELECT [ID]

    , ColC

    , CASE

    WHEN ColC IS NOT NULL

    THEN Row_Number() OVER( PARTITION BY CASE WHEN ColC IS NOT NULL THEN 1 END ORDER BY [ID] )

    END AS RowNum

    FROM #test

    ORDER BY ID

    Drew

    I never thought you could use CASE inside the OVER clause like that. :w00t:

    It really shows how flexible CASE statements are.

    Best regards,

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Using the last reply, i made this example for the same problem, with multiples keys.

    I send out null with a out range number, but isn''t the best solution to this. I'ts like still have null.

    Three ways to diferent responses.

    SELECT *,ROW_NUMBER() OVER(PARTITION BY T1.a ORDER BY T1.t ASC) AS COL1,

    CASE WHEN T1.t IS NULL THEN 99 ELSE ROW_NUMBER() OVER(PARTITION BY T1.a ORDER BY T1.t ASC) END AS COL2,

    CASE

    WHEN T1.t IS NOT NULL

    THEN Row_Number() OVER( PARTITION BY CASE WHEN T1.t IS NOT NULL THEN 1 END,T1.a ORDER BY T1.a )

    ELSE 99

    END AS COL3

    FROM

    (SELECT 1 as a, NULL as t

    UNION

    SELECT 1 as a, 2 as t

    UNION

    SELECT 1 as a, 1 as t

    UNION

    SELECT 1 as a, 7 as t

    UNION

    SELECT 1 as a, 5 as t

    UNION

    SELECT 2 as a, 2 as t

    UNION

    SELECT 2 as a, 1 as t

    UNION

    SELECT 2 as a, 7 as t

    UNION

    SELECT 2 as a, NULL as t)

    AS T1

    order by a ASC

    And response is here

    a t COL1 COL2 COL3

    ----------- ----------- -------------------- -------------------- --------------------

    1 1 2 2 1

    1 2 3 3 2

    1 5 4 4 3

    1 7 5 5 4

    1 NULL 1 99 99

    2 NULL 1 99 99

    2 1 2 2 1

    2 2 3 3 2

    2 7 4 4 3

  • With a given data type of your ColC column, you can do the same without any CASE WHEN:

    SELECT [ID]

    , ColC

    , ColC-ColC+ROW_NUMBER() OVER( PARTITION BY ColC-ColC ORDER BY [ID] )

    FROM #test

    ORDER BY ID

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • "CASE WHEN"-less query also possible for ColC of string type:

    SELECT [ID]

    , ColC

    , LEN(ColC)-LEN(ColC)+ROW_NUMBER() OVER(PARTITION BY NULLIF(ISNULL(colC,''), colC) ORDER BY [ID])

    FROM #test1

    ORDER BY ID

    Yep, a bit of cheat here as NULLIF translates to CASE WHEN...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • loco.cansado (1/14/2015)


    Using the last reply, i made this example for the same problem, with multiples keys.

    I send out null with a out range number, but isn''t the best solution to this. I'ts like still have null.

    Three ways to diferent responses.

    SELECT *,ROW_NUMBER() OVER(PARTITION BY T1.a ORDER BY T1.t ASC) AS COL1,

    CASE WHEN T1.t IS NULL THEN 99 ELSE ROW_NUMBER() OVER(PARTITION BY T1.a ORDER BY T1.t ASC) END AS COL2,

    CASE

    WHEN T1.t IS NOT NULL

    THEN Row_Number() OVER( PARTITION BY CASE WHEN T1.t IS NOT NULL THEN 1 END,T1.a ORDER BY T1.a )

    ELSE 99

    END AS COL3

    FROM

    (SELECT 1 as a, NULL as t

    UNION

    SELECT 1 as a, 2 as t

    UNION

    SELECT 1 as a, 1 as t

    UNION

    SELECT 1 as a, 7 as t

    UNION

    SELECT 1 as a, 5 as t

    UNION

    SELECT 2 as a, 2 as t

    UNION

    SELECT 2 as a, 1 as t

    UNION

    SELECT 2 as a, 7 as t

    UNION

    SELECT 2 as a, NULL as t)

    AS T1

    order by a ASC

    And response is here

    a t COL1 COL2 COL3

    ----------- ----------- -------------------- -------------------- --------------------

    1 1 2 2 1

    1 2 3 3 2

    1 5 4 4 3

    1 7 5 5 4

    1 NULL 1 99 99

    2 NULL 1 99 99

    2 1 2 2 1

    2 2 3 3 2

    2 7 4 4 3

    USING

    SELECT *,ROW_NUMBER() OVER(PARTITION BY T1.a ORDER BY T1.t ASC) AS COL1,

    CASE WHEN T1.t IS NULL THEN 99 ELSE ROW_NUMBER() OVER(PARTITION BY T1.a ORDER BY T1.t ASC) END AS COL2,

    Row_Number() OVER( PARTITION BY T1.a ORDER BY CASE WHEN T1.t IS NULL THEN 2147483647 END )

    AS COL3

    FROM

    (SELECT 1 as a, NULL as t

    UNION

    SELECT 1 as a, 2 as t

    UNION

    SELECT 1 as a, 1 as t

    UNION

    SELECT 1 as a, 7 as t

    UNION

    SELECT 1 as a, 5 as t

    UNION

    SELECT 2 as a, 2 as t

    UNION

    SELECT 2 as a, 1 as t

    UNION

    SELECT 2 as a, 7 as t

    UNION

    SELECT 2 as a, NULL as t)

    AS T1

    order by a ASC

    RESULTS

    a t COL1 COL2 COL3

    ----------- ----------- -------------------- -------------------- --------------------

    1 1 2 2 1

    1 2 3 3 2

    1 5 4 4 3

    1 7 5 5 4

    1 NULL 1 99 5

    2 1 2 2 1

    2 2 3 3 2

    2 7 4 4 3

    2 NULL 1 99 4

    2147483647 is MAX INT

  • a

  • --late edit: i only saw page one of three: there are better solutions later in what i saw!

    i can get that result with a union:

    SELECT *,

    ROW_NUMBER()

    OVER (

    ORDER BY ID ) RowNum

    FROM #test

    WHERE colC IS NOT NULL

    UNION

    SELECT

    #test.*,NULL AS RowNum

    FROM #test

    WHERE colC IS NULL

    ORDER BY ID

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Need to do with a derived table. Here is the command to return what you want.

    SELECT t.id, t.colC, t1.RowNum

    FROM #test t

    left outer JOIN ( SELECT Id

    , ROW_NUMBER() OVER ( ORDER BY ID ) RowNum

    FROM #test

    WHERE COLC IS NOT NULL

    ) as t1

    on t.ID = t1.ID

  • Running sum would do it for you.......

    SELECT *,

    CASE WHEN colC IS NULL

    THEN NULL

    ELSE SUM( IIF(colC IS NULL, 0, 1) ) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    END AS RowNum

    FROM #test;

    /Steen

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply