SQL - Where Row_Number = 1 discussion

  • Afternoon ladies and gents,

    I'm trying to find an alternative, more efficient way of processing some code to select the top row of a specific order.

    At the moment we use row_number with a partition. Then outside the statement in the where clause state where [rownum] =1.

    Kinda like this:

    Select *

    From

    (

    Select ColA,ColB,ColC,

    row_number() OVER ( PARTITION BY [ColA] ,[ColB] ORDER BY [ColA] ASC, Col[C]DESC) as rownum

    )A

    Where rownum =1

    This seems like a really horrible way to do things, especially with relatively large datasets of 500k+ so I'm looking into a new solution.

    Although this seems the simplest, is it the most efficient?

  • Hi, welcome to the forum.

    You've not given us a massive amount to go on there. If you can post some sample data it will give us a much better idea of what you mean.

    At a rough guess, have you considered SELECT TOP 1 * FROM YOURTABLE ORDER BY [your order criteria]


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Without anything else to go on, I would pick the TOP 1 as well. If you post the DDL for the table and include any indexes you have already, we can probably give better advice.

  • Have you tried a bog-standard aggregate?

    SELECT ColA, ColB, ColC = MAX(ColC)

    FROM …

    GROUP BY ColA, ColB

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Unfortunately I cannot post the table or data due to confidentiality reasons, however an extreme simplification of the problem:

    CREATE TABLE [dbo].[Team]

    (

    [FirstName] [varchar](20) NULL,

    [LastName] [varchar] (20) NULL,

    [Team] Varchar (20) NULL,

    [TransactionID] int,

    Price float,

    NoofTickets int

    ) ON [PRIMARY]

    INSERT INTO [dbo].[Team]

    ([FirstName]

    ,[LastName]

    ,[Team]

    ,[TransactionID]

    ,[Price]

    , NoofTickets)

    Values

    ('John', 'Smith', 'Team A', 65, 69.99,2),

    ('John', 'Smith', 'Team A', 65, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, £349.99,5),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Jack','Wilson','Team B',55,NULL,NULL),

    ('Jack','Wilson','Team B',55,NULL,NULL),

    ('Jack','Wilson','Team B',55,£299.99,5),

    ('Jack','Wilson','Team B',55,NULL,NULL),

    ('Jack','Wilson','Team B',55,NULL,NULL)

    And the query:

    Select *

    FROM

    (

    SELECT [FirstName]

    ,[LastName]

    ,[Team]

    ,[TransactionID]

    ,[Price],

    Row_Number() over (partition by firstname, Transactionid order by transactionid,price desc) rownum

    FROM [Test].[dbo].[Team]

    )A

    Where rownum = 1

    Unfortunately aggregations wont work as the maximum value wont always be the value returned (I know in the really bad example that this is the case and I can put a where is not null, but its the rownumber part to focus on).

    So in a nutshell, row_number() is being used to de-duplicate the data.

    The solution we have works fine, me being me, i'm just want to know if there's a better way of doing things

  • I'm not sure if it's a better solution but it's an alternative that you could test.

    SELECTDISTINCT

    [FirstName]

    ,[LastName]

    ,[Team]

    ,[TransactionID]

    ,FIRST_VALUE([Price]) over (partition by firstname, Transactionid order by price desc) Price

    FROM [Test].[dbo].[Team]

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • DROP TABLE #Team

    CREATE TABLE #Team

    (

    [FirstName] [varchar](20) NULL,

    [LastName] [varchar] (20) NULL,

    [Team] Varchar (20) NULL,

    [TransactionID] int,

    Price float,

    NoofTickets int

    ) ON [PRIMARY]

    INSERT INTO #Team

    ([FirstName]

    ,[LastName]

    ,[Team]

    ,[TransactionID]

    ,[Price]

    , NoofTickets)

    VALUES

    ('John', 'Smith', 'Team A', 65, 69.99,2),

    ('John', 'Smith', 'Team A', 65, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, £349.99,5),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Davie', 'Jones', 'Team A', 89, NULL,NULL),

    ('Jack','Wilson','Team B',55,NULL,NULL),

    ('Jack','Wilson','Team B',55,NULL,NULL),

    ('Jack','Wilson','Team B',55,£299.99,5),

    ('Jack','Wilson','Team B',55,NULL,NULL),

    ('Jack','Wilson','Team B',55,NULL,NULL)

    -- This type of query (Current method) is critically dependant upon correct indexing:

    CREATE INDEX ix_helper on #Team (firstname, Transactionid, price DESC) INCLUDE ([LastName],[Team])

    ----------------------------------------------------------------------------------------------------------

    --------------------------------------------------------------

    -- Current method

    SELECT

    [FirstName]

    ,[LastName]

    ,[Team]

    ,[TransactionID]

    ,[Price]

    FROM

    (

    SELECT [FirstName]

    ,[LastName]

    ,[Team]

    ,[TransactionID]

    ,[Price],

    Row_Number() over (partition by firstname, Transactionid order by price desc) rownum

    FROM #Team

    )A

    WHERE rownum = 1

    ORDER BY firstname

    -- One alternative method

    SELECT

    d.firstname,

    x.LastName,

    x.Team,

    d.Transactionid,

    d.price

    FROM (

    SELECT firstname, Transactionid, price = MAX(price)

    FROM #Team

    GROUP BY firstname, Transactionid

    ) d

    CROSS APPLY (

    SELECT TOP 1 [LastName],[Team]

    FROM #Team t

    WHERE t.FirstName = d.FirstName

    AND t.TransactionID = d.TransactionID

    ) x

    ORDER BY d.firstname

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • w.zia.2015 (4/30/2015)


    Afternoon ladies and gents,

    I'm trying to find an alternative, more efficient way of processing some code to select the top row of a specific order.

    At the moment we use row_number with a partition. Then outside the statement in the where clause state where [rownum] =1.

    Kinda like this:

    Select *

    From

    (

    Select ColA,ColB,ColC,

    row_number() OVER ( PARTITION BY [ColA] ,[ColB] ORDER BY [ColA] ASC, Col[C]DESC) as rownum

    )A

    Where rownum =1

    This seems like a really horrible way to do things, especially with relatively large datasets of 500k+ so I'm looking into a new solution.

    Although this seems the simplest, is it the most efficient?

    I don't see anything wrong with this provided that you have the right index in place. Take a look at this article:

    SQL Server 2012: How to Write T-SQL Window Functions, Part 3. In there Ben-Gan discusses the POC Index. If you have the right POC index in place your query should be just fine. Even, if you don't use Windows functions to solve your problem a POC index will still come in handy.

    The index that Chris creates in his solution is a great example of a POC index. It improves the performance in both example queries that he provided by eliminating a very costly sort operator.

    Ben-Gan talks about this concept and provides many more examples in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (4/30/2015)


    Ben-Gan's talks about this concept and provides many more examples in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

    If you don't have this book, I'd highly recommend it.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply