SQL - Where Row_Number = 1 discussion

Question

SQL - Where Row_Number = 1 discussion

w.zia.2015

SSC Veteran

Points: 248
More actions
April 30, 2015 at 5:43 am

#317192

Afternoon ladies and gents,
I'm trying to find an alternative, more efficient way of processing some code to select the top row of a specific order.
At the moment we use row_number with a partition. Then outside the statement in the where clause state where [rownum] =1.
Kinda like this:
Select *
From
(
Select ColA,ColB,ColC,
row_number() OVER ( PARTITION BY [ColA] ,[ColB] ORDER BY [ColA] ASC, Col[C]DESC) as rownum
)A
Where rownum =1
This seems like a really horrible way to do things, especially with relatively large datasets of 500k+ so I'm looking into a new solution.
Although this seems the simplest, is it the most efficient?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Neil Burton SSC-Insane Points: 22598 More actions · Answer 1

Hi, welcome to the forum.

You've not given us a massive amount to go on there. If you can post some sample data it will give us a much better idea of what you mean.

At a rough guess, have you considered SELECT TOP 1 * FROM YOURTABLE ORDER BY [your order criteria]

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Ed Wagner SSC Guru Points: 287015 More actions · Answer 2

Without anything else to go on, I would pick the TOP 1 as well. If you post the DDL for the table and include any indexes you have already, we can probably give better advice.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Have you tried a bog-standard aggregate?

SELECT ColA, ColB, ColC = MAX(ColC)

FROM …

GROUP BY ColA, ColB

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

waxb18 Ten Centuries Points: 1060 More actions · Answer 4

Unfortunately I cannot post the table or data due to confidentiality reasons, however an extreme simplification of the problem:

CREATE TABLE [dbo].[Team]

(

[FirstName] [varchar](20) NULL,

[LastName] [varchar] (20) NULL,

[Team] Varchar (20) NULL,

[TransactionID] int,

Price float,

NoofTickets int

) ON [PRIMARY]

INSERT INTO [dbo].[Team]

([FirstName]

,[LastName]

,[Team]

,[TransactionID]

,[Price]

, NoofTickets)

Values

('John', 'Smith', 'Team A', 65, 69.99,2),

('John', 'Smith', 'Team A', 65, NULL,NULL),

('Davie', 'Jones', 'Team A', 89, £349.99,5),

('Davie', 'Jones', 'Team A', 89, NULL,NULL),

('Jack','Wilson','Team B',55,NULL,NULL),

('Jack','Wilson','Team B',55,£299.99,5),

('Jack','Wilson','Team B',55,NULL,NULL),

('Jack','Wilson','Team B',55,NULL,NULL)

And the query:

Select *

FROM

(

SELECT [FirstName]

,[LastName]

,[Team]

,[TransactionID]

,[Price],

Row_Number() over (partition by firstname, Transactionid order by transactionid,price desc) rownum

FROM [Test].[dbo].[Team]

)A

Where rownum = 1

Unfortunately aggregations wont work as the maximum value wont always be the value returned (I know in the really bad example that this is the case and I can put a where is not null, but its the rownumber part to focus on).

So in a nutshell, row_number() is being used to de-duplicate the data.

The solution we have works fine, me being me, i'm just want to know if there's a better way of doing things

Luis Cazares SSC Guru Points: 183686 More actions · Answer 5

I'm not sure if it's a better solution but it's an alternative that you could test.

SELECTDISTINCT

[FirstName]

,[LastName]

,[Team]

,[TransactionID]

,FIRST_VALUE([Price]) over (partition by firstname, Transactionid order by price desc) Price

FROM [Test].[dbo].[Team]

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

DROP TABLE #Team

CREATE TABLE #Team

(

[FirstName] [varchar](20) NULL,

[LastName] [varchar] (20) NULL,

[Team] Varchar (20) NULL,

[TransactionID] int,

Price float,

NoofTickets int

) ON [PRIMARY]

INSERT INTO #Team

([FirstName]

,[LastName]

,[Team]

,[TransactionID]

,[Price]

, NoofTickets)

VALUES

('John', 'Smith', 'Team A', 65, 69.99,2),

('John', 'Smith', 'Team A', 65, NULL,NULL),

('Davie', 'Jones', 'Team A', 89, £349.99,5),

('Davie', 'Jones', 'Team A', 89, NULL,NULL),

('Jack','Wilson','Team B',55,NULL,NULL),

('Jack','Wilson','Team B',55,£299.99,5),

('Jack','Wilson','Team B',55,NULL,NULL),

('Jack','Wilson','Team B',55,NULL,NULL)

-- This type of query (Current method) is critically dependant upon correct indexing:

CREATE INDEX ix_helper on #Team (firstname, Transactionid, price DESC) INCLUDE ([LastName],[Team])

----------------------------------------------------------------------------------------------------------

--------------------------------------------------------------

-- Current method

SELECT

[FirstName]

,[LastName]

,[Team]

,[TransactionID]

,[Price]

FROM

(

SELECT [FirstName]

,[LastName]

,[Team]

,[TransactionID]

,[Price],

Row_Number() over (partition by firstname, Transactionid order by price desc) rownum

FROM #Team

)A

WHERE rownum = 1

ORDER BY firstname

-- One alternative method

SELECT

d.firstname,

x.LastName,

x.Team,

d.Transactionid,

d.price

FROM (

SELECT firstname, Transactionid, price = MAX(price)

FROM #Team

GROUP BY firstname, Transactionid

) d

CROSS APPLY (

SELECT TOP 1 [LastName],[Team]

FROM #Team t

WHERE t.FirstName = d.FirstName

AND t.TransactionID = d.TransactionID

) x

ORDER BY d.firstname

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Alan Burstein SSC Guru Points: 61136 More actions · Answer 7

w.zia.2015 (4/30/2015)
Afternoon ladies and gents,
I'm trying to find an alternative, more efficient way of processing some code to select the top row of a specific order.
At the moment we use row_number with a partition. Then outside the statement in the where clause state where [rownum] =1.
Kinda like this:
Select *
From
(
Select ColA,ColB,ColC,
row_number() OVER ( PARTITION BY [ColA] ,[ColB] ORDER BY [ColA] ASC, Col[C]DESC) as rownum
)A
Where rownum =1
This seems like a really horrible way to do things, especially with relatively large datasets of 500k+ so I'm looking into a new solution.
Although this seems the simplest, is it the most efficient?

I don't see anything wrong with this provided that you have the right index in place. Take a look at this article:

SQL Server 2012: How to Write T-SQL Window Functions, Part 3. In there Ben-Gan discusses the POC Index. If you have the right POC index in place your query should be just fine. Even, if you don't use Windows functions to solve your problem a POC index will still come in handy.

The index that Chris creates in his solution is a great example of a POC index. It improves the performance in both example queries that he provided by eliminating a very costly sort operator.

Ben-Gan talks about this concept and provides many more examples in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Ed Wagner SSC Guru Points: 287015 More actions · Answer 8

Alan.B (4/30/2015)
Ben-Gan's talks about this concept and provides many more examples in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

If you don't have this book, I'd highly recommend it.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions