Blog Post

Adding Row Numbers to a Query: #SQLNewBlogger

,

I realized that I hadn’t done much blogging on Window functions in T-SQL, and I’ve done a few presentations, so I decided to round out my blog a bit. This post will start with the ROW_NUMBER() function as a gentle intro to window functions.

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers. This is also part of a series on Window Functions.

A Basic Set of Data

I’m going to use some fun data for me. I’ve been tracking my travels, since I’m on the road a lot. I’m a data person and part of tracking is trying to ensure I’m not doing too much. Just looking at the data helps me keep perspective and sometimes cancel (or decline) a trip.

In any case, you don’t care, but I essentially have this data in a table. As you can see, I have the date of travel, the city, area, etc. I also have a few flags as to whether I was traveling that day, if I spent a night away from home, and how far I was.

2024-09_0199

I have a travelID in here, which is a sequence, but what if I wanted to the trips I took in August 2024. I’d want a distance > 0 (not at home) and filters by dates. Adding that to my query, I’d run this:

SELECT
   TravelID
, TravelDate
, TravelCity
, Area
, Province
FROM travel
WHERE
   TravelDate     > '2024/07/31'
   AND TravelDate < '2024/09/01'
   AND Distance > 0
ORDER BY TravelDate;

This gives me these results:

2024-09_0202

There are 9 rows in here, but they have a weird ID number, plus these are different trips. I can just add a row_number to this data, and I’d see this result. Ignore the OVER and track I used, but you can see an incrementing number added to each row. The second column in the result set matches with the number added by SSMS on the side.

2024-09_0204

What if I wanted to see the separate trips with some row number for the day in each city?

That’s where a row_number() can help.

Creating a Window

The window comes from the OVER() clause, which is added to a number of functions, including Row_number(). The OVER() clause lets me set a window or rows on which the function works. I can set a partition and an order.

The partition is a column where we are essentially grouping data. For me, this would be the city. When I change city, I want to reset the number. Looking at the data above, I’d expect to see 1, 2 for the first 2 days in Minneapolis, then a 1 for a day in Fort Collins, and another 1 for day in Aurora

The ordering is what order is the data in the partition. In this case, I want to have the data in the window ordered by traveldate, so I’ll use that. I now have this code:

SELECT
   TravelID

, ROW_NUMBER () OVER (PARTITION BY TravelCity
                       ORDER BY TravelDate)

, TravelDate

, TravelCity

, Area

, Province

FROM travel

WHERE
   TravelDate     > ‘2024/07/31’
   AND TravelDate < ‘2024/09/01’
   AND Distance > 0

ORDER BY TravelDate;

And I get these results, where I can see that I essentially had 4 trips (all with number 1s), and these were the trips:

  • Minn – 2 days
  • Fort Collins – 1 day
  • Aurora – 1 day
  • New York City – 5 days

2024-09_0205

This shows how row_number() gives me a sequence based on the partition. The select null part in the earlier query just ignores the order by, which is required for row_number(). With no order, how do we know what sequence?

Let’s change this slightly. What if I partition by Province? Then I see this:

2024-09_0206

We put the data in date order, and ran through each province. In this case, my two 1 day trips around Colorado are bucketed (partitioned) together and I see one less trip. If I did this by country, I’d see all of this as one sequential list, since all my trips were in one country.

However, if I did countries for June, I’d see this, with the raw data on the left and the row_number() on the right. There’s a weird sequence in here; can you see it?

2024-09_0208

The weirdness is that my trips to England were broken up by a trip to Italy. So while my sequence looks good for the first part of the trip to English for 6 days, when I returned a week later, we get numbers 7, 8, 9. That’s because the data is grouped first by country, and the sequence added. The ordering of the sequence is by date, so the later days (June 13,-15) are marked with the higher sequence that continues on.

Hopefully this gives you a basic look at row_number() and some of the possibilities. I’ll examine it further in another post, along with various other window functions.

SQL New Blogger

Complex coding and finding weird situations are things employers want you to be able to do. If you work on algorithms or you’re learning new language elements, blog about them. That will impress people.

This post took me about 30 minutes, plus about 15 minutes or playing with code to set things up. You could likely do this in an hour if you’ve never blogged, though let someone proof things for you.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating