Blog Post

T-SQL Window Functions – Part 3: Aggregate Functions

,

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow Functions_thumb[1]_thumbThis is part 3 in my series on SQL window functions. In this post, we will explore using aggregation functions with T-SQL windows. SQL Server supports most of the aggregation functions such as SUM and AVG in this context with the exceptions of GROUPING and GROUPING_ID. However, prior to SQL Server 2012 only the PARTITION BY clause was supported which greatly limited the usability of aggregate window functions. When support for the ORDER BY clause was introduced in SQL Server 2012, more complex business problems such as running totals could be solved without the extensive use of cursors or nested select statement. In my experience, I used to try various ways to get around this limitation including pushing the data to .NET as it could solve this problem more efficiently. However, this was not always possible when working with reporting. Now that we are able to use SQL to solve the problem, more complex and low-performing solutions can be replaced with these window functions.

Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as

(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName

union select 2, ‘3/1/2012′, 11.00, ‘Sam’

union select 3, ‘3/2/2012′, 10.00, ‘Beth’

union select 4, ‘3/2/2012′, 15.00, ‘Joe’

union select 5, ‘3/2/2012′, 17.00, ‘Sam’

union select 6, ‘3/3/2012′, 12.00, ‘Joe’

union select 7, ‘3/4/2012′, 10.00, ‘Beth’

union select 8, ‘3/4/2012′, 18.00, ‘Sam’

union select 9, ‘3/4/2012′, 12.00, ‘Joe’

union select 10, ‘3/4/2012′, 11.00, ‘Beth’

union select 11, ‘3/5/2012′, 14.00, ‘Sam’

union select 12, ‘3/6/2012′, 17.00, ‘Beth’

union select 13, ‘3/6/2012′, 19.00, ‘Joe’

union select 14, ‘3/7/2012′, 13.00, ‘Beth’

union select 15, ‘3/7/2012′, 16.00, ‘Sam’

)

select OrderID

,OrderDate

,OrderAmt

,CustomerName

from CTEOrders;

Using PARTITION BY with Aggregate Functions

SQL Server 2005 and the newer versions supports the usage of the PARTITION BY clause by itself. This allowed for some simple aggregate windows. The following example shows SUM and AVG for different partitions of data. The third function actually creates and average using a SUM and COUNT function.

select CustomerName

,OrderDate

,OrderAmt

,SUM(OrderAmt) OVER (PARTITION BY CustomerName) CustomerTotal

,AVG(OrderAmt) OVER (PARTITION BY OrderDate) AvgDailyAmt

,CAST(COUNT(OrderID) OVER (PARTITION BY OrderDate) as decimal(8,3)) / CAST(COUNT(OrderID) OVER() as decimal(8,3)) PctOfTotalPerDay

from CTEOrders

order by OrderDate;

NOTE: The COUNT aggregate returns an integer value. In order to return the decimal, the values need to be explicitly converted to decimal types. Otherwise, the result was rounding to zero for all results in this sample.

Results

CustomerNameOrderDateOrderAmtCustomerTotalAvgDailyAmtPctOfTotalPerDay
Joe3/1/2012106810.50.133333333
Sam3/1/2012117610.50.133333333
Sam3/2/20121776140.2
Joe3/2/20121568140.2
Beth3/2/20121061140.2
Joe3/3/20121268120.066666667
Joe3/4/2012126812.750.266666667
Beth3/4/2012106112.750.266666667
Beth3/4/2012116112.750.266666667
Sam3/4/2012187612.750.266666667
Sam3/5/20121476140.066666667
Beth3/6/20121761180.133333333
Joe3/6/20121968180.133333333
Beth3/7/2012136114.50.133333333
Sam3/7/2012167614.50.133333333

Using Subselects

Subselect statements in SQL Server are supported, but harder to optimize in SQL Server versus Oracle. Until window functions were introduced all of the situations above could be solved by subselects, but performance would degrade as the results needed to work with larger sets of data. With the improved functionality in SQL Server 2012, you should not need to use subselects to return row-based aggregations. Besides the performance implications, maintenance will also be much simpler as the SQL becomes more transparent. For reference, here is the subselect syntax to return the same results as above:

select cte.CustomerName

, cte.OrderDate

, cte.OrderAmt

, (select SUM(OrderAmt) from CTEOrders where CustomerName = cte.CustomerName) CustomerTotal

, (select cast(COUNT(OrderID) as decimal(8,3)) from CTEOrders where OrderDate = cte.OrderDate) / (select cast(COUNT(OrderID) as decimal(8,3)) from CETOrders) AvgDailyAmt

from CETOrders cte

order by cte.OrderDate;

While it is possible to solve the same function using the subselects, the code is already getting messier and with data sets larger than what we have here, you would definitely see performance degradation.

Some Thoughts on GROUP BY

While I am digressing, I wanted to also highlight some details concerning GROUP BY. The one the biggest difficulties working with the GROUP BY clause and aggregates, every column must either be a part of the GROUP BY or have an aggregation associated with it. The window functions help solve this problem as well.

In the following examples, the first query returns the sum of the amount by day. This is pretty standard logic when working with aggregated queries in SQL.

select OrderDate

,sum(OrderAmt) as DailyOrderAmt

from CTEOrders

group by OrderDate;

However, if you wanted to see more details, but not include them in the aggregation, the following will not work.

select OrderDate

,OrderID

,OrderAmt

,sum(OrderAmt) as DailyOrderAmt

from CTEOrders

group by OrderDate

,OrderID

,OrderAmt;

This SQL statement will return each row individually with the sum at the detail level. You can solve this using the subselect above which is not recommended or you can use a window function.

select OrderDate

,OrderID

,OrderAmt

,sum(OrderAmt) OVER (PARTITION BY OrderDate) as DailyOrderAmt

from CTEOrders

As you can see here and in previous examples the OVER clause allows you to manage the grouping based on the context specified in relationship to the current row.

One other twist on the GROUP BY clause. First, I need to give credit to Itzik Ben-Gan for calling this to my attention at one of our Minnesota SQL Server User Group meetings. In his usual fashion he was showing some T-SQL coolness and he showed an interesting error when using the OVER clause with the GROUP BY clause.

The following will return an error because the first expression is an aggregate, but the second expression which is using the OVER clause is not. Also note that in this example the OVER clause is being evaluated for the entire set of data.

select sum(OrderAmt)

, sum(OrderAmt) over() as TotalOrderAmt

from CTEOrders

group by CustomerName

The expression above returns the following error:
Column ‘CTEOrders.OrderAmt’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

The goal of the statement above was to show the customer’s total order amount with the overall order amount. The following statement resolves this issue because it is aggregating the aggregates. The window is now summing the aggregated amount which are grouped on the customer name.

select sum(OrderAmt)

, sum(sum(OrderAmt)) over() as TotalOrderAmt

from CTEOrders

group by CustomerName

Thanks again to Itzik for bringing this problem and resolution to my attention.

Aggregates with ORDER BY

With the expansion of the OVER clause to include ORDER BY support with aggregates, window functions increased their value substantially. One of the key business problems this allowed us to solve was a running aggregate.

The first example shows how to get a running total by CustomreName based on OrderDate and OrderID.

select OrderID

,OrderDate

,OrderAmt

,CustomerName

,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as RunningByCustomer

from CTEOrders

ORDER BY CustomerName, OrderDate;

Results

OrderIDOrderDateOrderAmtCustomerNameRunningByCustomer
33/2/201210Beth10
73/4/201210Beth20
103/4/201211Beth31
123/6/201217Beth48
143/7/201213Beth61
13/1/201210Joe10
43/2/201215Joe25
63/3/201212Joe37
93/4/201212Joe49
133/6/201219Joe68
23/1/201211Sam11
53/2/201217Sam28
83/4/201218Sam46
113/5/201214Sam60
153/7/201216Sam76

This next example is more creative. It begins to show how powerful the window functions are. In this statement, we are going to return the annual running total aggregated by day. The differentiator here is that we use a DATEPART function in the OVER clause to achieve the desired results.

select OrderID

,OrderDate

,OrderAmt

,CustomerName

,SUM(OrderAmt) OVER (PARTITION BY datepart(yyyy, OrderDate) ORDER BY OrderDate) as AnnualRunning

from CTEOrders

ORDER BY OrderDate;

Results

OrderIDOrderDateOrderAmtCustomerNameAnnualRunning
13/1/201210Joe21
23/1/201211Sam21
33/2/201210Beth63
43/2/201215Joe63
53/2/201217Sam63
63/3/201212Joe75
73/4/201210Beth126
83/4/201218Sam126
93/4/201212Joe126
103/4/201211Beth126
113/5/201214Sam140
123/6/201217Beth176
133/6/201219Joe176
143/7/201213Beth205
153/7/201216Sam205

The ORDER BY clause creates an expanding group within the partition. In the examples above, the partition was the customer. Within each partition, ordered groups based on OrderDate and OrderID are “created”. At each row, the OrderDate and OrderID groups are aggregated up to the current row’s group thus producing the running total. If more than one row has the same order grouping, all of the rows in the group are aggregated into the total as shown in the second example above with the days and years.

Aggregates with ROWS

The ROWS clause is used to further define the partition by specifying which physical rows to include based on their proximity to the current row. As noted in the first post in the series, ROWS requires the ORDER BY clause as this determines the orientation of the partition.

The following example uses the FOLLOWING keywords to find the next two purchases that the customer made.

select OrderID

,OrderDate

,OrderAmt

,CustomerName

,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts

from CTEOrders

order by CustomerName, OrderDate, OrderID;

Results

OrderIDOrderDateOrderAmtCustomerNameNextTwoAmts
33/2/201210Beth21
73/4/201210Beth28
103/4/201211Beth30
123/6/201217Beth13
143/7/201213BethNULL
13/1/201210Joe27
43/2/201215Joe24
63/3/201212Joe31
93/4/201212Joe19
133/6/201219JoeNULL
23/1/201211Sam35
53/2/201217Sam32
83/4/201218Sam30
113/5/201214Sam16
153/7/201216SamNULL

As we noted in the first blog, the last two rows in the partition only contain partial values. For example, order 12 contains the sum of only one order, 14, and order 14 has now rows following it in the partition and returns NULL as a result. When working with the ROWS clause this must be taken into account.

Aggregates with RANGE

Lastly, adding the RANGE to the OVER clause allows you to create aggregates which go to the beginning or end of the partition. RANGE is commonly used with UNBOUNDED FOLLOWING which goes to the end of the partition and UNBOUNDED PRECEDING which goes to the beginning of the partition. One of the most common use would be to specify the rows from the beginning of the partition to the current row which allows for aggregations such as year to date.

In the example below, we are calculating the average order size over time to the current row. This could be a very effective in a trending report.

select OrderID

,OrderDate

,OrderAmt

,CustomerName

,AVG(OrderAmt) OVER (ORDER BY OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as AvgOrderAmt

from CTEOrders

order by OrderDate;

Results

OrderIDOrderDateOrderAmtCustomerNameAvgOrderAmt
13/1/201210Joe10
23/1/201211Sam10.5
33/2/201210Beth10.333333
43/2/201215Joe11.5
53/2/201217Sam12.6
63/3/201212Joe12.5
73/4/201210Beth12.142857
83/4/201218Sam12.875
93/4/201212Joe12.777777
103/4/201211Beth12.6
113/5/201214Sam12.727272
123/6/201217Beth13.083333
133/6/201219Joe13.538461
143/7/201213Beth13.5
153/7/201216Sam13.666666

As you can see, the latest versions of OVER clause supports powerful yet simple aggregations which can help in a multitude of reporting and business solutions. Up next, the last blog in the series – Analytic Functions which are all new in SQL Server 2012.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating