Does GROUP BY speed up query on data that is already at lowest granularity?

  • Hi,

    I have just been running a query that I was planning to improve by removing a redundant GROUP BY. The query groups on about 20 columns, and one of the returned columns is unique, so the GROUP BY can never actually collapse any rows. But when I modified the query to remove the grouping, it actually got slower, and I can't see why that would be the case.

    Both queries return the same number of rows (69,000), as I expected, and the query plans look nearly identical, except that the aggregated plan performs an extra Stream Aggregate and Sort at the start. The estimated data size is 64 MB for the non-grouped query (runs in 6 min 41 s) vs 53 MB for the aggregated query (runs in 5 min 31 s), and the estimated row size is smaller when aggregated.

    Can someone rationalise this? In my mind, the data being pulled is identical, plus there is extra computation for the unnecessary aggregation, so the aggregated query should be unquestionably slower. But the database engine has other ideas; it seems to work faster when given unnecessary work to do 🙂 Perhaps the non-aggregated query is getting an inefficient plan? I would have thought the actual execution plans would make that apparent, but both plans look very similar.

    Edit: More information. The GROUP BY query had two aggregates: a COUNT of one column and an AVG of another. In the non-grouped version I replaced the COUNT with the literal 1 and the AVG with the expression inside it, since with a unique column in the grouping key each group contains exactly one row and the aggregates effectively do nothing.
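
    To make that rewrite concrete, here is a sketch of the two variants. The table and column names (dbo.Orders, OrderID, Qty, Price, etc.) are hypothetical stand-ins; the original queries were not posted.

    ```sql
    -- Original: GROUP BY over ~20 columns, one of which (OrderID) is unique,
    -- so each group contains exactly one row and the aggregates are no-ops.
    SELECT  o.OrderID,
            o.CustomerID, /* ...the other grouped columns... */
            COUNT(o.LineNumber)  AS LineCount,    -- always 1 per group
            AVG(o.Qty * o.Price) AS AvgLineValue  -- average of a single value
    FROM    dbo.Orders AS o
    GROUP BY o.OrderID, o.CustomerID /* , ...the other grouped columns... */;

    -- Rewritten without the redundant GROUP BY: the COUNT becomes the
    -- literal 1, and the AVG collapses to its inner expression.
    SELECT  o.OrderID,
            o.CustomerID, /* ... */
            1               AS LineCount,
            o.Qty * o.Price AS AvgLineValue
    FROM    dbo.Orders AS o;
    ```

    If the grouping key really does contain a unique column, both forms return the same result set, which matches the identical 69,000-row counts reported above.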

  • Very interesting!

    A GROUP BY on a set of columns effectively creates a unique key over those columns, so later steps in the plan can assume uniqueness and access the data faster.

    I know in SS 6.5 I used to include PKs in most indexes because non-unique indexes had a 4x penalty to them when doing DML unless I added the PK column at the end to make them unique.

    I create alternate keys on almost every table. Enforcing uniqueness lets us make so many simplifying assumptions, and the optimizer obviously knows this and takes shortcuts too.

  • Would have to see the actual queries and query plans to really be able to analyze this.

    Also, clock time can be affected by many things. Be sure to review the actual IO stats as well.
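
    For example, in SQL Server you can surface per-table logical/physical reads and CPU vs. elapsed time by enabling these session settings before running each variant (output appears in the Messages tab in SSMS):

    ```sql
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- ...run each version of the query here and compare the
    --    logical reads and CPU time reported for each...

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    ```

    Comparing logical reads between the two variants is usually more reliable than wall-clock time, which can be skewed by caching, concurrent load, and network transfer of the result set.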

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

