derived table performance tune in Group By

  • I was asked in an interview that. If you have "EmployeeName", HoursWorked in "Customer" table.

    Select

    EmployeeName,

    count(HoursWorked)

    from Customer

    group by EmployeeName

    He asked me if you do group by EmployeeName.The query will take a long time. How can you allow EmployeeName in Select without using it in group By . (Basically what appears in select has to appear in group by if u r using Aggregate function in Select statement).I could'nt answer .he told me by using Derived table.I cannot understand.how can you use Derived table in such case.Where you can avoid using EmployeeName in Group By clause.But still be able to use in Select statement. These are the only columns in the Customer table and no other.

    thank you

  • His answer is actually wrong. If you have an "HoursWorked" attribute, you have to assume that this is a roll-up of all hours for the employee since he/she has been working with the company; you don't even need a GROUP BY, given his definition. There is only one employee record in the table for each employee. And, you don't need a derived table for this either. If he is using an aggregate function, you will need a GROUP BY. Whether performance is better or not is inconclusive. You would have to look at it in profiler or using the various TSQL SET commands to actually see if it is faster with less logical/physical reads.

    But, your description (count(HoursWorked) doesn't make sense either. So you're saying that for each hour worked a row is generated? That wouldn't be correct - you'd generally use SUM to add the hours say for each week, not COUNT.

    I'd tell the "manager" to put down the pipe 😀

  • In fact,my other friend also attended the same interview. For him instead of Hoursworked. He gave him SUM(Amount) as agggregate.

    So in Such case aslo How can you avoid EmployeeName in Group by .But it should appear in Select .Using Derived table.

    ????

    thanks

  • Sounds to me that he's simply pushing off the aggregation to a derived table. But he'd still have to do the SUM there right?

  • Sorry... I believe that one of two things has happened here... either that the two of you guys on the interview left something out of the code you posted, or the interviewer needs to be fed some pork chops the hard way because of the BS question being asked. For the code given, the addition of a derived table would provide absolutely no performance increase whatsoever.

    What I think probably happend was that a slightly more complex query was used with joins between at least two tables and, yes, that could be slow. The code posted on the thread would be the actual answer as to what would need to be in the derived table... well except for the BS count of hours worked which should have been a sum.

    By the way, doing aggregations in a derived table and then joining on that is NOT the best way to improve the performance of such a thing. I just went through all this at work with some folks that had queries taking more than 30 minutes to run. Someone converted the code to do the aggregations in a derived tables, as suggested. Yep... everyone was tickled pink with the 3 minute duration that action brought the run time down to. After that, they stopped looking for any other improvements.

    But, I've been through this before... imagine their joy when I got the query to run in under 19 seconds by using what they used in the derived table to populate a temp table using SELECT/INTO and then joining on that.

    Peter Larson very correctly calls the method "pre-aggregation" when it's done either by a derived table or a temp table. It's one of the most effective methods of achieving blinding performance using the good ol' "Divide'n'Conquer" methods that I know of. Heh... try THAT in a view. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply