Emulate FIRST aggregate function with T-SQL?

  • As a relative T-SQL newbie I'm always comparing what I know in MSAccess to what I need to do in T-SQL. I'm currently trying to write a query to do the following. Given this table:

    Name TestNo Grade

    joe 1 A

    joe 2 B

    joe 3 C

    bob 2 B

    bob 3 C

    sue 1 B

    sue 2 B

    sue 3 C

    Return the best row for each student:

    Name TestNo Grade

    joe 1 A

    bob 2 B

    sue 1 B

    In access I'd do:

    SELECT g.Name, First(g.TestNo) AS TestNo, Min(g.Grade) AS Grade

    FROM Grades g

    GROUP BY g.Name;

    However there's no FIRST aggregate function in T-SQL that I'm aware of. How shall I construct a query to retrieve the same results?

    Greg.

  • As soon as I hit 'post' I realized that this example is too simple. I can simply substitute 'MIN' where 'FIRST' is used.

    I also figured out that this is not a terribly efficient query against a large data set. Any suggestions on other ways to get the same results that might speed things up? (Hash Match/Aggregate is the biggest bottleneck at 46% according to the query plan)

  • See my FAQ on "What (good) are self joins anyway?" to answer this q.

    select *

    from grades g

    and g.Grade = (select min(grade)

    from grades inside

    where g.name = inside.name

    )

    Index the Grade and Name columns.

  • Thanks for the tip don1941. I'll try it out tomorrow at work.

  • The problem with using MIN is that you could end up with the wrong test for the grade.

    Name TestNo Grade

    joe 3 A

    joe 2 B

    joe 1 C

    With this data you would end up with joe 1 A.

    Try

    SELECT Grades.name, AggGrades.min_grade, MIN(testNo)

    FROM Grades

    JOIN (SELECT name, min(grade) min_grade

    FROM grades

    GROUP BY name) AggGrades ON AggGrades.Name = Grades.Name

    AND AggGrades.min_grade = Grades.grade

    GROUP BY Grades.name, AggGrades.min_grade

    The derived table returns the min grade for a person and then the main query finds the minimum testNo for each person and their min grade.

    Simon Sabin

    Co-author of SQL Server 2000 XML Distilled

    http://www.amazon.co.uk/exec/obidos/ASIN/1904347088


    Simon Sabin
    SQL Server MVP

    http://sqlblogcasts.com/blogs/simons

  • Thanks to both of you! My actual scenario is more complex, with joined tables even on the inside loop. What I gather from both of your examples is that I'll have to make a nested self-join for each important field until I get one row per student.

  • Just to add to this I use to use subqueries and joins to aggregate values for one table to update another. But found performance dropped like a lead weight when volume increased. I change the process to summarise into temp table first and then did update with a join. Found huge improvement. Just an observation. I know there is probably a debate raging regarding subqueries vs temp tables.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Try this

    SELECT TG.* FROM

    (SELECT [Name], Min(TestNo) As TestNo FROM TestGrade Group By [Name]) As FT

    INNER JOIN

    TestGrade TG

    ON

    FT.[Name] = TG.[Name] AND

    FT.TestNo = TG.TestNo

    Order By TG.Grade, TG.[Name]

  • Ok, to make things more complicated, the following code creates the 'grades' table. I've modified the original data and added a comment column. What I'd like to get back is:

    name, first test that got highest grade, highest grade, and comment for each student.

    Q1: The query I used works, but is this an efficient way?

    Q2: As I need this result set as the basis of MANY other queries, does it make sense to dump the results into a table? The data is imported into SQL server once weekly, I could simply add the code to re-generate the data at the same time.

    A bit more info: I'll probably try joining to temp tables too as David Burrows suggests. The real source data comes from two tables each with around 100k records, query time is about 15 sec on my development server.

    
    
    USE PUBS
    GO
    SET NOCOUNT ON
    IF EXISTS (SELECT * from sysobjects WHERE id = object_id(N'Grades')
    AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
    Drop Table Grades
    /*This builds the table:*/
    Create table Grades (
    [Name] varchar(10),
    TestNo int,
    Grade varchar(1),
    Comment VarChar(50)
    )
    GO
    /*Insert data:*/
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('joe', '1', 'C', 'try harder')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('joe', '2', 'B', 'Ok')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('joe', '3', 'A', 'Good')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('bob', '2', 'B', 'Not bad')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('bob', '3', 'C', 'need work')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('sue', '1', 'B', 'Ok')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('sue', '2', 'B', 'Almost an A')
    INSERT into Grades ([Name], TestNo, Grade, Comment)
    Values('sue', '3', 'C', 'What happened')
    /*The Query:*/
    /*Get First record with Best grade for each student.*/
    SELECT * From Grades
    INNER JOIN
    --get first test with best grade
    (
    SELECT bg.[name], bg.grade, min(g2.testno) AS TestNo
    FROM Grades g2
    INNER JOIN
    --get Best Grade for each student
    (
    SELECT g.[name], MIN(g.grade) AS Grade
    FROM Grades g
    GROUP BY g.[name]
    ) bg
    ON g2.[name] = bg.[name] AND g2.grade = bg.grade
    GROUP BY bg.[name], bg.grade
    )ft
    ON Grades.[name] = ft.[name] AND Grades.[TestNo] = ft.TestNo AND Grades.grade = ft.grade
    /*cleanup*/
    IF EXISTS (SELECT * from sysobjects WHERE id = object_id(N'Grades')
    AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
    Drop Table Grades

    Thanks Again.

    Greg.

    Edited by - GregLyon on 12/13/2002 10:26:51 AM

  • Self joins and subselects can be avoided by using the following query:

    
    
    SELECT Name, RIGHT(MIN(Grade + CAST(TestNo AS VARCHAR)),1), MIN(Grade), SUBSTRING(MIN(Grade + CAST(TestNo AS VARCHAR) + Comment),3,50)
    FROM Grades
    GROUP BY Name

    The trick is to prefix the extra columns with the columns for which the aggregate is computed, apply the same aggregate, and then strip off the data on the right. This technique can be used with any GROUP BY query for which additional columns have to be displayed!

    Jorg Jansen

    Jorg Jansen

    Manager Database Development

    Infostradasports.com

    Nieuwegein

    The Netherlands


    Jorg Jansen
    Manager Database Development
    Infostradasports.com
    Nieuwegein
    The Netherlands

  • Thanks, Jorg, for that tip. The sql looks more maintainable than the sub-join method to me. How do you find it does performance wise compared to the self-join method? I'll be testing it on my real database to compare, of course.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply