T-SQL counts

  • Jeff Moden (10/1/2013)


    Actually, this is going to cause a mess for me when we upgrade from 2005 because NULL is considered to be a DISTINCT value. For 2005, the correct answer is "4". For 2008 and above, the correct answer is 3. The 2008+ version works as I would expect because just a simple COUNT(ID) would ignore the NULL. In this particular case, that would still return a 4 because even though the "2" is duplicated, there are 4 non-null values.

    The reason why it's going to make a mess for me is because a lot of the developers used it fully expecting NULL to be considered a DISTINCT value. This is one of those changes in how SQL Server works that could produce some serious undetected problems if you built your code in 2005 and you upgrade later.

    Your developers must be doing smething pretty queer. BOL for 2005 says COUNT(DISTINCT x) returns a count of distinct non-null values. SQL 2000 always returned a count of distinct non-null values for me, and I used it throuhghout its fully supported life and a bit longer, it was the same whether you had enterprise, standard, or MSDE versions and teh same whichever service pack you had. Maybe they are setting some queer options that break it (like ansi nulls off or something? although it would be surprising if that change the behaviour of COUNT). And vv_kirov's test shows that BOL is right for at least for one version of 2005, and Jens-Peter hasthe same with a later 2005 build.

    Tom

  • easy one..

    Thanks Smith..

  • I struck out my previous comments. I don't know what happened last night but I can't seem to duplicate the problem today. I hate it when this happens.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/2/2013)


    I struck out my previous comments. I don't know what happened last night but I can't seem to duplicate the problem today. I hate it when this happens.

    I think we have all done that somewhere along the way 😉

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Great question. Almost overlooked the distinct in the count().



    Everything is awesome!

  • Nice and easy - thanks, Samith!

  • didn't need to think twice,,,

    easy one. 🙂

  • Hany Helmy (10/2/2013)


    Easy.

    Ditto.

  • Nice and simple question. Lot to think.. thanks Samith..

  • tom.w.brannon (10/2/2013)


    I believe the problem is a difference in how distinct gets used in different contexts. For the following queries

    select count(distinct id) from #temp_test;

    select distinct id from #temp_test;

    select count(*) from (select distinct id from #temp_test) a;

    the results are 3, 4 rows returned, and 4. So NULL is a distinct value but not counted as a distinct value. Maybe somebody else can answer why this makes sense.

    Good work Tom.. If you use 'select count(id)' in the last select query instead of 'select count(*)' you will get the result as 3. If we don't use the field name in count(distinct) clause, it will not eliminate NULL.

  • Thanks!

  • Jamsheer (10/3/2013)


    tom.w.brannon (10/2/2013)


    I believe the problem is a difference in how distinct gets used in different contexts. For the following queries

    select count(distinct id) from #temp_test;

    select distinct id from #temp_test;

    select count(*) from (select distinct id from #temp_test) a;

    the results are 3, 4 rows returned, and 4. So NULL is a distinct value but not counted as a distinct value. Maybe somebody else can answer why this makes sense.

    Good work Tom.. If you use 'select count(id)' in the last select query instead of 'select count(*)' you will get the result as 3. If we don't use the field name in count(distinct) clause, it will not eliminate NULL.

    Tom, a value of NULL indicates that the value does not exists and is unknown. For the DISTINCT keyword, null values are considered to be duplicates of each other. When DISTINCT is included in a SELECT statement, only one NULL is returned in the results, regardless of how many null values are encountered.Check the following link:

    http://technet.microsoft.com/en-us/library/ms187831(v=sql.105).aspx

    In case of count(DISTINCT), it returns the number of unique non null values. See the following link for count(DISTINCT):

    http://technet.microsoft.com/en-us/library/ms175997.aspx

  • In between, simple and good QOTD. 🙂

  • easy

  • easy one

    Distinct uses null values normally

    count() eliminates them as well as all the other arrgegates functions

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply