Distinct Counts

  • Decided that you meant "with the given data" so chose the correct answer.

  • Just a side note, it could be NO if SQL's default collation is not used.

    CREATE TABLE mytable ( myid INT <-------------------

    SELECT COUNT(MyID) <-------------------------------

    SELECT COUNT( DISTINCT MyID) <--------------------

    The select statement will error out if using a case-sensitive collation.

    Luckily I assumed that this question was using the default collation installed.

  • Steve Jones - SSC Editor (1/21/2015)


    They return the same value. Althought COUNT does consider NULL rows, it does so only when the * is used. If an expression is used, then non-null values are counted. Distinct will ignore NULL values as well, so they return the same value.

    I have to disagree with the part in bold. I you run this code, you will notice that it returns NULL in the results.

    CREATE TABLE mytable ( myid INT, mychar VARCHAR(10) );

    GO

    INSERT mytable

    ( myid, mychar )

    VALUES

    ( 1, 'A' ),

    ( 2 , 'B'),

    ( NULL, 'C' ),

    ( 4, 'D' ),

    ( NULL, 'E' ),

    ( 6, 'F' );

    SELECT

    DISTINCT MyID

    FROM

    mytable;

    DROP TABLE mytable;

  • Perhaps not worded well, but this question deals with COUNT(), DISTINCT, and NULLs, not just DISTINCT.

  • jclementz (1/22/2015)


    Steve Jones - SSC Editor (1/21/2015)


    They return the same value. Althought COUNT does consider NULL rows, it does so only when the * is used. If an expression is used, then non-null values are counted. Distinct will ignore NULL values as well, so they return the same value.

    I have to disagree with the part in bold.

    I read that bit as "distinct does not change that".

    Distinct itself does not ignore NULL values (it does treat them all as equal, as per ANSI-standard group by rules - NULL is not equal to anything, not even to NULL, but they are all considered the same for grouping). But the COUNT still ignores the NULL.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • +2 Thanks Steve!

    Andre Ranieri

  • I knew the question was trying to test one's knowledge about COUNT(*) versus COUNT(aSpecificColumn), as I have been caught by this previously.

    The only problem I have with this QoD, is that the INSERT statement fails as the column 'myid' does not allow nulls, as it is treated as a primary key.

    I think the QoD would have got its point across better if the NULL values had been inserted into the NULLable 'mychar' column instead.

  • DTML (1/22/2015)


    The only problem I have with this QoD, is that the INSERT statement fails as the column 'myid' does not allow nulls, as it is treated as a primary key.

    There is no primary key constraint declaration in the QotD, so this column is not treated as a primary key.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (1/22/2015)


    DTML (1/22/2015)


    The only problem I have with this QoD, is that the INSERT statement fails as the column 'myid' does not allow nulls, as it is treated as a primary key.

    There is no primary key constraint declaration in the QotD, so this column is not treated as a primary key.

    My guess is that the database where DTML ran the script has ANSI_NULL_DEFAULT off, thus making columns NOT NULL by default.

  • You are spot on correct jcclementz 🙂

    The database option ANSI_NULL_DEFAULT was set to off which I confirmed with the following:

    SELECT is_ansi_null_default_on

    FROM sys.databases

    WHERE name = 'myDatabaseName'

    I have learned something new yet again from participating in these questions - love it!! I didn't know there was such an option (I am a relatively new to MS SQL).

    I assumed that if you didn't explicitly state a column was PRIMARY KEY as in this QoD's script, that by default MS SQL treated the INT column as the primary key (which is what has been happening with me every time I CREATE TABLE(s) since I have been working with the AdventureWorks database for learning purposes; thus why I tend to *explicitly* set nullability and constraints...etc.). So I thought this QoD was a *trick question*!!

    Hugo Kornelis's reply stumped me (as in my database the column was being 'treated as a primary key') until I saw jcclementz's reply. Sure enough when I altered the database and SET ANSI_NULL_DEFAULT ON, the myid column behaved as Hugo suggested and the INSERT statement works...etc.

    So, I would like to ask the more experienced SQL Server users the following question(s):

    (1) The default for the option ANSI_NULL_DEFAULT is off so, is this option typically changed to be

    on, so that users have to *explicitly* set PKs and nullability?

    (2) Or do most work environments leave this option set to 'off' and users just need to be aware

    of the behavior?

  • nice question... count * should not consider the null values that's why we got an error while using count(distinct *), But when we use count(distinct myid) it consider null also a unique value.

    Manik
    You cannot get to the top by sitting on your bottom.

  • DTML (1/22/2015)


    So, I would like to ask the more experienced SQL Server users the following question(s):

    (1) The default for the option ANSI_NULL_DEFAULT is off so, is this option typically changed to be

    on, so that users have to *explicitly* set PKs and nullability?

    (2) Or do most work environments leave this option set to 'off' and users just need to be aware

    of the behavior?

    1: The ANSI_NULL_DEFAULT option *only* affects nullability, not PKs. Primary Keys must *always* be set explicitly.

    The effect of ANSI-NULL_DEFAULT is actually quite simple - if you create a table and do not specify nullability for a column, this option determines if that column will or will not allow nulls. So you could say that ANSI_NULL_DEFAULT determines whether the default for a new column is NULL or NOT NULL.

    2: I cannot comment on "most work environments", but I do know that all the create table scripts I write and most create table scripts I see explicitly specify NULL or NOT NULL for every column. Being explicit is more clear for future maintenance. An added benefit is that unexpected changes to the ANSI_NULL_DEFAULT setting can never break my scripts if I never rely on a default that may change.

    PS: Please do not make the mistake of thinking that AdventureWorks is a shining example of great database design. It is a database created for the sole purpose of allowing MS to demonstrate all features, so every feature is stuffed in there somewhere - even if it meant applying all the force needed to get that square peg to fit in the round hole.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Very good question again. 🙂

  • Quite a nice question, but I agree with Hugo about the explanation.

    An interesting point is that what the queries return depends on options. If ANSI_DEFAULTS is off, both will return 0 (because the insert fails); if it's on, both will return 4. But ANSI_NULLS on/off doesn't affet the result at all. I'm rather glad that we'll eventually lose that difference (because OFF for ANSI_DEFAULTS will no longer be supported) - at least we will if SQL Server survives that long despite Microsoft's new pricing lunacy.

    Tom

  • Thanks for replying Hugo; however, I am still confused with regards to the Primary Key behavior.

    Perhaps you or someone else can clarify the following for me? Using the 2012 AdventureWorks database:

    a) I ran the CREATE TABLE script as it is in the QoD and the is_ansi_null_default_on was set to 0 (off)

    The table was created, and the myid was created as Not Null and as a PK .

    The INSERT statements failed with Msg 515...column does not allow nulls...etc.

    Both SELECT statements return 0.

    b) Then after these discussions, I set ANSI_NULL_DEFAULT ON (i.e. rechecked that is_ansi_null_default_on now returns 1)

    Re-ran the CREATE TABLE script as it is in the QoD

    The table was created, and the myid was created as allowing Nulls and it was no longer a PK

    The INSERT statements worked with 6 rows being added

    Both SELECT statements return 4 rows.

    So, if it isn't the ANSI_NULL_DEFAULT option that makes the column a PK, is there another option that I am unaware of?

    I totally agree about your comments about explicitly setting NULL, NOT NULL and CONSTRAINTS for future maintenance and to ensure your scripts don't break should an option be changed...etc. Good practice for sure!

    DTML

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply