Four Rules for NULLs

  • ...and so is WHERE <>...

    --===== Now, do a WHERE <> and check the execution plan... INDEX SEEK!!!

    SELECT * FROM jbmtest WHERE RowNum <> 3

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Matt Miller (3/28/2008)


    Jeff Moden (3/28/2008)


    Good article... and I realize it's a couple years old...

    I normally don't work NULLs against "TRUE/FALSE" values even in WHERE clauses... I normally work them against some form of data... so, here's what my truth table looks like...

    the thing though is that the handling of the value of criteria is handled differently in a "binary operation" OR like you showed here, than it is when use in the OR statement of a query.

    For example:

    select top(1) *

    from Tally

    where

    (1=1) --TRUE

    OR

    (1=NULL) --unknown

    the query returns a row, since the first predicate is TRUE and is sufficient in making the OVERALL condition TRUE.

    You will note that this is different from the "truth table" in the binary operation. There's a bit of confusion because the two scenarios are a little different. Meaning, BINARY OR is not the equivalent of CLAUSE OR (so to speak.)

    Also - note that in the tri-state logic you describe, as apply to a WHERE clause, rows are returned only when the OVERALL operation evaluates to TRUE. So - I'm not sure that has the same bearing on AND in a WHERE clause, since both FALSE and UNKNOWN are excluded using WHERE.

    Bad test, I think, Matt. You would never use 1 = NULL... not sure how that applies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Gregg Walker (3/28/2008)


    Very nice article Michael on a topic where the details really count.

    I had never heard of the COALESCE function before. We use Informix as our primary database and it does not have a COALESCE function. Informix has a function NVL which works the same as ISNULL and COALESCE. It seems strange that IBM would not have provided an implementation for Informix yet. Do you happen to know if I'm missing something?

    --

    Gregg Walker

    Not sure about Informix syntax, but COALESCE is an ANSI/ISO standard function. IBM is usually pretty good about adhering to the standards. If you wanted to simulate COALESCE functionality using ANSI/ISO standard syntax you could use a CASE expression (COALESCE is defined as functionally equivalent):

    CASE WHEN col1 IS NOT NULL THEN col1

    WHEN col2 IS NOT NULL THEN col2

    WHEN col3 IS NOT NULL THEN col3

    ELSE NULL END

  • Having just gone through 11 (enlightening) pages of this thread, I hope I just have not missed the answer to my question:

    Since nulls will group together in a group by statement, is it reasonable to use IS NULL in a having statement?

  • Thanks Michael. I will stick with NVL for now as CASE is more verbose and is not easy to do a find and replace on. I will inquire of IBM as for their direction with COALESCE.

    --

    Gregg Walker

  • Jeff Moden (3/28/2008)


    Bad test, I think, Matt. You would never use 1 = NULL... not sure how that applies.

    I'm drawing a comparison (or rather - highlighting the difference) between that (which represents what the WHERE clause would do), and your binary operator. There's a difference for sure, but it's subtle enough that it's being missed (I was pointing that out).

    Meaning your @T|@N behavior as a binary operator yields NULL (since that's a scalar operation), but two criteria connected in a WHERE CLAUSE with an OR, one of which is TRUE and one which yields UNKNOWN, yield TRUE, not NULL or UNKNOWN.

    In other words - defining exactly which "truth table" we're after defines the results you get. In essence - making sure that the "NULL is not the same as UNKNOWN" part is clear.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (3/28/2008)


    Jeff Moden (3/28/2008)


    Bad test, I think, Matt. You would never use 1 = NULL... not sure how that applies.

    I'm drawing a comparison (or rather - highlighting the difference) between that (which represents what the WHERE clause would do), and your binary operator. There's a difference for sure, but it's subtle enough that it's being missed (I was pointing that out).

    Meaning your @T|@N behavior as a binary operator yields NULL (since that's a scalar operation), but two criteria connected in a WHERE CLAUSE with an OR, one of which is TRUE and one which yields UNKNOWN, yield TRUE, not NULL or UNKNOWN.

    In other words - defining exactly which "truth table" we're after defines the results you get. In essence - making sure that the "NULL is not the same as UNKNOWN" part is clear.

    Ah... understood. Thanks, Matt. Guess I'll have to try a slightly different thing based on this. You know me... I gotta find out if that truth table in the article is spot on... 😉 thanks again.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Mike C (3/28/2008)


    Would I be willing to concede that the statement "NULL is not equal to any other value, or other NULLs" is incorrect? Absolutely if the opposite can be demonstrated: i.e., it can be shown that NULL is equal to at least one other value, or another NULL.

    Given a certain reasonable interpretation of the former statement, it is only in 2VL (boolean world) that the latter statement is necessary to show that the former statement is incorrect.

    In 3VL, either of the following would contradict (a reasonable interpretation of) the quoted statement above: (a) "NULL is equal to something" or (b) "It is unknown whether NULL is equal to something". (b) can easily be shown: (1 = NULL) yields Unknown.

    The point being that again the disagreement boils down to interpretation of a natural language (not SQL) statement "NULL is not equal to any other value, or other NULLs" (or a similar variation). There's no disagreement here about the semantics of SQL.

    My point is not that your statement per se is incorrect, but that it is ambiguous, and that one of the possible interpretations (a reasonable and straightforward interpretation) is incorrect. In my opinion, the most straightforward interpretation is the one that turns out to be wrong. That's a subjective point. Certainly in light of the rest of the article, one can infer which meaning was intended. It would be nice though, in an article describing twisty little logic passages, if the intended meaning was clear without relying so much on context.

  • Mike C (2/25/2006)


    NULL is not a value...

    Spot on! Hence the T-Shirt that says "NULL Is NOT Nothing!" 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • kaspencer (3/28/2008)


    However, I do not consider the NULL to be "missing data". The NULL must simply be considered as either "no value", or "value not [yet] assigned".

    Heh... what the heck is the difference between "missing value" and "value not yet assigned?" Unless it's just a matter of semantics, I'd say nothing. And, to me anyway... 0 is "no value" (ie: nothing). Empty string is "no value" (ie: nothing).

    Null is NOT nothing. Heh... the T-Shirt says so... 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jim Russell (3/28/2008)


    Having just gone through 11 (enlightening) pages of this thread, I hope I just have not missed the answer to my question:

    Since nulls will group together in a group by statement, is it reasonable to use IS NULL in a having statement?

    You should do the same thing that I'd have to do to answer that, Jim... try it 😀 Not sure I'd take anyone's word on such a thing...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Gregg Walker (3/28/2008)


    Thanks Michael. I will stick with NVL for now as CASE is more verbose and is not easy to do a find and replace on. I will inquire of IBM as for their direction with COALESCE.

    --

    Gregg Walker

    Not sure about DB2 and I'm not sure which RDBMS you're using Greg (didn't look at you previous threads), but COALESCE also works in place of NVL in Oracle.

    EDIT... sorry... just saw that you're using Informix...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (3/28/2008)


    Mike C (2/27/2006)


    Both of which, by the way, are non-SARGable in WHERE clauses...

    ??? :blink: Maybe I'm reading this wrong, but WHERE NOT IN most certainly IS sargeble...

    --drop table jbmtest

    --===== Create and populate a 1000 row test table.

    -- Column "RowNum" has a range of 1 to 1000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.JBMTest

    ADD CONSTRAINT PK_JBMTest_RowNum PRIMARY KEY CLUSTERED (RowNum)

    --===== Now, do a WHERE NOT IN and check the execution plan... INDEX SEEK!!!

    SELECT * FROM jbmtest WHERE RowNum NOT IN (1,2,3)

    I get a clustered index scan when I run your code. The optimizer expands it out to RowNum <> 1 AND RowNum <> 2 AND RowNum <> 3.

  • Jim Russell (3/28/2008)


    Having just gone through 11 (enlightening) pages of this thread, I hope I just have not missed the answer to my question:

    Since nulls will group together in a group by statement, is it reasonable to use IS NULL in a having statement?

    I imagine you could, but I'm hard-pressed to find a reasonable purpose. HAVING is designed to allow grouping by aggregates, and most aggregate functions only generate NULL in very special circumstances; mostly they discard NULLs.

  • Lars Huttar (3/28/2008)


    Mike C (3/28/2008)


    Would I be willing to concede that the statement "NULL is not equal to any other value, or other NULLs" is incorrect? Absolutely if the opposite can be demonstrated: i.e., it can be shown that NULL is equal to at least one other value, or another NULL.

    Given a certain reasonable interpretation of the former statement, it is only in 2VL (boolean world) that the latter statement is necessary to show that the former statement is incorrect.

    In 3VL, either of the following would contradict (a reasonable interpretation of) the quoted statement above: (a) "NULL is equal to something" or (b) "It is unknown whether NULL is equal to something". (b) can easily be shown: (1 = NULL) yields Unknown.

    The point being that again the disagreement boils down to interpretation of a natural language (not SQL) statement "NULL is not equal to any other value, or other NULLs" (or a similar variation). There's no disagreement here about the semantics of SQL.

    My point is not that your statement per se is incorrect, but that it is ambiguous, and that one of the possible interpretations (a reasonable and straightforward interpretation) is incorrect. In my opinion, the most straightforward interpretation is the one that turns out to be wrong. That's a subjective point. Certainly in light of the rest of the article, one can infer which meaning was intended. It would be nice though, in an article describing twisty little logic passages, if the intended meaning was clear without relying so much on context.

    I'll agree there are probably a near-infinite number of good ways to phrase the distinction between NULL and values. I suppose you could consider the rest of the article an inline "footnote" attempting to further refine the statement about NULLs not being equal to values 🙂

Viewing 15 posts - 106 through 120 (of 152 total)

You must be logged in to reply to this topic. Login to reply