Ignore the ones which are NULL?

  • Hi,

    I have this stored procedure where i can pass several parameters and those can be passed as a null.

    I found this code below and trying to understand what this code is doing exactly:

    AND (@CodeId IS NULL OR @CodeId = 0 OR p.CodeId = @CodeId)

    Can someone explain this to me?

    Also is there a better way to do this?

    Thanks

    EXEC dbo.xx_StoredProcedure @PartId = NULL, @CodeId = NULL, @Active = 1

    CREATE PROCEDURE [dbo].[xx_StoredProcedure]

    @PartId int = NULL,

    @CodeId int = NULL,

    @Active bit = NULL

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE @CurrentDate DATETIME

    SET @CurrentDate = getdate()

    SELECT ...

    FROM dbo.Part p

    INNER JOIN dbo.PartName pn (NOLOCK)

    ON p.PartId = pn.PartID

    WHERE p.PartId = ISNULL(@PartId, p.PartId)

    AND (@CodeId IS NULL OR @CodeId = 0 OR p.CodeId = @CodeId)

    AND p.Active = ISNULL(@Active, p.Active)

  • AND (@CodeId IS NULL OR @CodeId = 0 OR p.CodeId = @CodeId)

    Let's begin with this statement in the WHERE Clause. It will be helpful to interpret it from right to left.

    p.CodeId = @CodeId

    This obviously means to select any rows where the CodeID in the table is the same as the @CodeID parameter. So if we specify a CodeID when we execute the stored proc, only those rows with matching values in the [codeID] column will be returned.

    @CodeId = 0

    Maybe we want to select ALL rows. Zero (0) is the way we signal the proc that we don't care about the values in the [codeID] column. If the parameter is zero, every row will pass this test.

    @CodeId IS NULL

    The procedure was written to allow nulls to be passed for the @codeID parameter. Oddly, it makes NULL the default as well. This is redundant, since the parameter is NULL or we wouldn't be using the default value. I would have chosen to default to 0.

    NULLs have to be tested separately because a NULL is not really a value in a column or variable, it is more like the status of a column or variable. Internally, SQL has a bitmap for every nullable column in a row, to designate whether or not that column is currently in a NULL status. What does this mean when coding SQL? Run the following:

    declare @examples table (A int, B int)

    insert into @examples

    select 0,0 union all

    select 1,0 union all

    select null, 0 union all

    select null, null

    You should get these results:

    [font="Courier New"]

    A B result

    PoorPoorEqual

    RichPoorNot Equal

    NULLPoorNot Equal

    NULLNULLNot Equal

    [/font]

    Interesting isn't it? NULL is never "equal" to anything, not even another NULL. Remember that NULL is a status: the absence of a value that can be tested for equality. That's why the comparison is

    @codeID IS null

    instead of

    @codeID = null

    So, the query in your procedure will return all rows regardless of the value in the [codeID] column, unless a value other than 0 is entered through the @codeID parameter. You may ask why they didn't use isnull(@codeID,0) ?? The reason is that comparing columns to functions often prevents the optimizer from making use of indexes, resulting in an undesirable table scan.

    My preference would have been to make the default value for @codeID = 0 (zero) instead of NULL.

    By the way, defaults only come into play if the parameter is used when the procedure is called. If your application expressly passes a NULL through that parameter, it will get through and not be overriden by the default. For this reason, I often start out procedures with statements like this:

    set @codeID = isnull(@codeID,0)

    So, to sum up, I would make sure on the front end that any null in the @codeID parameter was replaced with a zero, and in the query use this in the WHERE clause:

    AND @CodeId IN (0,p.CodeId)

    Hope this helps. Let me know if you have any questions. By the way, I freetyped my changes below, so it is untested and may have typos.

    CREATE PROCEDURE [dbo].[xx_StoredProcedure]

    @PartId int = 0,

    @CodeId int = 0,

    @Active bit = NULL -- why a bit? it takes a least a byte to store 1-8 bits

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE @CurrentDate DATETIME

    SELECT @CurrentDate = getdate(), @codeID = isnull(@codeID,0), @partID=isnull(@partID,0)

    SELECT ...

    FROM dbo.Part p

    INNER JOIN dbo.PartName pn (NOLOCK)

    ON p.PartId = pn.PartID

    WHERE @PartID in (0,p.PartID) -- zero means show all

    AND @CodeID in (0,p.codeID) -- zero means show all

    AND @Active is NULL or @Active = p.Active

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thank you so much!

  • You're very welcome.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • One other VERY IMPORTANT point that I almost forgot.

    The first time you run this procedure, the parameters you pass may have an effect on the execution plan.

    After ANY modifications, make sure that your first test of a procedure has values other than NULL or 0 in all of the parameters. Preferably you should use the values that you expect to be most commonly passed.

    Put the execute statement with those values in comments somewhere in your source code to remind you to do this, like this...

    CREATE PROC dbo.MyProc

    .

    .

    .

    END

    /* test

    exec dbo.MyProc 3,9,1

    */

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Excellent information. Thank you for your time.

  • INSERT INTO Discussion SELECT $0.02 WHERE account_name = SUSER_SNAME(): 😀

    Bob Hovious (4/3/2009)


    The procedure was written to allow nulls to be passed for the @codeID parameter. Oddly, it makes NULL the default as well. This is redundant, since the parameter is NULL or we wouldn't be using the default value. I would have chosen to default to 0.

    Procedures always allow an explicit NULL to be passed as the value for a parameter. There is no way to disable this.

    There is nothing 'odd' about having a default parameter value of NULL - this simply allows the procedure to be called without specifying a value for the parameter, in which case it will have the value NULL inside the procedure code.

    If you try calling a procedure omitting a parameter which doesn't have a default value (NULL or otherwise), you will get an error message stating that a value is required.

    Many callers will expect the same results from a procedure whether they omit an optional parameter, explicitly pass a NULL (which makes sense given that NULL means an unknown or missing value), or (for an integer parameter) passing the 'magic number' zero.

    For some, passing zero is bad practice, since there is no guarantee that zero is not (and never will be) a valid value. There is nothing to say, for example, that part id zero and part code zero cannot exist. Myself included, many people prefer the use of NULL to signify that a parameter's value is not important.

    Bob Hovious (4/3/2009)


    You may ask why they didn't use isnull(@codeID,0) ?? The reason is that comparing columns to functions often prevents the optimizer from making use of indexes, resulting in an undesirable table scan.

    Calling a function on a constant values or a local variable does not prevent the use of an index. This is logical since no data column is included in the expression.

    The code you posted results in a full scan by the way.

    Bob Hovious (4/3/2009)


    By the way, defaults only come into play if the parameter is used when the procedure is called.

    You omitted the vital word 'not' here. Defaults are used when a parameter is omitted as mentioned previously.

    Bob Hovious (4/3/2009)


    So, to sum up, I would make sure on the front end that any null in the @codeID parameter was replaced with a zero...

    This is contrary to what you said about explicitly replacing a NULL value with a zero inside the procedure. It is good practice to be tolerant of any parameter value that might be passed in. Enforcing a no-NULL policy in the application fails when the procedure is called from anywhere else which does not have that restriction built-in. We have no way to prevent NULL values in parameters, so it makes sense to handle them gracefully within the procedure. My preference is still to handle NULLs explicitly, rather than replacing them with a magic number.

    Bob Hovious (4/3/2009)


    @Active bit = NULL -- why a bit? it takes a least a byte to store 1-8 bits

    Presumably, this parameter is defined as a BIT in the database because it can have only two values - true or false (I am assuming it is also specified as NOT NULL).

    If other BIT columns are present in the table, the storage engine will combine up to eight of them in a single byte of storage.

    Quite aside from anything else, using a BIT explicitly says that this is an 'on/off' switch. Using any other datatype adds no value from a storage point of view, and would allow inappropriate values, unless a CHECK constraint is present. Using a TINYINT for example, would use one byte always and allow a value of 143 :blink:

    Bob Hovious (4/3/2009)


    WHERE @PartID in (0,p.PartID) -- zero means show all

    AND @CodeID in (0,p.codeID) -- zero means show all

    AND @Active is NULL or @Active = p.Active

    As mentioned, this arrangement results in a full scan (before or after the missing precedence parentheses are added). In a parts table with 50,000 entries this resulted in 338 logical reads.

    A better solution based on taking advantage of the embedded parameters to add start-up filters to the query plan uses 4 logical reads. The same table with the same indexes in both cases, obviously.

    Bob Hovious (4/3/2009)


    The first time you run this procedure, the parameters you pass may have an effect on the execution plan.

    This is called parameter-sniffing. The usual remedy is to use an OPTIMIZE FOR hint, a RECOMPILE hint, or to store the parameter values in local variables, which will result in the QO choosing a plan based on the average distribution of values.

    Regards,

    Paul

    SET STATISTICS IO OFF

    IF OBJECT_ID(N'dbo.Part', N'U') IS NOT NULL DROP TABLE dbo.Part;

    CREATE TABLE dbo.Part

    (

    part_idINT IDENTITY(1,1) NOT NULL

    CONSTRAINT [PKC dbo.Part part_id]

    PRIMARY KEY CLUSTERED WITH (FILLFACTOR = 100),

    part_codeINT NOT NULL,

    part_descVARCHAR(50) NOT NULL

    CONSTRAINT [CK dbo.Part part_desc not blank]

    CHECK (LEN(part_desc) > 0),

    is_activeBIT NOT NULL,

    );

    GO

    SETNOCOUNT ON;

    INSERTdbo.Part

    (

    part_code,

    part_desc,

    is_active

    )

    SELECTRAND(CHECKSUM(NEWID())) * 2147483647,

    REPLACE(NEWID(), '-', ''),

    CONVERT(BIT, CASE WHEN RAND(CHECKSUM(NEWID())) > = 0.15 THEN 1 ELSE 0 END)

    GO 50000

    CREATE NONCLUSTERED INDEX [IX dbo.Part part_code (is_active)]

    ON dbo.Part (part_code) INCLUDE (is_active)

    WITH (FILLFACTOR = 90);

    SET STATISTICS IO ON;

    DECLARE@part_id INT,

    @part_code INT,

    @is_active BIT;

    -- Test values, replace with examples from the table as required

    SELECT@part_id = NULL,

    @part_code = 1682721149,

    @is_active = 1

    -- Matches based on @part_id, if it was specified

    SELECTpart_id,

    part_code,

    part_desc,

    is_active

    FROMdbo.Part

    WHERE(

    NULLIF(@part_id, 0) IS NOT NULL-- Start-up filter expression

    ANDpart_id = @part_id

    AND(@is_active IS NULL OR is_active = @is_active)

    )

    OR

    (

    @part_code IS NOT NULL-- Start-up filter expression

    ANDpart_code = @part_code

    AND(@is_active IS NULL OR is_active = @is_active)

    )

    OPTION(RECOMPILE);

    -- Convert NULLs to magic zeroes for the alternative query

    SELECT@part_id = ISNULL(@part_id, 0), @part_code = ISNULL(@part_code, 0);

    -- FUll scan

    SELECTpart_id,

    part_code,

    part_desc,

    is_active

    FROMdbo.Part

    WHERE@part_id in (0,part_id) --- zero means show all

    AND@part_code in (0,part_code) --- zero means show all

    AND(@is_active is NULL or @is_active = is_active);

    Table 'Part'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Part'. Scan count 1, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  • Trans54, me see if I can reply in pretty much the same order.

    1. Paul is correct that an explicit null can always be passed, and so there is usually a place early in the procedure where values are assigned to null fields. That's why the following line is in the sample code I posted.

    SELECT @CurrentDate = getdate(), @codeID = isnull(@codeID,0), @partID=isnull(@partID,0)

    However, in your example, the clear intent was for NULL or 0 to return all values. In these situations, my preference is to leave that as required parameter. IMHO it is a mistake to allow default values that potentially return large sets. Yes zero could hypothetically be a valid value in some other case, but not in the example under discussion. Finally, as for Paul's comment about using nulls for parameters which aren't important, in this it is very important as the parameters mean the difference between rows for a single value and rows for all values being returned.

    2. What Paul says about testing local variables makes sense to me, but I've not read that before. Hopefully he will come back and can point me towards some reading.

    3. Paul is ABSOLUTELY right. I did omit the word "not" and it IS vital. This was my error, and an atrocious failure to proofread before sending. My apologies.

    4. MOST IMPORTANTLY, Paul is correct that the code I sent you does result in a table scan. That's what I get for posting late at night and not testing what I post. My apologies again, and thanks to Paul for setting it straight. Look hard at his solution, because it will run much faster given a similar indexing scheme.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Oh, one more thing about the bit datatype. I rarely see a simple active yes/no. Usually it is a "status" column and (A)ctive is but one status out of several. When I see a single bit column, I think one day a conversion will have to be done.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (4/5/2009)


    Oh, one more thing about the bit datatype. I rarely see a simple active yes/no. Usually it is a "status" column and (A)ctive is but one status out of several. When I see a single bit column, I think one day a conversion will have to be done.

    Quite right Bob, and I wholeheartedly agree - if a column is anything except a logical true/false it should not be a BIT. Anything status-y should definitely gone in something like one of the INT types, or a CHAR, for example.

    I'm afraid I couldn't quite discern which part of my previous rambling you were asking for more information on - I'll be happy to help if you could give me a bit more on that 🙂

    Paul

  • Paul, I was referring to your comments about functions which are applied to constants or local variables not having an effect on the use of indexes. (Sorry for the lack of clarity, that will teach me to REPLY late in the evening.) As I said, that makes perfect sense to me, but I'd like to do some more reading.

    My sincere thanks for stepping in and cleaning this up. My new resolution is to never post anything of substance if I'm too tired to either test the solution or take time to proofread.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (4/6/2009)


    My sincere thanks for stepping in and cleaning this up. My new resolution is to never post anything of substance if I'm too tired to either test the solution or take time to proofread.

    Hey Bob,

    I wish I could say I never posted anything of substance without testing or proof-reading! :blush:

    So don't concern yourself with that.

    I'm afraid I can't point you to anything explicit concerning functions operating on constants or local variables. My assertion is solely based on experience, and is limited to scalar functions with do no data access, which I should have made clear. The idea is that such a function wouldn't affect the table access order or access strategy because it evaluates to a scalar value. As such it should appear in the query plan as a Compute Scalar operation.

    In fact, the more I think about it, it is fairly easy to find examples where a function (even a scalar with no data access) affects the access strategy.

    One reason is that the QO cannot predict the return value from a function, so it must use a 'guess' based on a rule rather than statistics when deciding how many rows will match the function value. A short example:

    -- create a test database

    create database [A0C0CAFA-BCF0-4E17-A1BF-21CCFF2A073D];

    go

    -- switch to it

    use [A0C0CAFA-BCF0-4E17-A1BF-21CCFF2A073D];

    go

    -- a simple function to double an int

    create function dbo.f (@i int) returns integer with schemabinding, returns null on null input as begin return @i * 2 end;

    go

    -- a test table

    create table dbo.t(i int identity(1,1) primary key nonclustered, v int not null);

    go

    -- insert 50K random values into column 'v'

    set nocount on

    insert dbo.t(v) select convert(int, rand(checksum(newid())) * 1000000);

    go 50000

    -- create a second non-clustered index

    create nonclustered index i on dbo.t(v);

    go

    -- make sure the QO has good statistics available

    update statistics dbo.t with fullscan;

    -- no function => seek on index i with RID lookup (expecting 2.09671 rows from the index seek on i

    select count(*) from dbo.t where i between 1000 and 1100 and v between 5000 and 5001 option (recompile);

    -- function (which returns the same value) => hash join on the PK and index i (expecting 78.3494 rows from the index seek on i)

    select count(*) from dbo.t where i between 1000 and 5000 and v between dbo.f(2500) and 5001 option (recompile);

    go

    -- tidy up

    drop database [A0C0CAFA-BCF0-4E17-A1BF-21CCFF2A073D];

    As you can see, the QO makes a guess (a pretty bad one at that) in the example with the function.

    There are plenty of other examples of functions affecting the plan, including the obvious case where a function does data access. Another effect of all scalar T-SQL UDFs (and scalar CLR UDFs which do data access) is that the QO cannot use parallelism anywhere in the query plan.

    This will obviously affect the plan, and possibly the access strategy, if a parallel plan would otherwise be selected. For TVFs, the query plan will have a non-parallelizable 'island' around the TVF (though parallelism may exist in the rest of the plan) - this could also clearly have an effect.

    Persisted computed columns based on a UDF are another example where the plan may well change.

    What I should have said was that in the particular case presented the scalar function on a constant or local variable would not affect the query plan. Though, as you have seen, that would not have been the case if the function affected the availability of good statistics to the QO.

    So - there you go. Perhaps I should have tested more, and proof-read better too eh? 😀

    Cheers,

    Paul

  • No complaints from me, Paul. I only hope Trans54 came back to read your first post.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks guys, i really appreciate your inputs. Definatelly a lot to learn from here.

  • Paul/Bob -

    With you two posting minor articles for responses, there's no way to adequately quote anything. You pretty much covered the big topics. On the remaining topic of how a function might interfere with index handling: Even if the function returns a scalar, a UDF can pose a lot of challenges to the query engine, which may mean it is not capable of discerning whether a UDF is deterministic or not. Knowing whether something is deterministic is a big deal with UDF's since it can result in the outer query being slowed down a LOT when the query engine decides evaluate the function per row (if it doesn't think it's deterministic). The solution there is to force the compiler to treat deterministic UDFs as deterministic by specifying them to be schema-binding.

    There was a BIG discussion on this about a year ago in a thread started by Brandie Tarvin if you want to see the testing plans, etc... confirming all of this.

    So - you might get lucky if you don't, but chances are you will end up getting bitten when the compiler changes its mind in the future.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply