Inequality!

  • Hugo Kornelis (7/6/2009)


    I see that you now have made a small change to the explanation, but I still do not like it.

    "It is sometimes a better practice to check for equality in conditions rather than inequality."

    Hey Hugo,

    Would you agree that 'positive' predicates (like '=', IN, EXISTS) generally optimize as well or better than the logically equivalent 'negative'?

    I think it is possible to over-analyse the point - it does after all say "sometimes a better practice" and Tao just posted some good examples.

    Sometimes the QotD deserves criticism, but not today I think πŸ™‚

    Cheers,

    Paul

  • One thing to note is that the optimizer is more intelligent than in previous versions and of course they are still improving it. So if the optimizer thinks that the selecivity of a certain condition is very high, it will probably use a corresponding index if it exists.

    Attached is a SQL Plan which shows such an example.

    Of course I am comparing apples with bananas when comparing Qry1 and Qry2 (or Qry3 and Qry4) since they return different things, but the purpose is only to show that both the equality and inequality operator can lead to index usage.

    Whats happening behind the scenses is that SQL Server applies loss-less Transformations of the query where possible that enable an efficient access path. With every update of SQL Server this probably evolves to a more sophisitcated process.

    Here is the query:

    USE master

    GO

    DROP DATABASE TestDB;

    GO

    CREATE DATABASE TestDB

    GO

    USE TestDB

    CREATE TABLE dbo.Test1 (

    ID int identity NOT NULL PRIMARY KEY CLUSTERED

    ,MyText char(500) NOT NULL

    ,MyNum decimal(30,25) NOT NULL

    );

    CREATE INDEX IDX_MyNum ON dbo.Test1 (MyNum)

    CREATE INDEX IDX_MyText ON dbo.Test1 (MyText)

    TRUNCATE TABLE dbo.Test1

    DECLARE @RowCount bigint = 0;

    WHILE 1=1 BEGIN

    SET @RowCount += 1;

    INSERT INTO dbo.Test1 VALUES ('Test', RAND((CAST(GETDATE() AS decimal(30,25)) - FLOOR(CAST(GETDATE() AS decimal(20,10)))) * POWER(10,9)));

    IF @RowCount = 10000 BREAK;

    END;

    GO

    SET showplan_xml ON

    GO

    SELECT * FROM dbo.Test1 WHERE MyText 'Test';-- Qry1

    SELECT * FROM dbo.Test1 WHERE MyText = 'Test';-- Qry2

    SELECT * FROM dbo.Test1 WHERE MyNum 0.0234172695179826000000000;-- Qry3

    SELECT * FROM dbo.Test1 WHERE MyNum = 0.0234172695179826000000000;-- Qry4

    GO

    SET showplan_xml OFF

    GO

    Best Regards,

    Chris BΓΌttner

  • Paul White (7/6/2009)


    Hugo Kornelis (7/6/2009)


    I see that you now have made a small change to the explanation, but I still do not like it.

    "It is sometimes a better practice to check for equality in conditions rather than inequality."

    Hey Hugo,

    Would you agree that 'positive' predicates (like '=', IN, EXISTS) generally optimize as well or better than the logically equivalent 'negative'?

    I think it is possible to over-analyse the point - it does after all say "sometimes a better practice" and Tao just posted some good examples.

    Sometimes the QotD deserves criticism, but not today I think πŸ™‚

    Cheers,

    Paul

    Hi Paul,

    In the case of today's question, it was merely a comparison between a variable and a hard-coded value, so the optimizer doesn't even come into play.

    In queries, there really is no such thing as replacing an inequality with an equality. If I want a report of all orders placed by Texans, I'll need an equality predicate on statecode and I can not transform it to an inequality without changing the meaning of the query and getting wrong results. Likewise, if I need a report of all sales that were not closed on a saturday, I need an inequality on DATEPART(wd, SaleDate) and trasforming that to an equality will change the results.

    So the question which one is better in predicates is actually moot. And in simple IF statements, any arguments involving the optimizer are moot.

    Edit: Afterthought 1: Tao's examples are not great at all. The 1 predicate can match many more rows than the IN(2,3) predicate, so one should expect different query plans. The one to use depends on the requirements.

    Afterthought 2: I agree that today's QotD doesn't deserver criticism. But its explanation does.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo,

    I don't want to get bogged down here: my point is that operations which remain correct while avoiding a ! or a NOT are generally a better choice than the 'negative' counterpart. I think it is wrong to limit the discussion to = versus != even if that was the example in the question: the general point has merit.

    Inequality comparisons can frequently be replaced by a UNION (ALL) of the other valid values. If the valid set is constrained, this can be an efficient way to rewrite an inequality, NOT IN, NOT EXISTS and so on.

    Paul

  • Never satisfied, are we Hugo?

    I have changed it to "B is returned. A simple change that would be better: ..."

  • Steve Jones - Editor (7/6/2009)


    I have changed it to "B is returned. A simple change that would be better: ..."

    No that's not right at all...what you should do...

    πŸ˜€ heh πŸ˜€

    Just kidding Steve!

    Paul

  • Steve Jones - Editor (7/6/2009)


    Never satisfied, are we Hugo?

    Never. There's always room for improvement. πŸ˜‰

    I have changed it to "B is returned. A simple change that would be better: ..."

    I still miss the explanation why this is better. But since I just was notificatied that my two submissions will be run as future QotD, I'd better not push this any further - I might just reap what I sow... :w00t:


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (7/6/2009)


    Edit: Afterthought 1: Tao's examples are not great at all. The 1 predicate can match many more rows than the IN(2,3) predicate, so one should expect different query plans. The one to use depends on the requirements.

    Hmm, I feel unfairly put upon πŸ™‚

    My (initial) examples showed logically equivalent pairs of queries, given the provided sample dataset. Obviously it is not possible to write an "IN()" clause that generally replaces an inequality for all possible datasets (except in the case of very small datatypes like BIT).

    I am trying to understand / evaluate Steve's apparent assertion that inequalities can cause substantially less efficient query plans - for which I need to assume that logically equivalent alternatives exist for the target dataset (otherwise, they are less efficient than what?).

    I'm still very interested to see whether anyone can tell me more about the difference between the following three cases though:

    - WHERE (Type > 5 AND Type 5 AND Type < 7)

    - WHERE (Type 7)

    - WHERE (Type 7) AND (Type 4)

    As far as I can tell the query optimizer consistently performs a straight "seek" for the first and second cases, but not the third, even though the third is actually more restrictive than the second.

    Sample SQL to test these three:

    CREATE TABLE ZZ_DummyTable1 (

    RecordID INT NOT NULL,

    Type INT NOT NULL,

    Data NVarChar(255),

    CONSTRAINT PK_ZZ_DummyTable1 PRIMARY KEY CLUSTERED (

    RecordID

    )

    )

    GO

    CREATE INDEX IX_ZZ_DummyTable1_Type

    ON ZZ_DummyTable1 (Type)

    GO

    INSERT INTO ZZ_DummyTable1 SELECT 1, 1, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 2, 2, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 3, 3, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 4, 4, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 5, 5, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 6, 6, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 7, 7, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 8, 8, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 9, 9, 'This is just padding really, nothing useful to add...'

    INSERT INTO ZZ_DummyTable1 SELECT 10, 10, 'This is just padding really, nothing useful to add...'

    --Compare three logically different queries for index SEEK use, vs WHERE filtering

    -- Third case, equivalent to multiple inequalities, does not do straight SEEK in SQL 2000

    SELECT Count(*) FROM ZZ_DummyTable1 WHERE (Type > 5 AND Type 5 AND Type < 7)

    SELECT Count(*) FROM ZZ_DummyTable1 WHERE (Type 7)

    SELECT Count(*) FROM ZZ_DummyTable1 WHERE (Type 7) AND (Type 4)

    GO

    DROP TABLE ZZ_DummyTable1

    GO

    I have now tested on SQL Server 2005, and found that SQL 2005 correct this - the SQL 2005 engine handles multiple outer ranges / multiple inequalities with ease! (the third query, like the previous two, fully addresses the record selection in the SEEK portion of the index seek)

    So, the follow-up question: can anyone think of or provide any cases/examples where inequalities might perform worse than other logically equivalent queries, for any given limited dataset, in SQL server 2005 and later?

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Tao Klerks (7/7/2009)


    Hugo Kornelis (7/6/2009)


    Edit: Afterthought 1: Tao's examples are not great at all. The 1 predicate can match many more rows than the IN(2,3) predicate, so one should expect different query plans. The one to use depends on the requirements.

    Hmm, I feel unfairly put upon πŸ™‚

    I'm sorry, that was not my intention. But when I saw that the message I did intend to criticise used your examples to illustrate an incorrect point, I had to say why these examples are (IMO) irrelevant to the discussion.

    My (initial) examples showed logically equivalent pairs of queries, given the provided sample dataset. Obviously it is not possible to write an "IN()" clause that generally replaces an inequality for all possible datasets (except in the case of very small datatypes like BIT).

    Or columns constrained by a trusteed CHECK constraint.

    The optimizer doesn't know what data actually is in the column, and even less what data might be in the column if the exection plan gets reused at a later time. So it's no surprise that the optimizer will not treat two queries as identical if they are only identical in the light of whatever happens to be in the table right now but not identical in terms of what might be in the table

    I am trying to understand / evaluate Steve's apparent assertion that inequalities can cause substantially less efficient query plans - for which I need to assume that logically equivalent alternatives exist for the target dataset (otherwise, they are less efficient than what?).

    We're in the same boat. But my strategy is different. You apprently try to understand by trying all kinds of stuff. I try by challenging Steve and repeating that, as long as he doesn't state WHY one would be better than the other, the explanation doesn't really explain anything.

    I'm still very interested to see whether anyone can tell me more about the difference between the following three cases though:

    - WHERE (Type > 5 AND Type 5 AND Type < 7)

    - WHERE (Type 7)

    - WHERE (Type 7) AND (Type 4)

    Im not sure if you intended for the first case to repeat the exact same requirement twice.

    As far as I can tell the query optimizer consistently performs a straight "seek" for the first and second cases, but not the third, even though the third is actually more restrictive than the second.

    (...)

    I have now tested on SQL Server 2005, and found that SQL 2005 correct this - the SQL 2005 engine handles multiple outer ranges / multiple inequalities with ease! (the third query, like the previous two, fully addresses the record selection in the SEEK portion of the index seek)

    I don't have SQL2000 installed anymore, so I can't play with it myself. But my guess would be that the third case, though more restrictive, is also more complex. Maybe you hit upon a limitation of the optimizer in SQL 2000. Or maybe the DB engine itself was more limited back then, having more limitations on the predicate that can be passed to a seek operator.

    Mind you, this is all speculation.

    So, the follow-up question: can anyone think of or provide any cases/examples where inequalities might perform worse than other logically equivalent queries, for any given limited dataset, in SQL server 2005 and later?

    Yup. Try the two below. The first one uses a seek, the second (that matches the exact same row) a scan.

    WHERE Type = 5

    WHERE (Type * Type + 6) / 5 = 6

    EDIT: Sorry, you asked specifically about inequalities. Probably possible to do similar with those.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (7/7/2009)


    Or columns constrained by a trusteed CHECK constraint.

    Good catch :blush:

    I'm still very interested to see whether anyone can tell me more about the difference between the following three cases though:

    - WHERE (Type > 5 AND Type 5 AND Type < 7)

    - WHERE (Type 7)

    - WHERE (Type 7) AND (Type 4)

    Im not sure if you intended for the first case to repeat the exact same requirement twice.

    Yep, they were not intended to be the same but the result applies regardless; the SQL 2000 query engine appears to support any number of "inner" ranges as predicates to the SEEK operator (I'll use appropriate terminology yet!)

    I don't have SQL2000 installed anymore, so I can't play with it myself. But my guess would be that the third case, though more restrictive, is also more complex. Maybe you hit upon a limitation of the optimizer in SQL 2000. Or maybe the DB engine itself was more limited back then, having more limitations on the predicate that can be passed to a seek operator.

    Mind you, this is all speculation.

    Fair enough, thanks!

    Probably possible to do similar with those.

    Not sure - performing arithmetic operations on or applying functions to the target column seems like "cheating" to me; it doesn't illustrate any problematic aspect to the inequality itself, but rather a general optimizer "limitation"/consideration.

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Tao Klerks (7/7/2009)


    I'm still very interested to see whether anyone can tell me more about the difference between the following three cases though:

    - WHERE (Type > 1 AND Type 5 AND Type < 7)

    - WHERE (Type 7)

    - WHERE (Type 7) AND (Type 4)

    As far as I can tell the query optimizer consistently performs a straight "seek" for the first and second cases, but not the third, even though the third is actually more restrictive than the second.

    Just to confirm your results, both SQL2K5 and 2K8 generate a plan with seeks, though the 2K8 optimizer uses the potentially more efficient Seek Keys, rather than StartRange and EndRange range expressions. (I used slightly different constants in the WHERE clauses).

    XML plans attached.

    Paul

  • Tao Klerks (7/7/2009)


    So, the follow-up question: can anyone think of or provide any cases/examples where inequalities might perform worse than other logically equivalent queries, for any given limited dataset, in SQL server 2005 and later?

    Well it depends a bit on your definition of 'worse', but try the script below.

    As a side-effect it also illustrates why "logical reads" is so often not the best measure.

    The script was nicely formatted when it left me; this site will have mangled it a bit πŸ™

    Paul

    -- Safety

    USE tempdb;

    SET NOCOUNT ON;

    GO

    -- Test table

    CREATE TABLE dbo.ZZ_DummyTable1

    (

    record_idBIGINT IDENTITY PRIMARY KEY,

    record_typeTINYINT NOT NULL CHECK (record_type BETWEEN 1 AND 32),

    dataNVARCHAR(255),

    )

    GO

    -- Add 250K random-ish records

    WITHNumbers (N)

    AS(

    SELECTTOP (250000)

    ROW_NUMBER() OVER (ORDER BY C1.[object_id])

    FROMmaster.sys.columns C1, master.sys.columns C2, master.sys.columns C3

    )

    INSERTdbo.ZZ_DummyTable1

    (record_type, data)

    SELECTRAND(CHECKSUM(NEWID())) * 32 + 1,

    REPLICATE(NCHAR(65 + N % 27), RAND(CHECKSUM(NEWID())) * 255 + 1)

    FROMNumbers;

    GO

    -- Index

    CREATE NONCLUSTERED INDEX nc1 ON dbo.ZZ_DummyTable1 (record_type);

    GO

    -- Warm up the data cache

    SELECTrecords = COUNT_BIG(*),

    total = SUM(record_id)

    FROMdbo.ZZ_DummyTable1;

    GO

    -- Clear the ad-hoc plan cache

    DBCC FREESYSTEMCACHE('SQL Plans');

    GO

    --

    -- Test queries

    --

    GO

    -- Index scan

    SELECTrecords = COUNT_BIG(*),

    total = SUM(record_id)

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type NOT IN (5, 11, 24, 31);

    GO

    -- Index seek

    SELECTrecords = COUNT_BIG(*),

    total = SUM(record_id)

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type IN (1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 32);

    GO

    -- Index scan

    SELECTrecords = COUNT_BIG(*),

    total = SUM(record_id)

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type 5

    ANDrecord_type 17

    ANDrecord_type 31;

    GO

    -- Index seek

    SELECTrecords = COUNT_BIG(*),

    total = SUM(record_id)

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type BETWEEN 1 AND 4

    ORrecord_type BETWEEN 6 AND 16

    ORrecord_type BETWEEN 18 AND 30

    ORrecord_type = 32;

    GO

    -- Four seeks (possibly with partial aggregates)

    -- (Just for fun)

    WITHSelected

    AS(

    SELECTrecord_id

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type BETWEEN 1 AND 4

    UNIONALL

    SELECTrecord_id

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type BETWEEN 6 AND 16

    UNIONALL

    SELECTrecord_id

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type BETWEEN 18 AND 30

    UNIONALL

    SELECTrecord_id

    FROMdbo.ZZ_DummyTable1

    WHERErecord_type = 32

    )

    SELECTrecords = COUNT_BIG(*),

    total = SUM(record_id)

    FROMSelected;

    GO

    --

    -- Results

    --

    SELECTstatement_text = SUBSTRING(st.[text], qs.statement_start_offset / 2, COALESCE(NULLIF(qs.statement_end_offset, -1), DATALENGTH(st.[text])) / 2),

    qs.execution_count,

    qs.total_worker_time,

    qs.total_logical_reads,

    qs.total_elapsed_time,

    qp.query_plan

    FROMsys.dm_exec_query_stats qs

    CROSS

    APPLYsys.dm_exec_sql_text (qs.[sql_handle]) as st

    CROSS

    APPLYsys.dm_exec_query_plan (qs.plan_handle) as qp

    WHEREst.[text] NOT LIKE N'%dm_exec_query_stats%'

    ANDst.[text] LIKE '%ZZ_DummyTable1%'

    ORDERBY

    qs.creation_time ASC;

    GO

    -- Tidy up

    DROP TABLE dbo.ZZ_DummyTable1

    GO

  • Thanks Paul, you've both authoritatively answered the question and provided me with everything I need for a nice SQL Performance testing harness in 2005 (set up, perform N sets of tests, tear down, compare results with interesting aspects of the query plans extracted by xpaths) - this weekend I'm going to have some fun! πŸ™‚

    (although I imagine this has already been done... maybe I should invest some time in search before diving in)

    Regardless of whether I create something or find something, I'll post back here.

    Thanks again!

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Piece of cake buddy.



    [font="System"]Bhavesh Patel[/font]

    http://bhaveshgpatel.wordpress.com/

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply