Selecting non assigned items

  • Thanks for doing the leg work, as usual 🙂

    If they're all the same on the big tables and the OUTER JOIN is the slowest on smaller tables, why would you ever use the OUTER JOIN method to find the exceptions?

    It's good to know what works best. I tend to only use except when I need to quickly identify what is missing or nonexistent from tables; It saves me on typing :D. In most cases, I use a mixture of outer join and not in, but after looking at the results I am asking myself. Why should I spend extra time typing when I can do it in less lines and the query will perform better. I think I will stick with not in and not exists from now on.

    Thanks again.

  • I think I will stick with not in and not exists from now on.

    Heh... Except on Tuesdays... that's when the "SQL Run Fast" button is automatically recycled to overcome all the work it had to do from cursors and the like... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hmmm.... I'm not so sure that the LEFT JOIN is so bad...

    I'd say it's the better one of the four by a longshot.

    This is the 'small table' examples from Adventureworks, and it's also with a hot cache.

    The three first gets identical plans, the left join has a different plan.

    A thing I noted was that the CPU times reported are flaky at best. I got between 16ms and 47ms for the left join query just by repeatedly executing it.

    What's more interesting is the I/O load that these queries generate.

    Have a look a these, and say again why one would sometimes use a left join... 😉

    (btw the first three suffer even more with a cold cache)

    -- WHERE NOT IN

    Table 'WorkOrder'. Scan count 504, logical reads 1088, physical reads 0...

    Table 'Product'. Scan count 1, logical reads 4, physical reads 0...

    -- WHERE NOT EXISTS

    Table 'WorkOrder'. Scan count 504, logical reads 1088, physical reads 0...

    Table 'Product'. Scan count 1, logical reads 4, physical reads 0...

    -- EXCEPT

    Table 'WorkOrder'. Scan count 504, logical reads 1088, physical reads 0...

    Table 'Product'. Scan count 1, logical reads 4, physical reads 0...

    -- LEFT JOIN

    Table 'WorkOrder'. Scan count 1, logical reads 101, physical reads 0...

    Table 'Product'. Scan count 1, logical reads 15, physical reads 0...

    Just my opinion, but given these numbers, I'd take the left join as the 'best in this case'.

    /Kenneth

  • Sure, I get the same on the small test... times for the outer join vary between 16 ms and 67 ms with the norm being 47 ms... but look at the CPU times for the other 3 methods... they're all zero.

    Guess it "depends" on what you value the most... speed, IO, or memory ("logical reads"). For me, using memory for almost 0 seconds is pretty much worth the speed advantage.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Though cpu time is really of no concern - at least not in this case.

    It's a volatile measure at best (as we've seen, since it fluctuates > 100% between times)

    The optimizer couldn't care less about it, all that it knows and counts are I/O.

    (btw, I managed to get all 4 at 15-16ms cpu)

    Thus, what makes a query expensive or cheap is it's associated I/O cost.

    In that respect, the left join beats the others by a mile 🙂

    /Kenneth

  • Yep... on the small stuff for sure.

    Just a note, though... CPU time makes all the difference in the world to me... most of the 8 hour jobs at work that I've gotten down to 10-30 minutes were very CPU intensive and not so intensive on IO. For me, speed does matter because there's only so much CPU time available in a given 24 hour period.

    Of course, IO matters as well... but usually logical reads are pretty easy because the data is already cached and, once there, is very fast. It's what makes things like the Tally table just beat the dickens out of CTE's that do the same thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply