Is NOT EXISTS fast like EXISTS?

  • I see logically how the EXISTS clause can be very fast - scan million-row table, if you get a hit on row #100, return TRUE and ignore rows 101 thru 1,000,000.

    Is NOT EXISTS speedy compared to a left join where SomeField is null?

  • NOT EXISTS can be fast, but depends on your indexing. If the NOT EXISTS subquery can exploit an index then yes, it will be quick. Display the Estimated Execution Plan within QA to determine if an index is being used.


    Cheers,
    - Mark

  • Best you're going to get is an INDEX SCAN. So, even with an index, it still has to go through the entire index and look for existence. I suppose theoretically, a non-clustered index would speed things up for NOT EXISTS queries if the value searched is in the index, since SQL Server can load more actual records per extent than the clustered index (assuming a wide data page, of course).

  • Sorry, but I can't agree. Both of the following first 2 examples use an INDEX SEEK to resolve the NOT EXISTS. The 3rd example can use the data within the index but, because a function is applied to criterion, cannot fully exploit the index - therefore does an INDEX SCAN.

    
    

    use pubs

    select * from authors a
    where NOT EXISTS (select * from titleauthor ta where ta.au_id = a.au_id)

    select * from titles t
    where NOT EXISTS (select * from titleauthor ta where ta.title_id = t.title_id)


    select * from titles t
    where NOT EXISTS (select * from titleauthor ta where upper(ta.title_id) = upper(t.title_id))


    Cheers,
    - Mark

  • mccork, I was unable to get the queries you posted to return an index seek in any of my execution plans, however, I was running against a fully-populated database with lots of records, not pubs (unfortunately, my DBA dropped pubs from my DEV box...). Would you mind posting the results of your showplan. I'm interested because an INDEX SEEK seems illogical in this case (unless of course, you're getting INDEX SEEK with subsequent BOOKMARK LOOKUP, which would bump the IO on the query significantly.) Thanks, jay.

  • Your DBA dropped PUBS on your DEV box? Bit over-zealous isn't he/she? I can understand and agree in production, but pubs and northwind can be very handy in dev/test environments.

    Hope these results post ok...

     
    
    select * from authors a
    where NOT EXISTS (select * from titleauthor ta where ta.au_id = a.au_id)

    (3 row(s) affected)

    StmtText
    -----------------------------------------------------------------------------------------------------------------------------
    |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([a].[au_id]))
    |--Compute Scalar(DEFINE:([a].[mvgtest]=substring([a].[au_lname], 1, 10)))
    | |--Clustered Index Scan(OBJECT:([pubs].[dbo].[authors].[UPKCL_auidind] AS [a]))
    |--Index Seek(OBJECT:([pubs].[dbo].[titleauthor].[auidind] AS [ta]), SEEK:([ta].[au_id]=[a].[au_id]) ORDERED FORWARD)

    (4 row(s) affected)

    StmtText
    -----------------------------------------------------------------------------------------------------------

    select * from titles t
    where NOT EXISTS (select * from titleauthor ta where ta.title_id = t.title_id)

    (1 row(s) affected)

    StmtText
    --------------------------------------------------------------------------------------------------------------------------------------
    |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[title_id]))
    |--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind] AS [t]))
    |--Index Seek(OBJECT:([pubs].[dbo].[titleauthor].[titleidind] AS [ta]), SEEK:([ta].[title_id]=[t].[title_id]) ORDERED FORWARD)

    (3 row(s) affected)

    StmtText
    ------------------------------------------------------------------------------------------------------------------------------

    select * from titles t
    where NOT EXISTS (select * from titleauthor ta where upper(ta.title_id) = upper(t.title_id))

    (1 row(s) affected)

    StmtText
    ---------------------------------------------------------------------------------------------
    |--Nested Loops(Left Anti Semi Join, WHERE:(upper([ta].[title_id])=upper([t].[title_id])))
    |--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind] AS [t]))
    |--Index Scan(OBJECT:([pubs].[dbo].[titleauthor].[titleidind] AS [ta]))

    (3 row(s) affected)


    Cheers,
    - Mark

  • Thanks for posting those results (I agree about the DBA, BTW I think he removed them because a certain developer said he'd never use them...). The results point to the bookmark lookups I referred to earlier (NESTED LOOP is top most element, which means that for each row in authors, SQL will do a corresponding lookup in titleauthors to find a match. Of course, it will use the available index to do the existence check (auidint INDEX), but it's still going to be doing a lot of IO. How bout testing the difference between the IO stats on EXISTS vs. NOT EXISTS? Am I asking for too much, already?

    Cheers,

    Jay

  • For the EXISTS, a different plan is generated for the first query, but IO is much the same. For the other 2 queries, IO is identical.

     
    
    NOT EXISTS
    ==========

    Table 'titleauthor'. Scan count 23, logical reads 23, physical reads 0, read-ahead reads 0.
    Table 'authors'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    Table 'titleauthor'. Scan count 18, logical reads 18, physical reads 0, read-ahead reads 0.
    Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    Table 'titleauthor'. Scan count 18, logical reads 18, physical reads 0, read-ahead reads 0.
    Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    EXISTS
    ======

    Table 'authors'. Scan count 19, logical reads 38, physical reads 0, read-ahead reads 0.
    Table 'titleauthor'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

    Table 'titleauthor'. Scan count 18, logical reads 18, physical reads 0, read-ahead reads 0.
    Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    Table 'titleauthor'. Scan count 18, logical reads 18, physical reads 0, read-ahead reads 0.
    Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.


    Cheers,
    - Mark

  • Looks to me as if both EXISTS and NOT EXISTS are doing a lot of IO. I would bet that comparing the IO plans against a LEFT JOIN would show a single scan on titleauthors instead of 18-23 scans, try these with stats on:

    
    
    -- NOT EXISTS
    select * from titles t
    where NOT EXISTS (select * from titleauthor ta where ta.title_id = t.title_id)
    -- LEFT JOIN
    select * from titles t
    left join titleauthors ta
    on t.title_id = ta.title_id
    where ta.title_id IS NULL

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply