Selecting non assigned items

  • Hello,

    I have an assigned_properties and a properties table.

    I want to select all property_Ids that are not found in assigned_properties. This sounds so simple but for some reason I can't get my mind around it.

    Examples?

    Thanks!

  • What have you tried?

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Select ID

    From Properties

    Where ID Not IN (Select Assigned.PropertyID From Assigned)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I know this is completely wrong as it selects the assigned records (and sorta duplicates efforts) but just so you can have an example:

    Select * From Properties INNER JOIN (Select P1.* From Properties As P1 INNER JOIN Assigned_Project_Properties

    ON Assigned_Project_Properties.Property_Id = P1.Property_Id) As P2 ON P2.Property_id = Properties.Property_Id

  • Thanks! I'm on the right track now.

  • SELECT *

    FROM Property a

    left JOIN Assigned_properties b

    on a.propertyid = b.propertyid

    where b.property_id is null

    Also, you could use except

    SELECT Property_id

    from Property

    except

    select property_id

    from assigned_properties

  • FYI, I only asked what you've tried to make sure this wasn't homework. It still may be, but you've shown that you've put forth effort.

    So ....

    SELECT

    *

    FROM

    Properties AS p

    LEFT JOIN Assigned_Project_Properties AS ap

    ON Assigned_Project_Properties.Property_Id = P1.Property_Id

    WHERE

    p.Property_Id IS NULL

    Think of it this way, You are combining "Properties" and "Assigned_Project_Properties" data together on the "Property_Id". The records that do not have a cooresponding record in the "Assigned_Project_Properties" or RIGHT table because it's on the RIGHT side of the JOIN, will have a vaule of NULL for each field. So limiting the results to those gets you what you want.

    Take of the WHERE clause, and it may become more apparent.

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Just my 2 cents on this one... I've found that the WHERE NOT IN that Barry used is usually more performant than outer joins. The speed difference isn't huge over a million rows, but it is faster. Compare the execution plans to see why.

    I've not tried the EXCEPT intersection (or rather lack of intersection) yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (3/25/2008)


    Just my 2 cents on this one... I've found that the WHERE NOT IN that Barry used is usually more performant than outer joins. The speed difference isn't huge over a million rows, but it is faster. Compare the execution plans to see why.

    I've not tried the EXCEPT intersection (or rather lack of intersection) yet.

    Really? I'll have to keep that in mind and do some tests. It's not that I don't believe you, I'm just a see for myself kind of guy.

    Now you've got me curious enought to see why this is the case, and to look deeper into the EXCEPT keyword.

    Thanks for the 2 cents Jeff. Keep it up and I will have a whole $3.27 from you. 😉

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Except operates as an outer join. It compares the results of both queries and returns distinct values of the left table. These are the values that exist in the left table, but not in the right. Thus, it would provide the same execution plan as the left outer join.

  • Adam Haines (3/25/2008)


    Except operates as an outer join. It compares the results of both queries and returns distinct values of the left table. These are the values that exist in the left table, but not in the right. Thus, it would provide the same execution plan as the left outer join.

    Thanks for the info, Adam... have you actually done a comparison/performance test on EXCEPT?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jason Selburg (3/25/2008)


    Really? I'll have to keep that in mind and do some tests. It's not that I don't believe you, I'm just a see for myself kind of guy.

    Now you've got me curious enought to see why this is the case, and to look deeper into the EXCEPT keyword.

    I've got some old tests laying around somewhere... I'll see if I can dig them up, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I have done a small scale test (not the million row type you do) and it did in fact use the same query plan. Obviously, the main difference here is the tables are not joined. The results of each query is compared, not just key columns.

    For every result that differs, it is considered distinct and thus returned. To avoid this behavior on keys you should only list values that should be common among the two queries. For example, if a column named update_dt is different but everything else in the row is the same, the row is still considered distinct, therefore, the update_dt column should not be included.

  • After a little research it seems that it is equivlant to a left join, but the actual plan is more like a NOT IN statement.

    See the link below:

    http://blog.sqlauthority.com/2007/05/22/sql-server-2005-comparison-except-operator-vs-not-in/

  • Thanks for the link, Adam... that reminded me where I put my test code... and I added "EXCEPTION" to it...

    First, I just don't trust execution plans, estimated or actual... especially where the % of batch comes in.

    Remember I said that WHERE NOT IN was better than LEFT OUTER JOIN with a null search? Here's the proof... with a caveat that I'll explain in a minute...

    --===== Select the database to use

    USE AdventureWorks

    GO

    --===== Test the WHERE NOT IN method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the WHERE NOT IN method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT ProductID

    FROM Production.Product

    WHERE ProductID

    NOT IN (SELECT ProductID

    FROM Production.WorkOrder)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    --===== Test the WHERE NOT EXISTS method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the WHERE NOT EXISTS method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT p.ProductID

    FROM Production.Product p

    WHERE NOT EXISTS (SELECT 1 FROM Production.WorkOrder w WHERE w.ProductID = p.ProductID)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    --===== Test the EXCEPT method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the EXCEPT method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT ProductID

    FROM Production.Product

    EXCEPT --------------------->>>

    SELECT ProductID

    FROM Production.WorkOrder

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    --===== Test the LEFT OUTER JOIN method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the LEFT OUTER JOIN method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT p.ProductID

    FROM Production.Product p

    LEFT OUTER JOIN Production.WorkOrder w

    ON p.ProductID = w.ProductID

    WHERE w.ProductID IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    ... the caveat is that the fewer number of rows there are, the worse the LEFT OUTER JOIN does... but use a million rows looking for only 50,000 missing rows, and everyone becomes equal...

    --===== Create the test tables

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1)

    INTO #JBMTestBig

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    ALTER TABLE #JBMTestBig

    ADD CONSTRAINT PK_JBMTestBig_RowNum PRIMARY KEY CLUSTERED (RowNum)

    SELECT TOP 950000

    RowNum = IDENTITY(INT,1,1)

    INTO #JBMTestSmall

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    ALTER TABLE #JBMTestSmall

    ADD CONSTRAINT PK_JBMTestSmall_RowNum PRIMARY KEY CLUSTERED (RowNum)

    GO

    --===== Rerun from here down for addition testing in same connection

    --===== Test the WHERE NOT IN method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the WHERE NOT IN method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT RowNum

    FROM #JBMTestBig

    WHERE RowNum

    NOT IN (SELECT RowNum

    FROM #JBMTestSmall)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    --===== Test the WHERE NOT EXISTS method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the WHERE NOT EXISTS method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT p.RowNum

    FROM #JBMTestBig p

    WHERE NOT EXISTS (SELECT 1 FROM #JBMTestSmall w WHERE w.RowNum = p.RowNum)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    --===== Test the EXCEPT method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the EXCEPT method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT RowNum

    FROM #JBMTestBig

    EXCEPT --------------------->>>

    SELECT RowNum

    FROM #JBMTestSmall

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    --===== Test the LEFT OUTER JOIN method

    PRINT REPLICATE('=',80)

    PRINT '========== Test the LEFT OUTER JOIN method =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT p.RowNum

    FROM #JBMTestBig p

    LEFT OUTER JOIN #JBMTestSmall w

    ON p.RowNum = w.RowNum

    WHERE w.RowNum IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    GO

    If they're all the same on the big tables and the OUTER JOIN is the slowest on smaller tables, why would you ever use the OUTER JOIN method to find the exceptions?

    Also, on the link that Adam was good enough to share, there may be some problems with using TOP with EXCEPTION (dunno why you might want to do that, anyway, but who knows).

    For a single column comparison, WHERE NOT IN is the easiest to write.

    Like I said before, WHERE NOT IN is better than the LEFT OUTER JOIN... if more than 1 column is involved, WHERE NOT EXISTS would be my next choice. I doubt if I'd use EXCEPTION because I don't trust it yet, especially when folks say it has problems in certain areas.

    But, as long as this post is, it's just my opinion. 😉 Use what you feel more comfortable with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply