nested query - bug?

  • Find orderdetails for categoryid=1. (categoryid is in the product table)

    The following works successfully

    select od.*

    from sales.orderdetails od

    where productid in (select productid from production.products where categoryid=1)

    However, let's say I make a mistake without noticing. The only word changed is in capitals

    select od.*

    from sales.orderdetails od

    where productid in (select ORDERID from production.products where categoryid=1)

    OREDERID comes from the outer table. (Execution plan says 'nested loops- left semi join')

    If I run the entire query it returns no rows (obviously), BUT if I run (select ORDERID from production.products where categoryid=1) ON IT'S OWN - it errors with invalid column name!!!!!

    So, how come, on it's own it errors but run as a subselect in the whole query, it doesn't?????

    Thanks

  • I ran the below on adventureworks2012 running on a sqlserver2012 instance version 11.03156 have reproduced the issue.

    Had to change the code to get it to run on adventureworks.

    The subquery throws an error. --Edited(I changed the column in the subquery to salesorderdetailID as per adventureworks and runs without issue)

    select od.*

    from sales.salesorderdetail od

    where productid in (select productid from production.product where [ProductSubcategoryID] =1)

    go

    select od.*

    from sales.salesorderdetail od

    where productid in (select SalesOrderID from production.product where [ProductSubcategoryID] =1)

    GO

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • lotusnotes (4/13/2016)


    OREDERID comes from the outer table. (Execution plan says 'nested loops- left semi join')

    You answered the question yourself. If the column isn't found in the inner query, the outer query is searched. This is a good lesson on why you should use aliases - it avoids confusion.

    John

  • This is not a bug. It's a feature.

    Allowing to reference columns from the outer query allows you to create correlated subqueries. This is specially important when using EXISTS or APPLY.

    Due to this feature, it's important that you always qualify your columns (using 2 part names) to know where they come from.

    Reference: https://technet.microsoft.com/en-us/library/ms178050(v=sql.105).aspx

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Ok. Thanks guys.

    If you use an alias od.orderid it still lets you do it & the overall code still works & returns no rows.

    You would hope to spot that is the wrong alias in the inner query.

    What is really scary is if you change the code to 'where productid NOT in (select ordered.....'

    the whole query returns ALL the rows from the outer table. i.e if you were using the resulting rowset in a merge statement, you would be in big trouble.

  • lotusnotes (4/14/2016)


    Ok. Thanks guys.

    If you use an alias od.orderid it still lets you do it & the overall code still works & returns no rows.

    You would hope to spot that is the wrong alias in the inner query.

    What is really scary is if you change the code to 'where productid NOT in (select ordered.....'

    the whole query returns ALL the rows from the outer table. i.e if you were using the resulting rowset in a merge statement, you would be in big trouble.

    When you use

    select od.*

    from sales.orderdetails od

    where productid in (select ORDERID from production.products where categoryid=1)

    You're telling SQL Server to return from sales.orderdetails where the product is equal to the orderid, but only if there are rows in the table products where categoryid = 1.

    Here's a quick example to try to make it clear.

    CREATE TABLE Product(

    ProductId int,

    ProductSubcategoryID int

    );

    CREATE TABLE SalesOrderDetail(

    SalesOrderID int,

    ProductId int

    );

    INSERT INTO Product

    VALUES

    (1,1),

    (2,1),

    (3,2),

    (4,2),

    (5,2),

    (6,3);

    INSERT INTO SalesOrderDetail

    VALUES

    (1,1), --Will show up because ProductId = SalesOrderID

    (1,2), --Won't show up because ProductId <> SalesOrderID

    (2,1), --Won't show up because ProductId <> SalesOrderID

    (2,4), --Won't show up because ProductId <> SalesOrderID

    (2,3), --Won't show up because ProductId <> SalesOrderID

    (3,3); --Will show up because ProductId = SalesOrderID

    --Returns 2 rows because we have rows in Product where ProductSubcategoryID = 1

    SELECT od.*

    FROM SalesOrderDetail od

    WHERE ProductId in (SELECT SalesOrderID FROM Product WHERE ProductSubcategoryID = 1);

    --Returns 0 rows because we DON'T have rows in Product where ProductSubcategoryID = 6

    SELECT od.*

    FROM SalesOrderDetail od

    WHERE ProductId in (SELECT SalesOrderID FROM Product WHERE ProductSubcategoryID = 6);

    GO

    DROP TABLE Product;

    DROP TABLE SalesOrderDetail;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • lotusnotes (4/14/2016)


    Ok. Thanks guys.

    If you use an alias od.orderid it still lets you do it & the overall code still works & returns no rows.

    You would hope to spot that is the wrong alias in the inner query.

    What is really scary is if you change the code to 'where productid NOT in (select ordered.....'

    the whole query returns ALL the rows from the outer table. i.e if you were using the resulting rowset in a merge statement, you would be in big trouble.

    There is very often a need to reference data from the outer tables in a subquery, so if you explicitly type od.orderid you apparently explicitly want to reference that column. SQL Server does not judge your queries, it simply executes them.

    If you had wanted to use a column from the subquery itself, you would have typed pr.orderid and then you would have gotten the expected error.

    In queries that have more than a single table reference, b est practise is to *ALWAYS* table-qualify *EVERY* column reference throughout the entire query. Either with the table name or with the table alias (for readability, I prefer the latter).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply