Need help with tsql syntax

  • Would someone please help me with the proper tsql syntax to perform the following:

    I need to list all the account_number values in table1 that have the inactive_flag set to 1 and exist in table2 and/or table3 and/or table4.

    Basically, list the account_number if it exists at least once in any one of the three other tables.

    I only want to list the account_number in table1 once (in account_number order), even though it may be in one or more of the other tables.

    I do NOT want to list the account_number from table1 if it does not exist in all three of the other tables.

    Any help will be appreciated.

    Howard

  • SELECT account_number

            FROM table1

            WHERE inactive_flag = 1

                        AND (

                              EXISTS ( SELECT table2.account_number FROM table2 WHERE table2.account_number = table1.account_number )

                              OR EXISTS ( SELECT table3.account_number FROM table3 WHERE table3.account_number = table1.account_number )

                              OR EXISTS ( SELECT table4.account_number FROM table4 WHERE table4.account_number = table1.account_number )

                       )

    I think that will do the trick. It should at least be close. Another variation would be as follows:

    SELECT account_number

            FROM table1

            WHERE inactive_flag = 1

                        AND EXISTS ( SELECT table2.account_number FROM table2 WHERE table2.account_number = table1.account_number

                              UNION SELECT table3.account_number FROM table3 WHERE table3.account_number = table1.account_number

                              UNION SELECT table4.account_number FROM table4 WHERE table4.account_number = table1.account_number )

  • Both appear to work just fine. Thanks for your help.

    Do you have any idea as to which one would be more efficient when running against large tables?

    Howard

  • something like this should work if your indexes are clean

    select account_number

    from table1

    Where InactiveFlag=1

    AND (

    Account_number IN ( Select Distinct AccountNumber From table2 )

    OR Account_number IN ( Select Distinct AccountNumber From table3 )

    OR Account_number IN ( Select Distinct AccountNumber From table4 )

    )

  • Then again, here is article that might shed some light on speed and efficiency.

    http://archives.postgresql.org/pgsql-performance/2002-12/msg00185.php

    seems that exists is better on larger sub-selects, where as in can be better on smaller sub-selects

  • I'm going to say the former based on the following. If it only exists in table2 SQL Server should know enough in the first method to not even bother looking at the other two EXISTS clauses due to them using an OR constraint. Similary, if it is not in table 2 but is in table3 it wouldn't have to check table4. I'm not going to swear that SQL Server will actually evaluate them in the order you list them, but I would hope that it would know enough not to always evaluate every one. I've never done any analysis (or read anything to my knowledge) to prove that ORed conditions can be ignored by SQL Server once one has been satisfied.

    The second method requires all three subselects to be evaluated before doing the condition check. What's more, they would need to be merged into a single result which although very fast is technically another step.

  • In the statement below from Aaron you should use UNION ALL, not UNION.  A UNION on its own ALWAYS forces a sort to remove duplicates regardless of if duplicates actually exist.  A UNION ALL does not do any sorting. 

    However, Aaron's statement using OR EXISTS is possibly your best solution.  The version with UNION will always force all 3 tables to be scanned, but the OR EXISTS gives SQL a chance to stop scanning once it has found a match.  Even if your current SQL fix level scans all 3 tables with OR EXISTS, a new fix level may avoid it.

    SELECT account_number 

           FROM table1

           WHERE inactive_flag = 1

                        AND EXISTS ( SELECT table2.account_number FROM table2 WHERE table2.account_number = table1.account_number 

                              UNION SELECT table3.account_number FROM table3 WHERE table3.account_number = table1.account_number

                              UNION SELECT table4.account_number FROM table4 WHERE table4.account_number = table1.account_number )

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Well, I wasn't expecting SQL to ignore the extra OR's, but I think it has.  I ran the following statement against a test table I already had laying around with >800,000 records in it:

    SELECT * FROM TEST

    WHERE ID IN (SELECT ID FROM TEST)

    OR ID IN (SELECT ID FROM TEST)

    OR ID IN (SELECT ID FROM TEST)

    The query with the extra "OR"s only took 4 extra seconds.  The execution plan shows records coming from TEST and the first subquery of test.  The second and third subquery show zero records contributing to the query.

  • I couldn't help but do some more research on this. Using 4 different tables I put together various queries. SQL Server is actually more aggressive on its optimization than I guessed. In all cases (EXISTS vs IN and OR vs UNION ALL) SQL Server always quit searching as soon as it found a single row. Not just a result from a single table, but a single row from the first table with any matching rows (1 or more of them). Since the tables I was using were relatively large (100,000 to 500,000 rows) I did see that using IN was slower than using EXISTS. I did not see a performance difference between OR and UNION ALL. Finally, SQL Server uses its own best estimate as to which subselect to perform first when ORing them together and so it is not necessarily the first one you list. However, when doing a UNION ALL it will check the tables in the order you list them. As a result, the UNION ALL is usually less efficient than ORing them together unless you happened to list them in the correct order.

  • If you needed to do more than just check for existence (like actually retrieve data from the related tables), then you could use the follwoing syntax.  I know it is more than what you probably need, but may still be useful.
     
    select
         a.accountNumber
    from table1 a
    left outer join table2 b
         on a.accountNumber = b.accountNumber
    left outer join table3 c
         on a.accountNumber = c.accountNumber
    left outer join table4 d
         on a.accountNumber = d.accountNumber
    where
         coalesce(b.accountNumber, c.accountNumber, d.accountNumber,-1) <> -1

  • you know, I have seen quite a few optimization programs that all prefer to use the COALESCE over any IN, EXIST or UNION queries. I always thought that there must be a reason for that commonality between these programs.

  • Thanks to everyone for the excellent responses to my question.

     

    Howard

  • For some reason the following example does not work. It gives me a higher count than all of the other examples.

    SELECT * FROM TEST

    WHERE ID IN (SELECT ID FROM TEST)

    OR ID IN (SELECT ID FROM TEST)

    OR ID IN (SELECT ID FROM TEST)

    Does anyone know why?

    Howard

     

  • For those that are following this thread in response to this suggestion:

    If you needed to do more than just check for existence (like actually retrieve data from the related tables), then you could use the follwoing syntax.  I know it is more than what you probably need, but may still be useful.
    select
         a.accountNumber
    from table1 a
    left outer join table2 b
         on a.accountNumber = b.accountNumber
    left outer join table3 c
         on a.accountNumber = c.accountNumber
    left outer join table4 d
         on a.accountNumber = d.accountNumber
    where
         coalesce(b.accountNumber, c.accountNumber, d.accountNumber,-1) <> -1

    I received the following error:

    (135450 row(s) affected)

    Server: Msg 248, Level 16, State 1, Line 1

    The conversion of the varchar value '010115101001' overflowed an int column. Maximum integer value exceeded.

    The account_number column is char 12 if this helps.

    Howard

  • Since it is a varchar field and not int, you will need to set your fallback value in the coalesce to a varchar value.

     

    coalesce(b.accountNumber, c.accountNumber, d.accountNumber,'No Value') <> 'No Value'

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply