SQL Bug? Or do I not understand unions

  • This is tough to explain simply, but either we just do not understand SQL, or there's a bug there.

    Three tables, one is customer and two are almost identical sales tables joined over the customer primary key (UnitCd, CustCd) in two databases. 

    Someone brought a poorly written query that gave the wrong results.  It's got some redundant code, but while perhaps poorly written it does not look incorrect (below).

    Basically the two unioned inner joins combine sales data for active customers, and then the outer one sums it and reports it by customer.  I realize there is no need to join all three on customer, but the bug does not show up unless you do.

    Here is the problem -- see the part in red?  That selects one individual customer from the (roughly 100) result lines.  With that present in the where (actually in either of the three where clauses) the result is correct.

    With that absent, where all customers are reported, the result for that customer is null.  The result for maybe a quarter of the customers is null, incorrectly.  Others are reported correctly.  Change to specifically select any customer and their results are right, report them all and some (but not all) are wrong (wrong by being null).  So far we have not seen any wrong by being the wrong number.

    Remove the two inner joins from the unions (which are redundant) and it works fine.  limit it to one customer on ANY of the where clauses and it works fine.  But as written (less the red) it returns incorrect information.

    What in the word are we missing.

    By the way - the tables are simple, no calculated fields, no UDF's involved, no views, no permission problems (I ran it myself with admin).  It fails on two separate servers with identical database copies.  DB passes CheckDB fine.  W2K3 and SQL2K SP3 with one quad system and one 8-way system.   These specific tables are WIDELY used and give no other indictation of problems, just this one query.

    Time to call Microsoft?  Or am I missing something?

    PS. I can provide the table metadata, but there are a lot of fields and it is pretty long so am omitting it for brevity.  If someone wants I can. 

    select c.CustCd, c.CustNm, sum(s.SoldQty) as SoldQty

    from customer c

    left join

    (

                    select s.UnitCd, s.CustCd, s.SoldQty

                    from sbt.dbo.sbt s

                    inner join dw.dbo.customer c on c.UnitCd=s.UnitCd and c.CustCd=s.CustCd and c.OnHoldFlg='N' and c.InActiveFlg='N'

                    where s.UnitCd='LUN' and s.NatChainCd=271 and s.ISAControlNbr like '00000%'

            union all

                    select ig.UnitCd, ig.CustCd, ig.SoldQty

                    from sbt.dbo.InGeneralsbt ig

                    inner join dw.dbo.customer c on c.UnitCd=ig.UnitCd and c.CustCd=ig.CustCd and c.OnHoldFlg='N' and c.InActiveFlg='N'

                    where ig.UnitCd='LUN' and ig.NatChainCd=271 and ig.ISAControlNbr like '00000%'

    ) as s on s.UnitCd=c.UnitCd and s.CustCd=c.CustCd

    where c.UnitCd='LUN' and c.NatChainCd=271 and c.OnHoldFlg='N' and c.InActiveFlg='N' and c.CustCd=7746

    group by c.CustCd, c.CustNm

  • In the outer select you refer to the 'customer' table, but in the inner selects you refer to 'dw.dbo.customer'.

    Are you sure that these are the same table?

  • I do not know exactly what is wrong, but I would suggest to verify :

    1. for the inner joins, try to use an other alias, as alias c is also used for the outer select. Maybe the parser gets confused ?

    2. read the query plan.

    3. Verify the output not using the group by. So you can see the results that would be summed. ( with and without the where clause )

    4. is customer the same as dbo.customer ? is customer the same as dw.dbo.customer ?

    5. for the left outer join. what happens if you make the where clause part of the "on" clause. THis can make a difference for the join ( BOL : This is because the predicates in the ON clause are applied to the table prior to the join, while the WHERE clause is semantically applied on the result of the join. )

    Bert

  • Argg... this is very frustrating.  It is now not consistent with yesterday.  The specific customer that I was using yesterday now gives the right answer both ways, but others are still wrong.

    It is worth mentioning that the InGeneralSBT table gets fresh data each day, and depending on other conditions (not in these tables) rows move from there to SBT.  So there's both more data and slightly different data today.  Apparently the results are data specific.

    As to the questions (the first posting is covered in this one also).

    1 - different aliases - no change

    2 - query plan - see below

    3- group by removed - I had done that, sorry I did not post.  With it removed the answers are still wrong, the bad customers show up as a single line with null quantity.  When those specific customers are in the where, they show instead as many lines with no null quantities.  Incidentally, none of the indicated fields here are actually null, the only nulls involved are from the left join.

    4- Same customer table - yes, and I changed it to fully qualified and it still failed the same way.

    5 - left outer use on not where - Now that radically changes things, because there are some 65k rows in customer. BUt I ran it, and indeed it then worked correctly.  Very strange.  I tried putting the where clause BOTH added to the ON statement and also left in as a WHERE, and got the right answer.

    Query Plans - I am not good at reading these things.  They are indeed very different.  I'm going to post both text versions in case someone better at it is willing to look.  The first is the customer specific version, the second is identical but removing the specific customer on the outer where.  I guess pointing in some fashion to the outer select's processing.

     

    -- with specific customer

      |--Compute Scalar(DEFINE: ([Expr1008]=If ([Expr1014]=0) then NULL else [Expr1015]))

           |--Stream Aggregate(DEFINE: ([Expr1014]=COUNT_BIG([Union1007]), [Expr1015]=SUM([Union1007]), [c].[CustCd]=ANY([c].[CustCd]), [c].[CustNm]=ANY([c].[CustNm])))

                |--Nested Loops(Left Outer Join)

                     |--Filter(WHERE: (([c].[NatChainCd]=271 AND [c].[InactiveFlg]='N') AND [c].[OnHoldFlg]='N'))

                     |    |--Bookmark Lookup(BOOKMARK: ([Bmk1000]), OBJECT: ([dw].[dbo].[Customer] AS [c]))

                     |         |--Index Seek(OBJECT: ([dw].[dbo].[Customer].[CustCdIndex] AS [c]), SEEK: ([c].[UnitCd]='LUN' AND [c].[CustCd]=11586) ORDERED FORWARD)

                     |--Concatenation

                          |--Nested Loops(Inner Join)

                          |    |--Filter(WHERE: ([c1].[InactiveFlg]='N' AND [c1].[OnHoldFlg]='N'))

                          |    |    |--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([dw].[dbo].[Customer] AS [c1]))

                          |    |         |--Index Seek(OBJECT: ([dw].[dbo].[Customer].[CustCdIndex] AS [c1]), SEEK: ([c1].[UnitCd]='LUN' AND [c1].[CustCd]=11586) ORDERED FORWARD)

                          |    |--Parallelism(Gather Streams)

                          |         |--Filter(WHERE: (like(.[ISAControlNbr], '00000%', NULL)))

                          |              |--Bookmark Lookup(BOOKMARK: ([Bmk1001]), OBJECT: ([SBT].[dbo].[SBT] AS ))

                          |                   |--Hash Match(Inner Join, HASH: ([KeyCo22])=([KeyCo22]), RESIDUAL: ([KeyCo22]=[KeyCo22]))

                          |                        |--Bitmap(HASH: ([KeyCo22]), DEFINE: ([Bitmap1013]))

                          |                        |    |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([KeyCo22]))

                          |                        |         |--Index Seek(OBJECT: ([SBT].[dbo].[SBT].[CustCdUnitCdProcessedFlgIndex] AS ), SEEK: (.[UnitCd]='LUN' AND .[CustCd]=11586) ORDERED FORWARD)

                          |                        |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([KeyCo22]), WHERE: (PROBE([Bitmap1013])=TRUE))

                          |                             |--Index Seek(OBJECT: ([SBT].[dbo].[SBT].[NatChainCdIndex] AS ), SEEK: (.[NatChainCd]=271) ORDERED FORWARD)

                          |--Nested Loops(Inner Join)

                               |--Filter(WHERE: ([c2].[InactiveFlg]='N' AND [c2].[OnHoldFlg]='N'))

                               |    |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([dw].[dbo].[Customer] AS [c2]))

                               |         |--Index Seek(OBJECT: ([dw].[dbo].[Customer].[CustCdIndex] AS [c2]), SEEK: ([c2].[UnitCd]='LUN' AND [c2].[CustCd]=11586) ORDERED FORWARD)

                               |--Parallelism(Gather Streams)

                                    |--Filter(WHERE: ([ig].[NatChainCd]=271 AND like([ig].[ISAControlNbr], '00000%', NULL)))

                                         |--Bookmark Lookup(BOOKMARK: ([Bmk1003]), OBJECT: ([SBT].[dbo].[InGeneralSBT] AS [ig]))

                                              |--Index Seek(OBJECT: ([SBT].[dbo].[InGeneralSBT].[UnitCdCustCdIndex] AS [ig]), SEEK: ([ig].[UnitCd]='LUN' AND [ig].[CustCd]=11586) ORDERED FORWARD)

    -----------------------------------------------------------------

     

    -- without specific customer

      |--Compute Scalar(DEFINE: ([Expr1008]=If ([Expr1013]=0) then NULL else [Expr1014]))

           |--Stream Aggregate(GROUP BY: ([c].[CustCd], [c].[CustNm]) DEFINE: ([Expr1013]=COUNT_BIG([Union1007]), [Expr1014]=SUM([Union1007])))

                |--Nested Loops(Left Outer Join, OUTER REFERENCES: ([c].[CustCd]))

                     |--Filter(WHERE: ([c].[InactiveFlg]='N' AND [c].[OnHoldFlg]='N'))

                     |    |--Bookmark Lookup(BOOKMARK: ([Bmk1000]), OBJECT: ([dw].[dbo].[Customer] AS [c]))

                     |         |--Index Seek(OBJECT: ([dw].[dbo].[Customer].[ChainCdPiecesIndex] AS [c]), SEEK: ([c].[UnitCd]='LUN' AND [c].[NatChainCd]=271) ORDERED FORWARD)

                     |--Concatenation

                          |--Nested Loops(Inner Join)

                          |    |--Filter(WHERE: ([c1].[InactiveFlg]='N' AND [c1].[OnHoldFlg]='N'))

                          |    |    |--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([dw].[dbo].[Customer] AS [c1]))

                          |    |         |--Index Seek(OBJECT: ([dw].[dbo].[Customer].[CustCdIndex] AS [c1]), SEEK: ([c1].[UnitCd]='LUN' AND [c1].[CustCd]=[c].[CustCd]) ORDERED FORWARD)

                          |    |--Table Spool

                          |         |--Filter(WHERE: (like(.[ISAControlNbr], '00000%', NULL)))

                          |              |--Bookmark Lookup(BOOKMARK: ([Bmk1001]), OBJECT: ([SBT].[dbo].[SBT] AS ))

                          |                   |--Merge Join(Inner Join, MERGE: ([KeyCo22])=([KeyCo22]), RESIDUAL: ([KeyCo22]=[KeyCo22]))

                          |                        |--Sort(ORDER BY: ([KeyCo22] ASC))

                          |                        |    |--Index Seek(OBJECT: ([SBT].[dbo].[SBT].[CustCdUnitCdProcessedFlgIndex] AS ), SEEK: (.[UnitCd]='LUN' AND .[CustCd]=[c].[CustCd]) ORDERED FORWARD)

                          |                        |--Index Seek(OBJECT: ([SBT].[dbo].[SBT].[NatChainCdIndex] AS ), SEEK: (.[NatChainCd]=271) ORDERED FORWARD)

                          |--Nested Loops(Inner Join)

                               |--Filter(WHERE: ([c2].[InactiveFlg]='N' AND [c2].[OnHoldFlg]='N'))

                               |    |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([dw].[dbo].[Customer] AS [c2]))

                               |         |--Index Seek(OBJECT: ([dw].[dbo].[Customer].[CustCdIndex] AS [c2]), SEEK: ([c2].[UnitCd]='LUN' AND [c2].[CustCd]=[c].[CustCd]) ORDERED FORWARD)

                               |--Table Spool

                                    |--Filter(WHERE: ([ig].[NatChainCd]=271 AND like([ig].[ISAControlNbr], '00000%', NULL)))

                                         |--Bookmark Lookup(BOOKMARK: ([Bmk1003]), OBJECT: ([SBT].[dbo].[InGeneralSBT] AS [ig]))

                                              |--Index Seek(OBJECT: ([SBT].[dbo].[InGeneralSBT].[UnitCdCustCdIndex] AS [ig]), SEEK: ([ig].[UnitCd]='LUN' AND [ig].[CustCd]=[c].[CustCd]) ORDERED FORWARD)

  • That gives the correct answer, as does saving the union in a temp table first.

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply