I Can't Understand Why This is Slow

Question

I Can't Understand Why This is Slow

souLTower

SSCarpal Tunnel

Points: 4026
More actions
July 23, 2010 at 7:23 am

#74304

I'm replacing a function in application code. The function does this (pseudo code)
Select list of known items from big table tableA using joins (500K rows) -- returns 100 rows
for each
Select list of items from big table using joins tableB based on first results criteria -- may return 100 rows
if no data is returned record it
end
So I wrote a procedure:
select data from
(
select data from bunch of joined tables tableA
) AS A LEFT JOIN
(
select data from bunch of joined tables tableB
) AS B on A.criteria = B.criteria
WHERE B.key IS NULL
So I test the first query. 100 records in 40 ms. Nice
I test the second query. 100 records in 23 ms. Nice
Put them together.... 2.8 seconds. What???
I can't for the life of me understand how this can be happening. The initial tables have nice indexes, that's why the queries are so fast. There's a very small resulting dataset. SQL should breeze through that. I know those are assumptions.
I've tried writing a single query
select stuff from joins
left join the other stuff on criteria
where keyValue is null
Same bad times. I'm stumped.
Thoughts?
ST
PS - So as a test I did the following, simulating what I thought the server would do:
select * into #A from first query
select * into #B from second query
select stuff from #A left join #B on criteria where keyValue is null
The while thing returns as expected. The sum of the execution time for the first 2 queries plus a ms to join them. Very confusing. I guess I could write my procedure to do this but why should I have to?
I checked the execution plan for my original attempt and it looks like the methodology above.
Happy Friday

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 1

I don't fully understand your explanation. Please post table definitions, index definitions and execution plan, as per http://qa.sqlservercentral.com/articles/SQLServerCentral/66909/

Are you asking why it's slow, or how to get it fast?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 2

I'd agree that we need more information. Table defs, some sample data, indexes. It's too hard to guess at what could be going on. Don't forget that with larger joins you might end up with more work in tempdb to put that stuff together, and if you hit disk, it will be much, much slower.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 3

OK, I figured it out. Strange indeed.

The purpose of the query is to determine which parts have not been used in the building of a widget.

My first derived table is a listing of parts used in a widget. There are many many parts (200K), including duplicates.

My second derived table is a listing of parts that have been pulled from inventory for the given widget. Again there are many (150K) and there are duplicates.

My thinking was to decrease the number of rows being left joined by placing a distinct on each derived query. This reduced each to 100 rows. Since my output should be distinct I also had a distinct on that. For example:

SELECT DISTINCT X.partNumber FROM

(

SELECT DISTINCT P.partNumber from parts P inner join widgetParts WP on

P.partNumber = WP.partNumber

inner join widget W on WP.widgetID = W.widgetID

WHERE W.identifier = 'foo'

) X LEFT JOIN

(

SELECT DISTINCT P.partNumber from parts P inner join widgetInvoice W on

W.partNumber = P.PartNumber

WHERE W.invoice = 'XYZ'

) Y ON X.partNumber = Y.partNumber

WHERE Y.partNumber is null

My intent was to reduce the size of each side of the join. In looking more closely at the execution plan I see that the server decided to ignore the DISTINCT on each of the inner queries. This caused a left join between a 200K and 150K table.

My solution was to remove the outer distinct however I don't like that I had to do that. I'll have to review my data to make sure that nothing can be duplicated. I find it odd that the server would do this. Did it think I didn't want the distinct in there?