I Can't Understand Why This is Slow

  • I'm replacing a function in application code. The function does this (pseudo code)

    Select list of known items from big table tableA using joins (500K rows) -- returns 100 rows

    for each

    Select list of items from big table using joins tableB based on first results criteria -- may return 100 rows

    if no data is returned record it

    end

    So I wrote a procedure:

    select data from

    (

    select data from bunch of joined tables tableA

    ) AS A LEFT JOIN

    (

    select data from bunch of joined tables tableB

    ) AS B on A.criteria = B.criteria

    WHERE B.key IS NULL

    So I test the first query. 100 records in 40 ms. Nice

    I test the second query. 100 records in 23 ms. Nice

    Put them together.... 2.8 seconds. What???

    I can't for the life of me understand how this can be happening. The initial tables have nice indexes, that's why the queries are so fast. There's a very small resulting dataset. SQL should breeze through that. I know those are assumptions.

    I've tried writing a single query

    select stuff from joins

    left join the other stuff on criteria

    where keyValue is null

    Same bad times. I'm stumped.

    Thoughts?

    ST

    PS - So as a test I did the following, simulating what I thought the server would do:

    select * into #A from first query

    select * into #B from second query

    select stuff from #A left join #B on criteria where keyValue is null

    The while thing returns as expected. The sum of the execution time for the first 2 queries plus a ms to join them. Very confusing. I guess I could write my procedure to do this but why should I have to?

    I checked the execution plan for my original attempt and it looks like the methodology above.

    Happy Friday

  • I don't fully understand your explanation. Please post table definitions, index definitions and execution plan, as per http://qa.sqlservercentral.com/articles/SQLServerCentral/66909/

    Are you asking why it's slow, or how to get it fast?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'd agree that we need more information. Table defs, some sample data, indexes. It's too hard to guess at what could be going on. Don't forget that with larger joins you might end up with more work in tempdb to put that stuff together, and if you hit disk, it will be much, much slower.

  • OK, I figured it out. Strange indeed.

    The purpose of the query is to determine which parts have not been used in the building of a widget.

    My first derived table is a listing of parts used in a widget. There are many many parts (200K), including duplicates.

    My second derived table is a listing of parts that have been pulled from inventory for the given widget. Again there are many (150K) and there are duplicates.

    My thinking was to decrease the number of rows being left joined by placing a distinct on each derived query. This reduced each to 100 rows. Since my output should be distinct I also had a distinct on that. For example:

    SELECT DISTINCT X.partNumber FROM

    (

    SELECT DISTINCT P.partNumber from parts P inner join widgetParts WP on

    P.partNumber = WP.partNumber

    inner join widget W on WP.widgetID = W.widgetID

    WHERE W.identifier = 'foo'

    ) X LEFT JOIN

    (

    SELECT DISTINCT P.partNumber from parts P inner join widgetInvoice W on

    W.partNumber = P.PartNumber

    WHERE W.invoice = 'XYZ'

    ) Y ON X.partNumber = Y.partNumber

    WHERE Y.partNumber is null

    My intent was to reduce the size of each side of the join. In looking more closely at the execution plan I see that the server decided to ignore the DISTINCT on each of the inner queries. This caused a left join between a 200K and 150K table.

    My solution was to remove the outer distinct however I don't like that I had to do that. I'll have to review my data to make sure that nothing can be duplicated. I find it odd that the server would do this. Did it think I didn't want the distinct in there?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply