Improving performance on joins to large tables

  • I just reviewed a script posted to this web site in which the author claims will significantly improve perforamnce on joins to large tables:

    http://qa.sqlservercentral.com/scripts/contributions/824.asp

    Basicly the author suggested an alternative approach of :


    FROM (SELECT * FROM Contact WHERE ContactID = @ContactID) c

    LEFT JOIN (SELECT * FROM CONAddress WHERE ContactID = @ContactID)


    instead of the conventional approach of:


    FROM Contact c

    LEFT JOIN CONAddress a

    ON c.ContactID = a.ContactID


    I am very surprised if this is true.  I would have thought that SQL Server would have been able to optimize the conventional approach to be the equivalent or better than the alternate approach.

    Can anyone comment on this?

    Best Regards,

    Jim

  • I have mater and child tables. Mater table has primary key and detail table has index on detail field. Detail table is large.

    select *

     from mdan m

     left join all_rlists r

     on m.id = r.id__

     where m.id = 20085

    Both indexes are used. At first time used pk on master, then index on detail, than nested loops. This is fast.

    select *

     from (select * from mdan where id = 20085) m

     left join (select * from all_rlists where id__ = 20085) d

     on m.id = d.id__

    It has equal plan.

    My comment is what both tables have to be indexed. If there is no indexes separate scan is faster rather left join.

  • For those of you who used the script, try this way and you should see the same efficiency without losing the clear coding:

     

    SELECT c.Name, a.Address, p.Phone

    FROM Contact c

    LEFT JOIN ContactAddress a

    ON c.ContactID = @ContactID

    and a.ContactID = @ContactID

    and c.ContactID = a.ContactID

    LEFT JOIN ContactPhone p

    ON p.ContactID = @ContactID

    and c.ContactID = p.ContactID

     

    This effectively does the same without losing the Clear Join Syntax.

  • I do not have the expierence to say which method is better.  But I do know how to definitively tell the difference using the execution plan in query analyzer.  What I usually do when I have a question about the effects of a change to a tsql statement is to make two copies of it in QA.  When you click the "display execution plan" icon you get back results saying that each query took 50% of the total batch cost.  I then alter one of the queries to see if the execution plan gets better or worse for that particular query.  This method would tell you if the

    Select ?? from ?? inner join ??? on ...

    or

    Select ?? from (select ...) inner join (select ...) on

    is the better method.

     

    But just remember, the answer that you find is probably not an absolute for all statements just for the one you are testing.

    Steve

  • I gave it a try. I used a Solomon Accounting Database, I have some huge (10 gigs) and gave it a try. I used tables with a lot more complicated keys and foreign keys but still filtering the results by one field (reference number). Interesting enough the traditional execution plan was simpler (one less step).

    The traditional was in this format...

    Declare @refnbr char(10)

    set @refnbr = '004640'

    select *

    from ardoc doc

    left join artran trans

    on doc.batnbr = trans.batnbr and doc.refnbr = trans.refnbr and

    doc.doctype = trans.trantype and doc.custid = trans.custid

    where doc.refnbr = @refnbr

    The "new" (I'm sorry but derived tables are not new):

    Declare @refnbr char(10)

    set @refnbr = '004640'

    select *

    from (select * from ardoc where refnbr = @refnbr) doc

    left join (select * from artran where refnbr = @refnbr) trans

    on doc.batnbr = trans.batnbr and doc.refnbr = trans.refnbr and

    doc.doctype = trans.trantype and doc.custid = trans.custid

    Consistently in serveral databases the "traditional way" was faster. It took no seconds to perform in most cases while the derived tables took a second.

    Lab results say, NOT!

    I've had great luck using derived tables for doing some very complicated joins that i otherwise could not do, but to use it for speed improvment, I think not...

  • The author also says that UDF to UDF joins will be faster than straight joins.  NOT!  If this is occuring you've got some schema/index issues.

    cl

    Signature is NULL

  • Jim Underwood,

    The author of this short spiel about Performance on large tables using JOINS needs to state just what exactly constitutes a large Table? AND, how many JOINS? He has opened up a can of worms here without some tangible facts!

    Until we get some figures and test his theory on these figures, then its all conjecture!


    Kindest Regards,

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply