Unsatisfactory merge join performance

  • Hi 🙂

    A merge join of 10,000 sorted rows is slower than I expected, according to my measurements below:

    CREATE TABLE A ( ID INT NOT NULL );
    CREATE TABLE B ( ID INT NOT NULL );

    ALTER TABLE A ADD CONSTRAINT PK_A PRIMARY KEY CLUSTERED ( ID );
    ALTER TABLE B ADD CONSTRAINT PK_B PRIMARY KEY CLUSTERED ( ID );

    DECLARE @TotalRow INT = 10000;

    WHILE @TotalRow > 0
    BEGIN
        INSERT INTO A ( ID ) VALUES ( @TotalRow );
        INSERT INTO B ( ID ) VALUES ( @TotalRow );
        SET @TotalRow -= 1;
    END;
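    (An aside, not from the original post: the loop works, but a set-based insert populates the tables much faster and scales to larger row counts. A sketch using sys.all_objects as a convenient row source:)

    -- Generate 1..10000 without a loop; the cross join just provides enough rows
    INSERT INTO A ( ID )
    SELECT TOP ( 10000 )
        ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) )
    FROM sys.all_objects AS o1
    CROSS JOIN sys.all_objects AS o2;

    INSERT INTO B ( ID )
    SELECT ID FROM A;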

    The two tables are exactly the same, with the ID column sorted.

    Then I benchmark the join:

    SELECT A.ID FROM A
    JOIN B ON A.ID = B.ID
    OPTION ( MAXDOP 1 );
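    (The statistics output below comes from the session options; the exact wrapper isn't shown in the post, but it was presumably enabled along these lines:)

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    -- ... run the join query above ...
    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;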

    The query execution plan shows a simple, single merge join operator.

    The result is disappointing and confusing (to me).

    STATISTICS IO & Time:

    Table 'A'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'B'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 215 ms.

    But Client Statistics shows:

    Client Processing Time: 46

    Total Execution Time: 46

    Wait time on server replies: 0

    From an outside point of view, the elapsed time of 215 ms is the "true time".

    I think a merge join is an O(n) algorithm, and the zero CPU time seems to prove this.

    In my own understanding:

    The Total Execution Time of 46 ms is spent on logical I/Os.

    The elapsed time of 215 ms includes both server processing and client processing (displaying) cost.

    Am I right or wrong in this interpretation?

    But 38 logical I/Os from memory should not be that costly (46 ms).

    How can I further identify the problem, or how can I prove this is the theoretical limit?

  • After discarding the output, the Client Statistics figure drops to 10 ms.

    I think that's the problem. :-)
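    (How the output was discarded isn't stated here; in SSMS this can be done via Query Options > Results > "Discard results after execution", or in pure T-SQL by assigning the column to a variable so the rows never leave the server. A sketch of the latter:)

    DECLARE @ID INT;

    -- Each row overwrites @ID, so nothing is streamed to the client
    SELECT @ID = A.ID
    FROM A
    JOIN B ON A.ID = B.ID
    OPTION ( MAXDOP 1 );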

  • Yes. The delay caused by returning the output to the client and displaying it will vastly dominate the cost of merge joining 10K rows. Many performance tests of this nature select the rows into a #temporary table (using SELECT...INTO syntax) to mitigate this effect; see the first sketch after the list below. Also:

    a) 10K rows is rather too few to be useful

    b) Running the test many times, discarding the top and bottom 'n' results and taking an average is more reliable

    c) Using the sys.dm_exec_query_stats DMV and looking at worker time is generally a better measure in any case (see the second sketch below)
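    A minimal sketch of the #temporary-table approach mentioned above (the name #Results is my own):

    -- Write the join output to a local temp table so no rows travel to the client
    SELECT A.ID
    INTO #Results
    FROM A
    JOIN B ON A.ID = B.ID
    OPTION ( MAXDOP 1 );

    DROP TABLE #Results;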
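    And a sketch of checking worker time via the DMV; the LIKE filter is just one illustrative way to locate the statement, and note the DMV reports times in microseconds:

    SELECT TOP ( 10 )
        qs.execution_count,
        qs.total_worker_time / qs.execution_count AS avg_worker_time_us,   -- CPU per execution
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time_us,
        st.text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text ( qs.sql_handle ) AS st
    WHERE st.text LIKE '%JOIN B ON A.ID = B.ID%'
    ORDER BY qs.last_execution_time DESC;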

    Paul

  • Thank you for your great answer! :-P

  • Sheng (8/7/2009)

    Thank you for your great answer! :-P

    You are welcome. Your first post was a good one, so I thought you deserved more than a one-line reply. 🙂
