Can we make this query run faster?

  • Hello All,

    Consider the following table:

    create table MessageTbl

    (

       msgId int primary key clustered,

       src nvarchar(50),      -- Source of the message

       type int,                  -- Type of the message.

       msg  nvarchar(1000), -- Text of the message

       dt   datetime            -- When the message was posted

    )

    With indexes:

    CREATE INDEX IX_MessageTbl_1

      ON MessageTbl (src, type, dt DESC)

    CREATE INDEX IX_MessageTbl_2

      ON MessageTbl (dt DESC)

    In the following queries @myTable, @startime , and @endTine are defined as:

    declare @myTable table( src nvarchar(50), type int)

    declare @startTime datetime

    declare @endTime datetime

     

    Consider the following two queries:

    ------------- #1

    SELECT     a.*

    FROM       MessageTbl AS a

    INNER JOIN @myTable AS b

          ON   b.src = a.src

          AND  b.type = a.type

    WHERE      a.dt BETWEEN @startTime AND @endTime

    AND       (SELECT COUNT(*)

               FROM   MessageTbl AS c

               WHERE  c.src = a.src

               AND    c.type = a.type

               AND    c.dt BETWEEN @startTime AND @endTime

               AND    c.dt <= a.dt) <= 20

    ------------- #2

    SELECT     a.*

    FROM       MessageTbl AS a

    INNER JOIN @myTable AS b

          ON   b.src = a.src

          AND  b.type = a.type

    WHERE      a.dt BETWEEN @startTime AND @endTime

    AND        a.dt IN (SELECT   TOP 20 dt

                        FROM     MessageTbl AS c

                        WHERE    c.src = a.src

                        AND      c.type = a.type

                        AND      c.dt BETWEEN @startTime AND @endTime

                        ORDER BY c.dt DESC)

     

    Query #2 runs 6 times faster than #1 under the same conditions. Nevertheles, #2 has a big disadvantage to me because I need to vary the number of elements returned by the TOP clause. I could create the whole query string dynamically but I would consider that only as a last resource.

    Frankly, I do not understand why #1 is slower. Both queries seem to favor from the defined indexes and have apparently the same complexity. The only difference is in one of the subqueries in the WHERE clause. While #1 has

    AND       (SELECT COUNT(*)

               FROM   MessageTbl AS c

               WHERE  c.src = a.src

               AND    c.type = a.type

               AND    c.dt BETWEEN @startTime AND @endTime

               AND    c.dt <= a.dt) <= 20

    query #2 has

    AND        a.dt IN (SELECT   TOP 20 dt

                        FROM     MessageTbl AS c

                        WHERE    c.src = a.src

                        AND      c.type = a.type

                        AND      c.dt BETWEEN @startTime AND @endTime

                        ORDER BY c.dt DESC)

     

    I would appreciate any insight behind the differences in performance as well as any suggestions (new indices, hints, etc) on how to improve the performance of #1.

     

    Thank you

    - CD

     

     

  • It would be helpful if you provided a description of what you are trying to achieve.

    On first glance I'd suggest the following,

    1) Move the sub-query from the WHERE clause to the FROM clause

    2) Change the primary key to non-clustered. And change your index on the date column to the clustered one.

     

    --------------------
    Colt 45 - the original point and click interface

  • did you try to create a CLUSTERED index on dt, src, type? In this order. Because you are doing a range seeking (between) and a "LESS THEN" comparison so a clustered index on the dt column at the first position would be more effective



    Bye
    Gabor

  • Both of your subqueries are correlated, which is often a problem, but with the 1st query, SQL probably has to do a lot more evaluation and reading rows to determine counts, and then perform the <= 20 comparison.

    I'm trying to work out what you are trying to do, but it's a bit late and I'm distracted by the tellie   Could you elaborate what you are trying to accomplish...

    I can only assume that you are trying to get messages from a set of sources + types between two dates - furthermore, you only want the top X records..  I can think of a few ways, but before trying other things, you should try adding in the extra indices as described above (clustered index on date, type and source in that order) and also make your query look like this

    SELECT     a.*
    FROM       MessageTbl AS a 
    INNER JOIN @myTable AS b 
          ON   b.src = a.src 
          AND  b.type = a.type 
    WHERE      a.dt BETWEEN @startTime AND @endTime 
    AND       (SELECT COUNT(*) 
               FROM   MessageTbl AS c 
               WHERE  c.src = a.src 
               AND    c.src = b.src
               AND    c.type = a.type 
               AND    c.type = b.type
               AND    c.dt BETWEEN @startTime AND @endTime 
               AND    a.dt >= @startTime
               AND    c.dt <= a.dt) <= 20 

    This might give SQL Server a better way of matching up your tables (since A & B have the same values for src & type columns and a's dt field will always be >= @startTime.  Also, you could try making @myTable a temp table, rather than a table variable, and add indices to it. 

    If you let us know your requirements, we can probably come up with something better.  Cheers!

  • Ian,

    I appologize for not giving a better picture of what I want to accomplish. This is a brief description.

    The MessageTbl table accumulates messages of different types (type column) from different sources (src column). A specific source can have many types associated. The dt column is the time the message was recorded.

    What I want is equivalent to the union of the top N messages in a time frame for each row (src, type) in the @myTable variable. Our current application performs all the queries very inneficiently from the client side. I am attemting to optimize the performance by creating a SP that speeds up the process.

     

    Regards

    - CD

     

     

  • No probs CD

    How did the other suggestions impact on your query execution speed?  If they didn't help, I'll see if I can think of some way of using grouping etc to make it efficient...

    Cheers

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply