Help optimizing a query

  • Hello All,

    I have a simple table that stores messages of different types from

    different sources. The definition of the table is shown below. I need

    to devise an efficient query to return a "list of the N more recent

    messages for a subset of sources within a specified time frame."

    create table MessageTbl

    (

       src nvarchar(50),      -- Source of the message

       type nvarchar(50),     -- Type of the message.

       msg  nvarchar(1000),   -- Text of the message

       dt   datetime          -- When the message was posted

    )

    We are given the following parameters:

    1. declare @startTime datetime -- Start of the time frame

    2. declare @endTime datetime   -- End of the time frame

    3. declare @myTable( src nvarchar(50), type nvarchar(50)) -- This table

    contains a list of sources/types for which we want to obtain the

    messages.

    4. N -- How many messages per source/type

    The following seems to produce the correct result but it is very expensive. It takes almost 30 seconds

    when MessageTbl has about 3600 rows and @myTable has 51 rows. I checked

    the execution plan and it seems there is a lot of table scanning going

    on.

    SELECT     a.src, a.type, a.msg, a.dt

    FROM       MessageTbl AS a

    INNER JOIN @myTable AS b

          ON   b.src = a.src

          AND  b.type = a.type

    WHERE      a.dt BETWEEN @startTime AND @endTime

    AND        a.dt IN (SELECT   TOP 20 dt

                        FROM     MessageTbl AS c

                        WHERE    c.src = a.src

                        AND      c.type = a.type

                        AND      c.dt BETWEEN @startTime AND @endTime

                        ORDER BY c.dt DESC)

    I would appreciate your suggetions and comments.

    Thank you

    CD

  • Is the clustered index on the dt column?

  • OK, somebody help me out here...

    I can't see why you need to do the embedded query if it's the same table.  Why just do the top 20 on the outer query already filtered by start and end?

    (and as said an index on datetime can only help)

     

  • >>4. N -- How many messages per source/type

    If N is a variable parameter, you're going to have additional problems above the performance issue, namely that you can't use a variable with TOP (You have a hard-coded value of 20 in the example)

    In SQL 2005, the solution to this problem will be easy, using the new RANK T-SQL enhancements and the ability to use a variable with TOP.

    However, back in SQL2K ... the requirement for dynamic TOP calls for dynamic SQL, yuck, or use of a temp table and an IDENTITY for ranking. This becomes fugly in a hurry:

    -- Create an empty temp table

    SELECT Identity(int, 1, 1) As Rank, *

    INTO #RankedMsgs

    FROM MessageTbl

    WHERE 0 = 1

    -- Populate, order by source, type & date, for date-sequencial ranking of records

    -- within each src/type

    INSERT INTO #RankedMsgs (src, type, msg, dt)

    SELECT src, type, msg, dt

    FROM MessageTbl As a

    INNER JOIN @myTable AS b

          ON   b.src = a.src

          AND  b.type = a.type

    WHERE      a.dt BETWEEN @startTime AND @endTime

    ORDER BY src, type, dt

    -- Return results

    SELECT *

    FROM #RankedMsgs As r

    INNER JOIN

    -- Derived table to get the min rank within each src/type

    (

      SELECT src, type, Min(Rank) As MinRank

      FROM #RankedMsgs

      GROUP BY src, type

    ) dtMin

    WHERE r.src = dtMin.src

    AND   r.type = dtMin.type

    -- this implements the dynamic TOP, with dynamic value in @N

    AND   (r.Rank - dtMin.MinRank) <= @N

    ORDER BY r.src, r.type

     

     

  • Hello PW,

    Thank you for your reply. Do you think that the query you suggested would benefit from any additional indices. I currently do not have any indices in any of the columns involved.

    Thanks

    -CD

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply