Can we make this query run faster?

Question

Post reply

Can we make this query run faster?

crbd98

Say Hey Kid

Points: 709
More actions
November 2, 2005 at 4:05 pm

#165785

Hello All,
Consider the following table:
create table MessageTbl
(
   msgId int primary key clustered,
   src nvarchar(50),      -- Source of the message
   type int,                  -- Type of the message.
   msg nvarchar(1000), -- Text of the message
   dt   datetime            -- When the message was posted
)
With indexes:
CREATE INDEX IX_MessageTbl_1
ON MessageTbl (src, type, dt DESC)
CREATE INDEX IX_MessageTbl_2
ON MessageTbl (dt DESC)
In the following queries @myTable, @startime , and @endTine are defined as:
declare @myTable table( src nvarchar(50), type int)
declare @startTime datetime
declare @endTime datetime

Consider the following two queries:
------------- #1
SELECT     a.*
FROM       MessageTbl AS a
INNER JOIN @myTable AS b
      ON   b.src = a.src
      AND b.type = a.type
WHERE      a.dt BETWEEN @startTime AND @endTime
AND       (SELECT COUNT(*)
           FROM   MessageTbl AS c
           WHERE c.src = a.src
           AND    c.type = a.type
           AND    c.dt BETWEEN @startTime AND @endTime
           AND    c.dt <= a.dt) <= 20
------------- #2
SELECT     a.*
FROM       MessageTbl AS a
INNER JOIN @myTable AS b
      ON   b.src = a.src
      AND b.type = a.type
WHERE      a.dt BETWEEN @startTime AND @endTime
AND        a.dt IN (SELECT   TOP 20 dt
                    FROM     MessageTbl AS c
                    WHERE    c.src = a.src
                    AND      c.type = a.type
                    AND      c.dt BETWEEN @startTime AND @endTime
                    ORDER BY c.dt DESC)

Query #2 runs 6 times faster than #1 under the same conditions. Nevertheles, #2 has a big disadvantage to me because I need to vary the number of elements returned by the TOP clause. I could create the whole query string dynamically but I would consider that only as a last resource.
Frankly, I do not understand why #1 is slower. Both queries seem to favor from the defined indexes and have apparently the same complexity. The only difference is in one of the subqueries in the WHERE clause. While #1 has
AND       (SELECT COUNT(*)
           FROM   MessageTbl AS c
           WHERE c.src = a.src
           AND    c.type = a.type
           AND    c.dt BETWEEN @startTime AND @endTime
           AND    c.dt <= a.dt) <= 20
query #2 has
AND        a.dt IN (SELECT   TOP 20 dt
                    FROM     MessageTbl AS c
                    WHERE    c.src = a.src
                    AND      c.type = a.type
                    AND      c.dt BETWEEN @startTime AND @endTime
                    ORDER BY c.dt DESC)

I would appreciate any insight behind the differences in performance as well as any suggestions (new indices, hints, etc) on how to improve the performance of #1.

Thank you
- CD

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

philcart SSC-Forever Points: 47794 More actions · Answer 1

It would be helpful if you provided a description of what you are trying to achieve.

On first glance I'd suggest the following,

1) Move the sub-query from the WHERE clause to the FROM clause

2) Change the primary key to non-clustered. And change your index on the date column to the clustered one.

--------------------
Colt 45 - the original point and click interface

Gabor Nyul SSCrazy Eights Points: 9459 More actions · Answer 2

did you try to create a CLUSTERED index on dt, src, type? In this order. Because you are doing a range seeking (between) and a "LESS THEN" comparison so a clustered index on the dt column at the first position would be more effective

Bye
Gabor

Ian Yates SSCoach Points: 19738 More actions · Answer 3

Both of your subqueries are correlated, which is often a problem, but with the 1st query, SQL probably has to do a lot more evaluation and reading rows to determine counts, and then perform the <= 20 comparison.

I'm trying to work out what you are trying to do, but it's a bit late and I'm distracted by the tellie Could you elaborate what you are trying to accomplish...

I can only assume that you are trying to get messages from a set of sources + types between two dates - furthermore, you only want the top X records.. I can think of a few ways, but before trying other things, you should try adding in the extra indices as described above (clustered index on date, type and source in that order) and also make your query look like this

SELECT     a.*
FROM       MessageTbl AS a 
INNER JOIN @myTable AS b 
      ON   b.src = a.src 
      AND  b.type = a.type 
WHERE      a.dt BETWEEN @startTime AND @endTime 
AND       (SELECT COUNT(*) 
           FROM   MessageTbl AS c 
           WHERE  c.src = a.src 
           AND    c.src = b.src
           AND    c.type = a.type 
           AND    c.type = b.type
           AND    c.dt BETWEEN @startTime AND @endTime 
           AND    a.dt >= @startTime
           AND    c.dt <= a.dt) <= 20

This might give SQL Server a better way of matching up your tables (since A & B have the same values for src & type columns and a's dt field will always be >= @startTime. Also, you could try making @myTable a temp table, rather than a table variable, and add indices to it.

If you let us know your requirements, we can probably come up with something better. Cheers!

crbd98 Say Hey Kid Points: 709 More actions · Answer 4

Ian,

I appologize for not giving a better picture of what I want to accomplish. This is a brief description.

The MessageTbl table accumulates messages of different types (type column) from different sources (src column). A specific source can have many types associated. The dt column is the time the message was recorded.

What I want is equivalent to the union of the top N messages in a time frame for each row (src, type) in the @myTable variable. Our current application performs all the queries very inneficiently from the client side. I am attemting to optimize the performance by creating a SP that speeds up the process.

Regards

- CD

Ian Yates SSCoach Points: 19738 More actions · Answer 5

No probs CD

How did the other suggestions impact on your query execution speed? If they didn't help, I'll see if I can think of some way of using grouping etc to make it efficient...

Cheers