How do you limit records using Distinct and Count - TOP and Rowcount dont work ?

  • ifila (1/13/2010)


    Please find attached the execution plan.

    Thanks

    Will check tomorrow. Is late here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ifila (1/13/2010)


    Lynn Pettis (1/13/2010)


    Again, I have to ask, can the user specify the data range used in the query?

    Apologies i missed you question - yes they can

    Selection Criteria can be:

    Email

    From Date

    To Date

    Skills (Comma delimiter) Resume is stored as a FTI

    Then putting a clustered index on currentdateout will have an impact on the query. I suggest making the change I recommended earlier.

  • Please also provide your catalog and FTI definitions. In setting up a similar testing scenario, this would be helpful for us to replicate what you are seeing.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Hello all,

    I tend not to post, I'm normally just a read, absorb and learn along the way. However I thought a round of applause was worth giving for the ongoing help and support for a difficult thread. I am impressed that although not initally wanting to help themselves by providing information and even a clear question, everyone persisted and it looks like a solution has been found for this person.

    Hats off to you all, I feel I may have given in.

    Tina.

  • tina.muter (1/14/2010)


    Hello all,

    I tend not to post, I'm normally just a read, absorb and learn along the way. However I thought a round of applause was worth giving for the ongoing help and support for a difficult thread. I am impressed that although not initally wanting to help themselves by providing information and even a clear question, everyone persisted and it looks like a solution has been found for this person.

    Hats off to you all, I feel I may have given in.

    Tina.

    That is the nature of SSC - there are a lot of good people that are willing to give of themselves and help any who need help with SQL.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • After looking at the exec plan, I'm in full agreement with Lynn. Move the clustered index to the currentdateout column. That should greatly improve your performance and may make it possible to remove the top.

    The top isn't going to improve performance much. It's one of the last things executed in the query, just before the aggregate that produces the count and, up until that point, all the qualifying records are still in consideration. So the bulk of the processing is done on al qualifying records, not the small portion remaining after the TOP.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks!

  • Not being certain that I have the same FTI and catalog as yours, I was able to gain substantial improvement just by adding a Non-Clustered index to the currentdateout column. I have not tested with moving the Clustered Index to that column. I do believe, however, that you will see an improvement over even having a NC Index on that column - which equates to an overall substantial gain on your original query (as Gail and Lynn have said).

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (1/14/2010)


    Not being certain that I have the same FTI and catalog as yours, I was able to gain substantial improvement just by adding a Non-Clustered index to the currentdateout column. I have not tested with moving the Clustered Index to that column. I do believe, however, that you will see an improvement over even having a NC Index on that column - which equates to an overall substantial gain on your original query (as Gail and Lynn have said).

    IIRC, in an earlier post ifila indicated that there was a nonclustered index on currentdateout. My recommendation on making that the clustered index instead of the ID3 (the primary key) was that the selection criteria in the query is range based on currentdateout, and the query is using almost all of the columns in the table. Could say a gut feel recommendation as I really hadn't looked at the execution plan at the time.

  • A few months ago, a forum member suggested reducing the size of the email fields from 1000 to something more reasonable.

    So i created a TestDB that is an exact copy of the ProdDB, but uses Email lengths of 100 chars.

    On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.

    TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.

    As i mentioned yesterday, using TOP on Email1 was MUCH faster, typically 2 seconds for any number of results.

    I had only been testing on the ProdDB, so i tried Gail's solution on my TestDB, and voila, the time went down to one second!

    TOP 60000 - 1 second.

    What is interesting is that the Query with TOP on Email1 never had any improvement in performance on the TestDB!

  • Lynn Pettis (1/14/2010)


    CirquedeSQLeil (1/14/2010)


    Not being certain that I have the same FTI and catalog as yours, I was able to gain substantial improvement just by adding a Non-Clustered index to the currentdateout column. I have not tested with moving the Clustered Index to that column. I do believe, however, that you will see an improvement over even having a NC Index on that column - which equates to an overall substantial gain on your original query (as Gail and Lynn have said).

    IIRC, in an earlier post ifila indicated that there was a nonclustered index on currentdateout. My recommendation on making that the clustered index instead of the ID3 (the primary key) was that the selection criteria in the query is range based on currentdateout, and the query is using almost all of the columns in the table. Could say a gut feel recommendation as I really hadn't looked at the execution plan at the time.

    I don't recall that. I noticed that my Exec Plan changed when I added the NC. Prior to that change (I started getting a key lookup and the noted improvement), I had the same exec plan as his. That is all I was basing my statement on.

    Agreed though, that the Clustered index would be an improvement, in this scenario.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • ifila (1/14/2010)


    On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.

    TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.

    Yes, because, as I said earlier, the TOP is applied late in the query, after most of the processing.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ifila (1/14/2010)


    A few months ago, a forum member suggested reducing the size of the email fields from 1000 to something more reasonable.

    So i created a TestDB that is an exact copy of the ProdDB, but uses Email lengths of 100 chars.

    On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.

    TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.

    As i mentioned yesterday, using TOP on Email1 was MUCH faster, typically 2 seconds for any number of results.

    I had only been testing on the ProdDB, so i tried Gail's solution on my TestDB, and voila, the time went down to one second!

    TOP 60000 - 1 second.

    What is interesting is that the Query with TOP on Email1 never had any improvement in performance on the TestDB!

    The reason for the TestDB working faster boils down to this: smaller columns results in smaller records which results in more records per page which results in few reads from disk for the same amount of data.

  • So Gail, something that I wonder occasionally (the rest of the time I wonder about everything else :-P) - if I SELECT TOP 10 and ORDER BY the clustered index, can the optimizer figure out that it just needs to look at the first ten records in the index and work with those? Or is it always the case that it will process the query on the dataset and THEN apply the TOP filter?

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Lynn Pettis (1/14/2010)


    ifila (1/14/2010)


    A few months ago, a forum member suggested reducing the size of the email fields from 1000 to something more reasonable.

    So i created a TestDB that is an exact copy of the ProdDB, but uses Email lengths of 100 chars.

    On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.

    TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.

    As i mentioned yesterday, using TOP on Email1 was MUCH faster, typically 2 seconds for any number of results.

    I had only been testing on the ProdDB, so i tried Gail's solution on my TestDB, and voila, the time went down to one second!

    TOP 60000 - 1 second.

    What is interesting is that the Query with TOP on Email1 never had any improvement in performance on the TestDB!

    The reason for the TestDB working faster boils down to this: smaller columns results in smaller records which results in more records per page which results in few reads from disk for the same amount of data.

    Couple that with what Gail said already - the heavy hitting in the processing and cost is prior to the TOP clause. Since the heavy hitting is prior to the top clause, there will not be much improvement with or without the TOP clause due to the data size that comes prior to the TOP.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

Viewing 15 posts - 76 through 90 (of 103 total)

You must be logged in to reply to this topic. Login to reply