How do you limit records using Distinct and Count - TOP and Rowcount dont work ?

Question

How do you limit records using Distinct and Count - TOP and Rowcount dont work ?

Viewing 15 posts - 76 through 90 (of 103 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 1

ifila (1/13/2010)
Please find attached the execution plan.
Thanks

Will check tomorrow. Is late here.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 2

ifila (1/13/2010)
Lynn Pettis (1/13/2010)
Again, I have to ask, can the user specify the data range used in the query?
Apologies i missed you question - yes they can
Selection Criteria can be:
Email
From Date
To Date
Skills (Comma delimiter) Resume is stored as a FTI

Then putting a clustered index on currentdateout will have an impact on the query. I suggest making the change I recommended earlier.

SQLRNNR SSC Guru Points: 281324 More actions · Answer 3

Please also provide your catalog and FTI definitions. In setting up a similar testing scenario, this would be helpful for us to replicate what you are seeing.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

tina.muter Valued Member Points: 58 More actions · Answer 4

Hello all,

I tend not to post, I'm normally just a read, absorb and learn along the way. However I thought a round of applause was worth giving for the ongoing help and support for a difficult thread. I am impressed that although not initally wanting to help themselves by providing information and even a clear question, everyone persisted and it looks like a solution has been found for this person.

Hats off to you all, I feel I may have given in.

Tina.

SQLRNNR SSC Guru Points: 281324 More actions · Answer 5

tina.muter (1/14/2010)
Hello all,
I tend not to post, I'm normally just a read, absorb and learn along the way. However I thought a round of applause was worth giving for the ongoing help and support for a difficult thread. I am impressed that although not initally wanting to help themselves by providing information and even a clear question, everyone persisted and it looks like a solution has been found for this person.
Hats off to you all, I feel I may have given in.
Tina.

That is the nature of SSC - there are a lot of good people that are willing to give of themselves and help any who need help with SQL.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 6

After looking at the exec plan, I'm in full agreement with Lynn. Move the clustered index to the currentdateout column. That should greatly improve your performance and may make it possible to remove the top.

The top isn't going to improve performance much. It's one of the last things executed in the query, just before the aggregate that produces the count and, up until that point, all the qualifying records are still in consideration. So the bulk of the processing is done on al qualifying records, not the small portion remaining after the TOP.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

isuckatsql SSChampion Points: 10381 More actions · Answer 7

isuckatsql

SSChampion

Points: 10381

January 14, 2010 at 9:25 am

#1103713

Thanks!

SQLRNNR SSC Guru Points: 281324 More actions · Answer 8

Not being certain that I have the same FTI and catalog as yours, I was able to gain substantial improvement just by adding a Non-Clustered index to the currentdateout column. I have not tested with moving the Clustered Index to that column. I do believe, however, that you will see an improvement over even having a NC Index on that column - which equates to an overall substantial gain on your original query (as Gail and Lynn have said).

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 9

CirquedeSQLeil (1/14/2010)
Not being certain that I have the same FTI and catalog as yours, I was able to gain substantial improvement just by adding a Non-Clustered index to the currentdateout column. I have not tested with moving the Clustered Index to that column. I do believe, however, that you will see an improvement over even having a NC Index on that column - which equates to an overall substantial gain on your original query (as Gail and Lynn have said).

IIRC, in an earlier post ifila indicated that there was a nonclustered index on currentdateout. My recommendation on making that the clustered index instead of the ID3 (the primary key) was that the selection criteria in the query is range based on currentdateout, and the query is using almost all of the columns in the table. Could say a gut feel recommendation as I really hadn't looked at the execution plan at the time.

isuckatsql SSChampion Points: 10381 More actions · Answer 10

A few months ago, a forum member suggested reducing the size of the email fields from 1000 to something more reasonable.

So i created a TestDB that is an exact copy of the ProdDB, but uses Email lengths of 100 chars.

On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.

TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.

As i mentioned yesterday, using TOP on Email1 was MUCH faster, typically 2 seconds for any number of results.

I had only been testing on the ProdDB, so i tried Gail's solution on my TestDB, and voila, the time went down to one second!

TOP 60000 - 1 second.

What is interesting is that the Query with TOP on Email1 never had any improvement in performance on the TestDB!

SQLRNNR SSC Guru Points: 281324 More actions · Answer 11

Lynn Pettis (1/14/2010)
CirquedeSQLeil (1/14/2010)
Not being certain that I have the same FTI and catalog as yours, I was able to gain substantial improvement just by adding a Non-Clustered index to the currentdateout column. I have not tested with moving the Clustered Index to that column. I do believe, however, that you will see an improvement over even having a NC Index on that column - which equates to an overall substantial gain on your original query (as Gail and Lynn have said).
IIRC, in an earlier post ifila indicated that there was a nonclustered index on currentdateout. My recommendation on making that the clustered index instead of the ID3 (the primary key) was that the selection criteria in the query is range based on currentdateout, and the query is using almost all of the columns in the table. Could say a gut feel recommendation as I really hadn't looked at the execution plan at the time.

I don't recall that. I noticed that my Exec Plan changed when I added the NC. Prior to that change (I started getting a key lookup and the noted improvement), I had the same exec plan as his. That is all I was basing my statement on.

Agreed though, that the Clustered index would be an improvement, in this scenario.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 12

ifila (1/14/2010)
On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.
TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.

Yes, because, as I said earlier, the TOP is applied late in the query, after most of the processing.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 13

ifila (1/14/2010)
A few months ago, a forum member suggested reducing the size of the email fields from 1000 to something more reasonable.
So i created a TestDB that is an exact copy of the ProdDB, but uses Email lengths of 100 chars.
On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.
TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.
As i mentioned yesterday, using TOP on Email1 was MUCH faster, typically 2 seconds for any number of results.
I had only been testing on the ProdDB, so i tried Gail's solution on my TestDB, and voila, the time went down to one second!
TOP 60000 - 1 second.
What is interesting is that the Query with TOP on Email1 never had any improvement in performance on the TestDB!

The reason for the TestDB working faster boils down to this: smaller columns results in smaller records which results in more records per page which results in few reads from disk for the same amount of data.

jcrawf02 SSC-Insane Points: 24198 More actions · Answer 14

So Gail, something that I wonder occasionally (the rest of the time I wonder about everything else :-P) - if I SELECT TOP 10 and ORDER BY the clustered index, can the optimizer figure out that it just needs to look at the first ten records in the index and work with those? Or is it always the case that it will process the query on the dataset and THEN apply the TOP filter?

---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

SQLRNNR SSC Guru Points: 281324 More actions · Answer 15

Lynn Pettis (1/14/2010)
ifila (1/14/2010)
A few months ago, a forum member suggested reducing the size of the email fields from 1000 to something more reasonable.
So i created a TestDB that is an exact copy of the ProdDB, but uses Email lengths of 100 chars.
On the ProdDB Gail's solution would always take 8 seconds, regardless of the number of records i wanted returned.
TOP 60000 - 8 seconds all the way down to TOP 100 would still take 8 seconds.
As i mentioned yesterday, using TOP on Email1 was MUCH faster, typically 2 seconds for any number of results.
I had only been testing on the ProdDB, so i tried Gail's solution on my TestDB, and voila, the time went down to one second!
TOP 60000 - 1 second.
What is interesting is that the Query with TOP on Email1 never had any improvement in performance on the TestDB!
The reason for the TestDB working faster boils down to this: smaller columns results in smaller records which results in more records per page which results in few reads from disk for the same amount of data.

Couple that with what Gail said already - the heavy hitting in the processing and cost is prior to the TOP clause. Since the heavy hitting is prior to the top clause, there will not be much improvement with or without the TOP clause due to the data size that comes prior to the TOP.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events