SELECT DISTINCT on Long Table

Question

SELECT DISTINCT on Long Table

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Steve F. Mr or Mrs. 500 Points: 587 More actions · Answer 1

I am going to implement this as a separate table. I don't want the additional overhead of the index on the table.

But I do have a question: Is this a common way to solve this problem? I ask because I really need to solve this for other fields, getting a distinct list from a 10 million record table where there are way more than 8 distinct values. Indexing on these fields would help by reducing the width of the records in the index scan, but indexing on these fields would still really be an unacceptably slow solution for speeding up these SELECT DISTINCTs, especially since the data only changes once per week.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

Well - SELECT DISTINCT tends to be a costly operation, so doing it over and over and over again is something to be avoided, IF it can be avoided. Having a "lookup table" is a good way to do that, especially if the values don't change at all, or like in your scenarion, don't change for a reasonable timeframe.

In that light - yes - it's a common practice. Treat it like a long-term table - if there are going to be lots of distinct values in the list - index it, etc...

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

GSquared SSC Guru Points: 260824 More actions · Answer 3

With data that doesn't update constantly (like several dozen/hundred/thousand/million changes per hour), this is a pretty common solution.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Robert-378556 SSCertifiable Points: 5651 More actions · Answer 4

GSquared (4/21/2008)
Since an indexed view is, fundamentally, a separate table, that would work. I'd go with a separate table, because, to me, that's easier to document. But really, they're the same solution and either is just as good as the other.

But there is a fundamental difference too. Data in indexed view is updated automatically, while a separate table must be updated either explicitly (inaccurate) or from triggers on source table (costly). In this case at least.

rui.romao SSC Rookie Points: 27 More actions · Answer 5

Hello,

I don´t know if you are familiar, but you can try to create a partition function to store the data in partitions. I not every percent shore but i think it could be a way.

RGDS

Rui

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 6

1) partitioning the table won't help, since the query at hand will still have to read all the data. Well, it could help a tiny bit if the partitions were on separate spindles and there were sufficient CPU cores to handle the async IO.

2) 10M row scan of a non-clustered index on a date column is still going to take a finite amount of time. Best you can do here is have sufficient RAM and sufficient IO so the query is hopefully answered from cache and pulled up from fast IO if not.

3) Caching the data either via an off-line-'manually'-maintained table is another solution, as is indexed view. Either could be an 'instantaneous' retrieval but using them would depend on the data modification situation on the base table.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

larry-699201 Grasshopper Points: 21 More actions · Answer 7

Regarding partitioning, if the partitions are by date, would not the Q.O. have the smarts

to return the 8 dates 1 from each partition very quickly? If it knows that there

is one unique date per partition, it should only need to perform 8 reads - at least

that's what I would do if I were the optimizer 🙂

Still, then there's partition maintenance routines that would be required.

Larry

*** Not yet a DBA - not only a developer ***

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

Larry, you are correct. the constraint used for the date partitioning would allow the optimizer to know that the only possible date for each of the 8 partitions would be the single date allowed by the optimizer. Hmm, actually, it may not. Unless you did a constraint that ensured exactly 00:00:00 for the time for example it would still have to read all the rows because the time is included with the datetime and smalldatetime datatypes. Perhaps a char representation would work too.

I misread the initial post and thought there were 8 different date FIELDS to choose from, not 8 day values for a single date field. :Whistling:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service