SELECT DISTINCT on Long Table

  • Hello,

    I have a table with over 10 million rows. Each row has one of 8 dates attached to it, roughly evenly distributed across the 8 days. Our Access front-end has a drop down to choose one of these dates, basically populated by:

    SELECT DISTINCT Day FROM MyTable ORDER BY Day DESC

    There is a nonclustered index on the Day field, but this query still takes about 5 seconds to run, which is a bit annoying for our users when they click the drop down, especially since there are only 8 distinct Days at any given time. I've tried rewriting the DISTINCT as a GROUP BY and it gets the same estimated execution plan and essentially the same runtime.

    Every week, we add a new Day's worth of data into our table and move out the oldest Day's data, so the values in this table change periodically. Short of creating a separate Day listing table, is there a way to speed this up?

    Thanks.

  • Steve F. (4/18/2008)


    Hello,

    I have a table with over 10 million rows. Each row has one of 8 dates attached to it, roughly evenly distributed across the 8 days. Our Access front-end has a drop down to choose one of these dates, basically populated by:

    SELECT DISTINCT Day FROM MyTable ORDER BY Day DESC

    There is a nonclustered index on the Day field, but this query still takes about 5 seconds to run, which is a bit annoying for our users when they click the drop down, especially since there are only 8 distinct Days at any given time. I've tried rewriting the DISTINCT as a GROUP BY and it gets the same estimated execution plan and essentially the same runtime.

    Every week, we add a new Day's worth of data into our table and move out the oldest Day's data, so the values in this table change periodically. Short of creating a separate Day listing table, is there a way to speed this up?

    Thanks.

    Can't tell... can't see your query... post it...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I don't understand. I did post it. Here it is again:

    SELECT DISTINCT Day FROM MyTable ORDER BY Day DESC

  • I don't think I would name a table MyTable, but ok.

    Is the Day field indexed? An index on just this field should make the query much faster.

    I would, however, seriously consider having a table with your lookup information rather than doing a select distinct on the table - especially from MS Access in a lookup control.

  • Can't see your query

    Is the Day field indexed?

    Am I seeing a different question to everyone else?

    Anyway... is the index up-to-date?

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Michael Earl (4/18/2008)


    I don't think I would name a table MyTable, but ok.

    Is the Day field indexed? An index on just this field should make the query much faster.

    I would, however, seriously consider having a table with your lookup information rather than doing a select distinct on the table - especially from MS Access in a lookup control.

    I said it was indexed.

    The table is not actually called MyTable. The actual name of the table is of no significance to the question.

  • I have to ask, why not have a separate Days table? At 8 rows, it would be nearly instantaneous to query. Update it when you do your weekly load on the main table. (From what you wrote in your question, I gather you don't like the idea of a separate Days table, but I'm not sure why.)

    Since the main table has an index on the column being selected, assuming the statistics on the table are reasonably up-to-date, your query is about as fast as it's going to get.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • assuming the statistics on the table are reasonably up-to-date

    I'm not convinced they are. I'm still waiting for my reply...

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Steve -

    Is the drop down list showing the LAST 8 days? If not - is there a pattern in to what is shown? I'm just curious if you actually have to be pulling it from that table at all - or can we gen the data some other way?

    If you do feel you have to pull it from that table - is there any reason you couldn't create this once/day (or even once/hour)?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4/18/2008)


    Steve -

    Is the drop down list showing the LAST 8 days? If not - is there a pattern in to what is shown? I'm just curious if you actually have to be pulling it from that table at all - or can we gen the data some other way?

    If you do feel you have to pull it from that table - is there any reason you couldn't create this once/day (or even once/hour)?

    The 8 days are a week apart. I just tried getting MAX(Day) and cross joining it to a derived table containing 8 week offsets (0, -1, ... , -7) and did a DateAdd. This is almost instant, so I can live with this. But you (and others) are right: A value table that is updated once per data load is probably a reasonable idea.

    Ryan: The stats are updating right now. This takes time. I will try the query once this has completed.

  • Ryan: The stats are updating right now. This takes time. I will try the query once this has completed.

    Thanks Steve.

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • I just updated the stats with a 20% sample size; this took about 30 minutes. The DISTINCT query's performance does not seem to have improved.

  • Steve F. (4/18/2008)


    I just updated the stats with a 20% sample size; this took about 30 minutes. The DISTINCT query's performance does not seem to have improved.

    That's that then. Looks like the day table is your best option... 🙂

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Depending on data and queries, an indexed view could be a perfect solution.

    http://msdn2.microsoft.com/en-us/library/aa933145.aspx

  • Robert (4/21/2008)


    Depending on data and queries, an indexed view could be a perfect solution.

    http://msdn2.microsoft.com/en-us/library/aa933145.aspx

    Since an indexed view is, fundamentally, a separate table, that would work. I'd go with a separate table, because, to me, that's easier to document. But really, they're the same solution and either is just as good as the other.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply