Clustered index?

  • Hi,

    I'm trying to work out whether a clusted index is necessary on a particular table in my database.

    Basically the function of this table is that whenever a user runs a query through my application, the data is inserted into my query_results table. The data is then displayed from this table. When the user re-orders their results, we re-query the query_results table ordering the results by one of the existing non-clustered indices. When a user submits a new query, their previous query results are deleted and new results inserted. This table is only ever used as a temporary repository for a users most recent query results.

    Now there would seem to be no natural candidate for a clustered index, so perhaps I'll implement a clustered index based on an identity column.

    Is this likely to give me any performance boost, or would it be likely to degrade performance given that we have a new index to maintain. I've studied several articles on this site and can't seem to figure this out.

    Thanks,

    Marcus

  • Based on your description of what you are doing, I'd lean towards not creating a clustered index.  Remember the clustered index determines the order in which the data pages are stored in a table.  The primary advantage to having a clustered index is when you are retreiving ranges of data from your table.  Another advantage of clustered indexes is that the index and data are together so there's no need to query the index then fetch the data for that key as is the case with non-clustered indexes.

    So I'd answer the following questions:

    1 - Do I ever search for a range of data?  If so how often.

    2 - When I retreive data from my table, do I primarily use one index to find the data?  If so do I use the same key as I do when I do in question #1?

    If the answer to those questions are yes then you might benefit from a clustered index.  I say might because there's no cut and dry answer.  When in question test.  Create a copy of the table, create a clustered index, then execute some of your more common queries.  Make sure you have Query Analyzer show the execution plan for each query you test and compare the results to the same queries on you production table.  That'll give you your answer, which is something no one in the forum can do.

  • Marcus,

    My view would be to test and see (if possible). From that you will get the answer.

    [Ed] A while ago I would have agreed with you re: use clustered indexes for range searches. But having attended a Kimberly Tripp seminar (SQL Server Tuning for High Performance - SQL2005) where she demonstrated that better performance could be achived in range searches using non-clustered indexes in most cases, and also having read the debates with regard to clustered indexes (see KT's blog http://www.sqlskills.com/blogs/kimberly/2007/02/02/TheClusteredIndexDebateContinues.aspx), I believe that range searching is no longer a primary concern when choosing a clustered index (IMHO).

    HTH

  • Ed, iwg,

    first off thanks for your responses.

    Personally I was thinking that a clustered index might not be particularly appropriate given the way the this table is used, but it's always useful to get a second opinion.

    Just to try to clarify how the table is used:

    We have three main types of query - customers, people and vehicles. When we run a query through our web app, we have the users selected data inserted into our query results table, then displayed to the user. Within our query results table we have indexes based on User ID, Query ID, Customer ID, Person ID and Vehicle ID. Thus, User ID and Query ID will always be populated and one of the other three columns will also be populated. We never do range searches based on the query results. If a user decides to refine their query, we clear out their existing query results records, and populate new records based on their new criteria.

  • Hi Marcus,

    Got a few questions to see if I can be of any more help:

    Do you know if the indexes you have are being used? 

    How is the data population done (ie is it bulk insert or constant stream, etc.)? 

    How many rows and what size (B) are the rows?

    What size is the table (how many rows on average)?

    Is the data based on, say, datetime in the way it is inserted/updated in you table?

    If the table was small and pretty much static then I might agree with you about not having a clustered index, but if its not, then a clustered index should help performance (as long as you chose the right candidate). What do you use as the PK?

  • iwg,

    First off I was slightly wrong earlier, our indexed columns are query id (fk to another table), company id, dealer id, person id and company name.

    in answer to your questions:

    >Do you know if the indexes you have are being used?

    Yes, they are definitely being used.

    >How is the data population done (ie is it bulk insert or constant stream, etc.)? 

    SELECT INTO

    >How many rows and what size (B) are the rows?

    >What size is the table (how many rows on average)?

    The number of rows varies. Unfortunately we have a slight issue which prevents a users' data from being cleared down correctly, but when prior to inserting records, we delete all the users' previous records. Ideally, when the system is not in use, we should have no records. When the user runs a query we insert up to say 10,000 rows, which should be deleted when the user logs out of the system. This doesn't work properly at the moment, so we occasionally clear out the table by running an ad-hoc delete statement.

    >Is the data based on, say, datetime in the way it is inserted/updated in you table?

    It's based on a query id, a foreign key to another table.

    Unfortunately we don't have a primary key. (Incidentally this wasn't designed by me!)

    The SQL for the table definition is as follows:

    CREATE TABLE [dbo].[tblx_QueryResult] (

     [Query_ID] [int] NOT NULL ,

     [Country_ID] [int] NULL ,

     [Manufacturer_ID] [int] NULL ,

     [Company_ID] [int] NULL ,

     [CompanyCode] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyName] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyAddress_ID] [int] NULL ,

     [CompanyZipCode] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyCity] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyPhone] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyMainContact] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanySalesPerson] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyZone_ID] [int] NULL ,

     [CompanyZone] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CompanyStatus_ID] [int] NULL ,

     [CompanyActive] [int] NULL ,

     [Dealer_ID] [int] NULL ,

     [DealerCode] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerName] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerAddress_ID] [int] NULL ,

     [DealerCity] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerPhone] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerMainContact] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerSalesPerson] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerZone_ID] [int] NULL ,

     [DealerZone] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [DealerActive] [int] NULL ,

     [Person_ID] [int] NULL ,

     [PersonName] [nvarchar] (250) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [PersonPhone] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [PersonActive] [int] NULL ,

     [InvitingPerson_ID] [int] NULL ,

     [SalesPerson_ID] [int] NULL ,

     [Checked] [bit] NOT NULL ,

     [AdditionalForWeb] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

     CREATE  INDEX [ix_QueryResult_Query_ID] ON [dbo].[tblx_QueryResult]([Query_ID]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_QueryResult_Company_ID] ON [dbo].[tblx_QueryResult]([Company_ID]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_QueryResult_Dealer_ID] ON [dbo].[tblx_QueryResult]([Dealer_ID]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_QueryResult_Person_ID] ON [dbo].[tblx_QueryResult]([Person_ID]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [IX_tblx_QueryResult] ON [dbo].[tblx_QueryResult]([CompanyName]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

  • I really shouldn't since I'm repeating myself but the answer to your question lies in other questions you might have not answered yet.  You need to identify the queries most often being executed which retrieve data from these tables.  Until you answer those questions you are just guessing and your testing will be suspect.  It's not difficult to find out how these tables are queried, just run Profiler.  When you've determined how the data is being accessed you can determine not only if a clustered index is appropriate but also if some of those other indexes are appropriate.

  • The only candidate for clustered index I can see for such tables is IDENTITY column.

    If you use pagination for displaying results on front end it would be a good idea to insert retrieved data into "query" table once and return range of lines (say, 1-20 or 21-40) to front end.

    Then you don't need to run query multiple times when user switches from page to page, just change the range of rows to be returned.

    In this case you definitely need clustered index on a column containing auto generated numbers 1-20-40- etc.

    And I believe it's the only index you should have there.

    _____________
    Code for TallyGenerator

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply