Database Design - Indexes To Create

  • For anyone who wishes to help a newbie learn about using indexes...

    This is a question to help me better understand the value of creating indexes AND the costs creating too many. (We are on SQL Server 2000). I created a database to 'complement' our 3rd party ERP database. We create Access reports when the reports/information is not included in the ERP app. We can't touch the ERP database, so I created another one to add tables that enhance the reporting. One table I created is the User Activity Log - to track/monitor the Access reports the users are actually running.

    Fields:

    RowID, UserID (network login id), Activity (the report that was run), FirstDate (First date user ran the report), LastDate ( last date user ran the report), Tally (counter - number of times run).

    Each time a user runs a report the log is updated: (1) a new record is created the first time a user runs a report - all the fields are populated, or, (2) when the user has already run the report LastDate is updated and Tally is incremented.

    When I review activity there are two perspectives:

    1. User activity - What reports has each user run?

    2. Activity usage - What users have run each report?

    Currently there are only 98 records, this will never be a large file (maybe a few hundred records), but, it is a good one to learn the concepts. When answering the question below consider the file is tens of thousands of records and more. I'm guessing the performance for a few hundred records is insignificant to the system, much less the user. But, there is a point where an index makes a difference.

    Here are the indexes I figure are beneficial:

    1. User - because of the review by User

    2. Activity - because of the review by Activity

    3. User/Activity - because of the search performed when updating

    How am I doing? What would your recommendations be? Please explain 'Why' if it differs from my thinking.

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • Can you post the actual queries that the reports use? Without seeing those it's not possible to give accurate recomendations. Also, can you post the schema with data types.

    As for indexes, there's an introductory series here as well as some posts that may be useful on my blog.

    http://qa.sqlservercentral.com/articles/Indexing/68439/ (3 parts)

    http://sqlinthewild.co.za/index.php/category/sql-server/indexes/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the links to the articles - yours is very good. I'll delve into SQL In The Wild later.

    How do I post the schema with data types?

    If I understood the content correctly I think I can begin to construct the indexes:

    Clustered: RowID (Identity field) - it meets all four attributes

    * UserID/Activity would not because it will be wide.

    When I created the table an index was created on the UserID, but, not as clustered. So, I guess I should change that setting.

    QUERIES That I Use

    What is user activity?

    SELECT * FROM useractivitylog

    ORDER by userid, activity;

    Who is using the reports?

    SELECT * FROM useractivitylog

    ORDER by activity, userid;

    What is a report's usage?

    SELECT

    Activity,

    Min(FirstDate) AS FirstUsed,

    Max(LastDate) AS LastUsed,

    Sum(Tally) AS TtlUsed

    FROM UserActivityLog

    GROUP BY Activity;

    I'm one step closer. Thanks for the help so far. Can't wait for the "next lesson".

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • EdA ROC (2/1/2010)


    How do I post the schema with data types?

    The create table statements

    QUERIES That I Use

    What is user activity?

    SELECT * FROM useractivitylog

    ORDER by userid, activity;

    Who is using the reports?

    SELECT * FROM useractivitylog

    ORDER by activity, userid;

    Why SELECT *? Do these really need every single column?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I don't follow the 'The create tables statements'

    What do I need to do?

    As far as I know CREATE TABLE creates a table. I don't know of any variation that will simply list the existing table. The tutorial I used always did a DROP TABLE then CREATE TABLE with everything to literally create the table. Couldn't find anything Googling about "viewing" the table columns - lots of information about Views though. 🙂

    Anyway here's the table:

    [font="Courier New"]

    RowID smallint 4

    UserID nvarchar 24

    Activity nvarchar 40

    FirstDate datetime 8

    LastDate datetime 8

    Tally smallint 2

    Why SELECT *?

    Yes, as you can see, it's a simple table.

    My "reports" are running the query, copy/paste into Excel.

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • EdA ROC (2/1/2010)


    I don't follow the 'The create tables statements'

    I want you to post the definition of the table. Yes, CREATE TABLE creates a table, that's why I want to see the CREATE TABLE statement that created the tables in question. You can generate them from Enterprise manager. It's the easiest way to be sure that all the relevant information is there.

    Also, the CREATE INDEX statements for all indexes on the table.

    Why SELECT *?

    Yes, as you can see, it's a simple table.

    So in otherwords, lazy coding? There's very little that can be done when a query is running SELECT *.

    Do you really, absolutely, always need every single column in that table? If not, don't use SELECT * . It's bad, lazy coding practice

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I want you to post the definition of the table. Yes, CREATE TABLE creates a table, that's why I want to see the CREATE TABLE statement that created the tables in question. You can generate them from Enterprise manager. It's the easiest way to be sure that all the relevant information is there.

    When I read your reply I figured you know something I don't know, so I poked around Enterprise Manager for a while and found "Action > All Tasks > Generate SQL Script". And, of course to those familiar with the tools, discovered that this generates and exports a SQL statement to a file. So, now I know what you knew. Here's the CREATE TABLE statement:

    CREATE TABLE [dbo].[UserActivityLog] (

    [RowID] [int] IDENTITY (1, 1) NOT NULL ,

    [UserID] [nvarchar] (24) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [Activity] [nvarchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [FirstDate] [datetime] NULL ,

    [LastDate] [datetime] NULL ,

    [Tally] [smallint] NULL

    ) ON [PRIMARY]

    Then I went back to see if it would script for the INDEXES - I put money on it would. Lo and behold the resulting script included the CREATE TABLE! Dang! I could have saved myself a step. Ah, yes, the learning experience. 🙂 Here's the script for the indexes:

    CREATE INDEX [LoginID] ON [dbo].[UserActivityLog]([UserID]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [RowID] ON [dbo].[UserActivityLog]([RowID]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    -------------

    SELECT * ...

    In this case, which is rare, it's more efficient to use the *. On any given query I will view all the columns except RowID. Then again, out of curiosity, I have looked at RowID to see the order people started using a new report and out of it learn about their personalities. Such as - the same people tend to test drive before they need it, others wait until they need it and "have an emergency". Therefore, typing in each column is not efficient. You are correct though - Normally I will select only the columns needed.

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • So am I correct in saying that you have no primary key and no clustered index on this table?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Correct. Hence, my original request for assistance.

    With about 100 records and a max of 200 to maybe as high as 300, I wasn't concerned about indexes for this table. BUT! This simple table would be a good exercise in understanding principles and concepts.

    Here's my thinking so far:

    1. RowID would be a good candidate for the Primary key and the clustered index.

    Your article's reference to the 4 criteria supports what I knew intuitively but could not explicitly define. On more complex tables and situations my intuitiveness would have failed - notice I didn't say "probably failed". Now I have a basic understanding of what to consider.

    2. UserID and Activity are individually and collectively good candidates because they would be the focus of most queries. "Brute Force" brain storming puts 4 candidates on the table: (1) UserID, (2) Activity, (3) UserID & Activity, (4) Activity & UserID. Here's where understanding how indexes work would be beneficial in determining where to go next.

    What indexes would you create, and why?

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • Which is more selective on its own? UserID or Activity?

    Other than the cluster (yes, RowID is good as the PK), only two indexes will be needed, not sure yet which two, that's why I want to know selectivity.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Not sure what you mean by "more selective on its own"?

    There are two queries that I will run, the columns are the same, the order would be different. The first one probably more often than the other:

    SELECT ... by Activity, UserID

    SELECT ... by UserID, Activity

    HTH - Thanks for your time and effort.

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • EdA ROC (2/3/2010)


    Not sure what you mean by "more selective on its own"?

    More unique.

    There are two queries that I will run, the columns are the same, the order would be different.

    If you're not filtering, just using ORDER BY, then these indexes are not going to help you. They're not covering (since you're using SELECT *) and there is no way that SQL is going to use a noncovering index when it's returning all the rows from the table.

    For that matter, why are you returning all the rows from the table, unfiltered?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm not sure what you mean by "which is more selective on its own?"

    Most of the time I will execute the following two queries, probably the first one more than the second one, let's say 2 to 1 ratio:

    1. SELECT ... FROM UserActivityLog ORDER by Activity, UserID

    or I may sort on Activity, Tally DESC [to more quickly see who's using the biggest user]

    2. SELECT ... FROM UserActivityLog ORDER by UserID, Activity

    BTW - Your question is teaching me "the questions to ask" during analysis. Thanks.

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

  • EdA ROC (2/3/2010)


    I'm not sure what you mean by "which is more selective on its own?"

    I answered that in my last post

    Most of the time I will execute the following two queries, probably the first one more than the second one, let's say 2 to 1 ratio:

    1. SELECT ... FROM UserActivityLog ORDER by Activity, UserID

    or I may sort on Activity, Tally DESC [to more quickly see who's using the biggest user]

    2. SELECT ... FROM UserActivityLog ORDER by UserID, Activity

    I answered that in my last post too.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Oh my gosh! The last exchange was because when I checked to see if there was a reply to my reply (the first one asking about 'more selective' and referring to the 2 queries) did NOT display, neither did your reply. So, I thought I hadn't replied and repeated myself - to which you replied that you had answered the questions.

    We haven't been on the same page for a while. I was using this table as a simple example and you've been looking at it too literally. This was not, after all, a good example for me to use since I won't be filtering and it's quicker to use SELECT * and scan the short list of that's returned.

    HOWEVER! Your last reply eludes to what I need to learn - if I understand it:

    It's the filtering, i.e. WHERE clause, that the indexes apply to. If I extend this further - the indexes are of no benefit to the ORDER clause?

    If my queries were:

    #1 - used most frequently:

    SELECT RowID, UserID, Activity, FirstDate, LastDate, Tally FROM UserActivityLog WHERE Activity = 'activity number 1' ORDER by UserID, Tally;

    #2 - used less frequently:

    SELECT RowID, UserID, Activity, FirstDate, LastDate, Tally FROM UserActivityLog WHERE UserID = 'JDoe' ORDER by Activity;

    Would your recommendation be to create indexes:

    1. RowID - Clustered <-- we covered this before

    2. Activity - Non-clustered <-- because most frequently used

    Might this be of value if the table were very large?:

    3. UserID - Non-clustered <-- because it's often used

    BTW - Now that I'm browsing around in Enterprise Manager I happened to look our ERP database and looked at the indexes for what's one of the most heavily used tables: ORDERS. Most query executions would be by order_no or job_number (a job can be 1 or more order_no's). There are 22 indexes, they are all NOT clustered. If I understand your article correctly I would have expected to see 1 clustered index on Order_No. Is my thinking correct?

    Again - I appreciate your time and effort ... and patience.

    [font="Verdana"]"The Road To Success Is Always Under Construction."[/font]

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply