Indexes - Newbie

  • I'm sort of a novice but I got a question. (I'm more of a programmer than a DBA).

    When I create tables, the identity field (PK) get's an index automatically. Should I create indexes for other fields? I'll be sorting on "order","createddate","username","status"...so I'm guessing I'll need indexes on all 4 of those?

    What's a clustered and a non-clustered index? should i create those?

    DDL of user table:

    user_id

    username

    status

    created_date

    modified_date

    user_order

    description

  • Clustered index should be defined on columns that monotonically increase especially if the table is subject to many INSERTS, UPDATES, and DELETES.

    If a table is subject to few data modification, but too many SELECT statements, then this is less useful, and other options for the clustered index may be considered;

    Clustered indexes are useful for queries that meet the specifications below:

     

    1)     For queries that SELECT by a large range of values or where sorted results are needed. This is because the data is already pre-sorted in the index for the queries. Examples of this include when you are using BETWEEN, <, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT in queries.

    2)     For queries that look up a record with a unique value (such as RecordID) and you need to retrieve most or all of the data in the record. This is because the query is covered by the index. In other words, the data you need is in the index itself, and SQL Server does not have to read any additional pages to retrieve the data requested.

    3)     For queries that access columns with a limited number of distinct values. But if column data has little distinctiveness, such as columns with a yes or no, then a clustered index must not be defined on them (waist resources).

    4)     For queries that use the JOIN or GROUP BY clauses.

    5)     For queries where a lot of rows are returned, just not a few. Again, this is because the data is in the index and does not have to be looked up elsewhere.

    The contention between database readers and writers has to be taken into account in designing indexes. Indexes are great for reading from the database, but they must be updated so they slow down database writes.

    This does not mean that you do not have to build indexes for an update. Indexes are vital in locating the row to be updated. Finding and updating the row rapidly reduces the transaction time and the whole database runs smoother.

    Since SQL Server 7.0, the query optimizer has been able to retrieve the data from the index without having to load the table’s data pages. When data is located using a non-clustered index, SQL Server uses a bookmark lookup operation to jump from the non-clustered index to the clustered data pages.

    This becomes significant when the query is retrieving multiple rows that may be located in the same page in a non-clustered index, but spread out over the clustered index. Rows that might have been identified from two non-clustered index pages are then fully retrieved using the bookmark lookup to retrieve data from dozens of data pages.

    Since the whole purpose of indexing is to reduce the number of physical page reads, if all the required columns can be located and then retrieved from the non-clustered index pages without having to access the clustered data pages, which means a performance improvement.

    Hope this helps.

  • I'll add this - you can have only one clustered index on a table because that index causes the entire data of the table to essentially become an index.

    You can have many non-clustered indexes on a table, and you can 'include' columns to make a non-clustered index contain all of the columns that your query needs. This makes it a "covering index" and means sql doesn't have to hit the base table and doesn't need to do the bookmark lookup.

    You may find the Inside Sql Server books (or just see what's popular on Amazon) useful for learning a great deal more. Books online is pretty good too, but I'm not sure how easy it is to jump into if you're very green to DBs.

  • Ignore indexes for now.  While very useful, you can get yourself into some problems if you don't use indexes correctly.  Your user table should have the user_id column set as the Primary Key, which, incidentally, is also your clustered index.  Once your application is complete, find someone more experienced to help you apply additional non-clustered indexes if you feel you are having performance problems.

    The Inside SQL Server series is by far the best set of SQL books to invest in; second only to Books Online.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply