Indexes

  • Hi,

    recently came across this question in an interview..

    1.What is selectivity of an Index??

    2.Consider a table which has millions of rows...(e.g One used by say Amazon.com site)..There is a field called gender...when we retreive data from this table(which may be a large volume of data ...)

    using this query

    select * from table where gender='M' or F...

    what kind of index do we need to create on this table so that the performance of this query gets better??

  • Best thing is you can try your query creting cluster Index, Non-cluster index and without using any index.

    If we think logically field is having M or F and probably half of the table will scanned so in that case do not use any index and let the table scan happne; if you are using any index than there will be over burden on the query and query has to refer the index pages and than data pages.

    Hope this may help you.

    ---------------------------------------------------
    "Thare are only 10 types of people in the world:
    Those who understand binary, and those who don't."

  • What if we had a table with an empid field which was inserted randomly and is not ordered..i.ee. instead of rows having empids as 4, 5 6,..we have a table with randomly inserted empids....

    How would we sort this column so that a query like..

    select * from emp where empid between 20 and 10,000

    performs better??

  • Remember one thumb rule; if your query fetches more than 50% than table scan is ideal else use indexes.

    Most probably as per database design empid is primary key (to avoid the duplicate empid) and PK is by default Cluster index.

    ---------------------------------------------------
    "Thare are only 10 types of people in the world:
    Those who understand binary, and those who don't."

  • Selectivity is a measure of how many different values are in the index compared with the number of total rows.

    If the values are unique, then you have 100% selectivity.

    I don't see any good index for that query. Since there's a select *, it will be very hard to get a covering index. Since the selectivity is around 50%, it'll be too expensive to do an index seek and bookmark lookups.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    In that case , if selectivity is >50% then always a table scan or clustered index scan is better..Am i right??

    And my earlier question was if we had a column that stored integer values randomly, how would we order that column , assuming that the No: of transactions on the table is high

  • Hi,

    In that case , if selectivity is >50% then always a table scan or clustered index scan is better..Am i right??

    Creating a clustered index on a column with low selectivity does not really help you in this situation, unless you want to return all "M"s or "F"s. If you want to hone in on a particular row then SQL will have to run through the entire table to find it. Unless you specify M or F then it has to look through all of the "M"s or "F"s. You need to create an index noncluster or clustered, on a column that is has the most uniqueness about it. For example, an id or SSN.

  • Clustered indexes provide some of the best performance gains, provided it is correctly selected. The leaf nodes of a clustered index contain the actual data. This means that SQL will not have to traverse the index pages to find data. Addtionally, clustered indexes sort the data, so only 1 is allowed per table.

    more info

    http://msdn2.microsoft.com/en-us/library/ms177443.aspx

    http://msdn2.microsoft.com/en-us/library/ms190639.aspx

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply