Index columns Selectivity

  • Hi,

    I have often been doubtful on few things that I notice in my execution plan. Let's say that my query has the following filter:

    WHERE order.PostedDate >= '11/1/2009'

    AND order.OrderStatus = 'P'

    ANd order.ocr='O'

    Now, assume that there is no non-clustered index on the order table so it is currently scanning the rows from the clustered index.

    Now, I get a suggestion from the query engine to create a non-clustered index as follows:

    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

    ON [dbo].[Orders] ([OrderStatus],[ocr],[PostedDate])

    Now my question is: How the optimizer decides on the order of columns in the indexes.

    After reading from forums threads here and other blogs, I concluded that the most selective column goes into index first because statistics are created based on that first column only. so how does the optimizer know the selectivity and how can I find it out without having to look at missing index feature.

    Thanks in advance!

    Regards

    Chandan Jha

  • It's not selectivity that's the biggest key.

    http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

    http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    http://qa.sqlservercentral.com/articles/Indexing/68439/

    http://qa.sqlservercentral.com/articles/Indexing/68563/

    http://qa.sqlservercentral.com/articles/Indexing/68636/

    Edit: fixed one link

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the link. Being a regular follower of your posts, I had gone through the first link many times.

    I may not be able to see everything clearly but i did try.

    Your post does say that most selective column is good as a first column in an index but as you also mentioned the real movie starts when that column is not used in the query as a filter.

    Now we are left with say 3 columns as in your example as well as my example where filters are put on.

    now comes into picture what you mentioned as density in your post:

    "DBCC Show_Statistics with the DENSITY_VECTOR option". Density is 1\Selectivity so it means high selectivity or low number of density(as low as possible)

    So the question that I am still not able to find answer to is: how to decide which columns out of the above 3 should go as first column in the index. do I need to run the DBCC command mentioned above on all 3 columns and whichever column gives me a lowest value becomes my first column in the index.

    I hope you can answer my confused mind.

    Regards

    Chandan

  • chandan_jha18 (9/14/2012)


    So the question that I am still not able to find answer to is: how to decide which columns out of the above 3 should go as first column in the index. do I need to run the DBCC command mentioned above on all 3 columns and whichever column gives me a lowest value becomes my first column in the index.

    No, you don't need to run DBCC or anything, you need to look at the queries that will be using that table and figure out which is the best leading column based on all the queries. Details as to why are in my blog posts and articles.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I apologize for sounding so dumb 🙁

    I read your links but still cannot figure out how can we decide the order of the columns. People who wrote the query might know this but lets say a consultant is given a task to find the order of the columns to satisfy the query. So is there any formula to find out?: or he simply will keep altering an index in different order till he finds something good.

    I have the where clause with 3 filters on column1, column2 and column3. now which column should go first in the index is still I am not sure. I know it should be selective as compared to others. but how do I know that.

    Thanks

    Chandan

  • No, there is no magic formula.

    You look at all the queries running against that table (not just one), then you figure out which columns they filter on and which order of the columns in the index will satisfy the most queries.

    If you don't want to do all that work, then equality predicates first, inequality after and it'll work for that query at least, maybe not for others.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply