Query improvement

  • Hi,

    How do i improve the performance of this query:

    select * from table where Month(datefield)='mm'??

  • .... where Month(datefield)='mm' will probably not use an index because of the function you apply to the column.

    - replace 'mm' by just the number (avoid implicit conversions)

    - Can you switch to a between condition? (e.g. you want to query only dates of one year)

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hi,

    I was asked this question in an interview...as to how this query may be rephrased for performance improvement...

    what wud be u r take on this??

  • As ALZDBA said, the function will prevent index seeks. Change the query to avoid any function on the columns in the where clause.

    Also, the select * makes covering indexes hard. If the where clause will return a large portion of the table (>10%), then any NC index will likely be ignored.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I was asked this question in an interview...

    My answer would be exactly the same.

    IMO your interviewer was only interested in the fact how you would

    play along with this kind of problem.

    Kind of seing how you would start to tackle stuff like this.

    Chances are you've been tested for "ease of responce" not even

    challenging the technical matters or correctness.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • vaidyanathan_kalyanasundaram (3/3/2008)


    Hi,

    How do i improve the performance of this query:

    select * from table where Month(datefield)='mm'??

    If it were me in the interview and I was asked that question about that query, here's what my answer would be...

    "Um... sure... There's actually a couple of problems with such a query...

    The first thing is the SELECT *. Generally, SELECT * will, at best, force an Index Scan instead of allowing an Index Seek to occur. Only the absolute necessary columns shoud be included in the SELECT list.

    Number two is the date thing. That should be made to look at a range of dates so that no conversion takes place on the "date field" itself.

    Fixing number two also takes care of the 3rd possible performance problem which is the implicit conversion between the INTEGER value that MONTH returns and the apparent string value used for 'mm'. Implicit conversions are bad because, they too, will sometimes cause a Scan instead of a Seek... but not always.

    Number 4 would be to check the execution plan to see if it was hitting an index. If not, I'd probably write some index creation code for submital to the DBA along with this code.

    Number 5 isn't a computer performance problem... it's a human performance problem. Hard coding of values such as 'mm' make code changes difficult on the next person and may induce an error because you might not catch all the places where it needs to be changed or you might change a different hard coded number by mistake. The value of 'mm' should be, at worst, included in a declared variable at the beginning of the code and then the variable should be used throughout the code instead of the hard coding.

    There're two more things wrong with the code that have to do with performance of the next person that may have to modify the code... there's no documentation as to what the code does and why and the readability leaves a bit to be desired.

    So, with all of that in mind, let's pretend I actually know what the intent of the code is... here's what the code would look like when I was done with it, sans the company approved header...

    [font="Courier New"]--=====&nbspDeclare&nbspand&nbsppreset&nbsplocal&nbspvariables

    DECLARE&nbsp@ReportMonth&nbspDATETIME&nbsp--First&nbspof&nbspdesired&nbspreporting&nbspmonth

    &nbsp&nbsp&nbsp&nbspSET&nbsp@ReportMonth&nbsp=&nbspCAST('2008'+'12'+'01'&nbspAS&nbspDATETIME)

    --=====&nbspProduce&nbspa&nbspreport&nbspof&nbspemployees&nbspthat&nbsphad&nbsptimecards

    &nbsp&nbsp&nbsp&nbsp&nbsp--&nbspfor&nbspthe&nbspmonth&nbspbeing&nbspreported

    &nbspSELECT&nbspEmployeeID,

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFirstName,

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspLastName&nbsp

    &nbsp&nbsp&nbspFROM&nbspdbo.jbmTest&nbsp

    &nbsp&nbspWHERE&nbspDateField&nbsp>=&nbsp@ReportMonth

    &nbsp&nbsp&nbsp&nbspAND&nbspDateField&nbsp<&nbsp&nbspDATEADD(mm,1,@ReportMonth)[/font]

    Of course, I'll follow whatever your formatting standards are... You do have standards, don't you? :w00t::D:P:hehe::)"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • "and, sorry, I almost forgot... there aren't any "fields" in a database, they're called "colums"" 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • In addition to others mentioned,

    Specially on the month issue,

    If you know the years of data stored within the table and if it is only few, You can use >= and < instead of Month function.

    If it is only one year, you can proceed with single statement. Otherwise you have to explore UNION ALL or OR options.

    If you do not have the above option, you need to think on having an additional computed column on MONTH(Datefield) on that table and create an index on it.

    Cheers,
    Prithiviraj Kulasingham

    http://preethiviraj.blogspot.com/

  • I'm curious... why would you use UNION ALL or OR on such a thing?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (3/3/2008)


    I'm curious... why would you use UNION ALL or OR on such a thing?

    I believe the person wants to do the query based on the year and month. For example if the data consists of 2007 and 2008 and the month needed was January, it would be like

    Select col1, col2 from tab1

    where datefld between '01-Jan-2007' and '01-Feb-2007'

    UNION ALL

    Select col1, col2 from tab1

    where datefld between '01-Jan-2008' and '01-Feb-2008'

    -Roy

  • Thank you all for your inputs...

  • Roy Ernest (3/3/2008)


    Jeff Moden (3/3/2008)


    I'm curious... why would you use UNION ALL or OR on such a thing?

    I believe the person wants to do the query based on the year and month. For example if the data consists of 2007 and 2008 and the month needed was January, it would be like

    Select col1, col2 from tab1

    where datefld between '01-Jan-2007' and '01-Feb-2007'

    UNION ALL

    Select col1, col2 from tab1

    where datefld between '01-Jan-2008' and '01-Feb-2008'

    Ah... understood. Thanks, Roy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Wouldn't it have been better to do

    SELECT col1,col2

    FROM tablename

    WHERE year(datecol) in (2007,2008) AND month(datecol) = 1

    instead of using a union

    as long as you use integers for the year and month values no conversion needs take place as per earlier replies.

  • No, because using functions on the column in the where clause means that SQL cannot use an index seek to locate the rows, and it will have to scan.

    OR or union allow SQL to seek an index on the data column (if it exists) to retrieve the applicable rows.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Is it true that a Nonclustered index occupies additional space apart from the Heap space??

    ANd that a Clustered index is not stored seperately from the base table??

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply