Time Bomb Coding

  • peter-757102 (2/10/2010)


    CraigIW (2/10/2010)


    nicholasw (2/10/2010)


    Instead of:

    IF EXISTS(SELECT 1 FROM ......WHERE....)

    Would the following be quicker still?

    IF EXISTS(SELECT TOP 1 1 FROM ......WHERE....)

    No, as the "top 1" is only done after the "select 1 from".

    Funny enaugh, it is not always that simple. I seen cases where a:

    select top 1 max(id)

    outperformed

    select max(id)

    many times over, and I suspect top sometimes acts as a undocumented hint. I got some testing to do on it still, and if it pans out it might also be a trick to control table variables bestter by using top to provide an estimate other then the constant 1 and thus get better query plans.

    Crazy! Good to know that though.

  • Well done-- very useful!

  • Catch all Lookup tables are the bane of my life.

    I've lost count of how many times I've had to write queries which join to the same lookup table 10 or more times to retrieve some basic data items.

    As your article points out, it makes enforcing business logic very difficult, sometimes these data values can be mandatory but there is no straight forward way of enforcing that.

    It also hides the true data structure so when someone new to the database is trying to mine the information schema tables to try and find certain data items they find themselves unable to locate key columns which have been instead hidden into a lookup!

    Finally, the performance hit. All the extra processing which has to go on to flatten the data from the lookup tables. Storage is cheap, CPU cycles wastes energy!

  • If you look throught he source of the built in system stored procs you will see quite a few cases of IF(SELECT COUNT(*)...)>0

    Try sp_helptext 'sp_helptext' for an example!

  • "Catch all Lookup tables are the bane of my life.

    I've lost count of how many times I've had to write queries which join to the same lookup table 10 or more times to retrieve some basic data items."

    Iv come across the same, This becomes worse if proper documentation is not present.

    But having lookup tables has its own advantages, There are scenarios when u r dealing with a huge DB and if its not normalized atleast to the 2nd level u face loads of performance related problems especially with data retrieval.

    [font="Times New Roman"]For better assistance in answering your questions
    Click Here[/url][/font]

  • Ryan C. Price (2/10/2010)


    Not that I want to encourage bad habits, but I seem to recall a discussion (I thought it was here @ SQL Server Central, but I can't find it) regarding COUNT(*) vs EXISTS, and tests were carried out (in SQL 2005 or 2008) where the compiler was smart enough to recognize that a programmer really intended to use EXISTS not COUNT(*), and so actually generated the same execution plan for both - the SQL Server engineering team compensating for the huge amount of abuse of COUNT(*) that goes on.

    Depends.

    IF (SELECT Count(*) FROM SomeTable) > 0

    and

    IF EXISTS (SELECT 1 FROM SomeTable)

    are treated the same, because SQL can tell that all you want it to check for presence/absence of a row.

    DECLARE @i int

    SET @i = (SELECT count(*) FROM SomeTable);

    IF @i >0

    AND

    IF EXISTS (SELECT 1 FROM SomeTable)

    are not treated the same way.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you! I wish I would have read this 5 years ago. This article validates lessons I have learned the hard way. I am now in the process of overcoming problems I created with the original db design and this article hit on MOST of them!

    So, what's a common best practice for dealing with older data that needs to get purged? I still need to get to it for comparative and historical purposes, but need to purge certain large tables (easily>2M Rows).

  • Loved the article. Wish I could be one of those nodding my head and saying, "yes, been there, done that, won't do it again." However, sadly I don't understand what the problem and the correct solutions in these cases...I need more help.

    I have worked with two major systems written by other developers. Usually more in a support/configuration/customization role. I think both of them have the OTLT that you describe.

    For the OTLT, one sytem had a code table with the primary key on the codeId, and codeType. Then there was one additional column - longName. There was a view created on each one of the codeTypes (I think this might have been done automatically) and any columns in the tables with names ending in Cd and matched a codeType. I think there were delete triggers on this table to search for any instances of a code before deleting them, but really the general rule was just to never delete a code. You would think very carefully before adding one and these were used for things that weren't likely to change overtime.

    Is this what you are describing as a time bomb? What is the big problem with this? What is the correct way to do handle these look ups? Hundreds of small individual lookup tables?

    Thank you very much for sharing your knowledge and experience!

  • Where's the 10 stars button for the vote??

  • Oh how well I recall the price for not archiving...:rolleyes:

    Excellent article!

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Samuel Vella (2/10/2010)


    Storage is cheap, CPU cycles wastes energy!

    I take it you've never compared the power usage of even a small SAN to that of a midsized database server... Storage may be cheap to purchase, but it's expensive to run. I agree with your overall position, but generalization like that is dangerous.

  • "TOP" versus "SELECT 1" performance varies depending on many factors . One of them is then "where" clause is in query and it is the most frequent usage of TOP and , as prev comment mentioned make sense with “Order By” clause.

    If TOP 1 clause used in following query :

    select top 1 * from table_with_10_mln_rows where Record_Date > ‘01/01/2008’

    The CPU time reported like this:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    The other option with select 1 from table_with_10_mln_rows where Record_Date > ‘01/01/2008’

    The CPU time reported like this:

    SQL SERVER EXECUTION TIMES:

    CPU TIME = 10 MS, ELAPSED TIME = 26 MS.

    However, the CPU execution changes all the way around when there is no “WHERE” clause.

    In my case it is because there is clustered index on Record_Date column in my table that I tested.

    My point for you is that when you doubt about what is the best query for you use the query opimiser options to check what is better in your case.

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    The full output after execution is:

    CPU TIME = 0 MS, ELAPSED TIME = 166 MS.

    SQL SERVER EXECUTION TIMES:

    CPU TIME = 0 MS, ELAPSED TIME = 0 MS.

    TABLE ‘MY_TABLE’. SCAN COUNT 1, LOGICAL READS 4, PHYSICAL READS 1, READ-AHEAD READS 0.

    SQL SERVER EXECUTION TIMES:

    CPU TIME = 10 MS, ELAPSED TIME = 26 MS.

    This line tells you about memory. Ideal execution then PHYSICAL READS is 0 and logical is minimum,

    TABLE ‘MY_TABLE’. SCAN COUNT 1, LOGICAL READS 4, PHYSICAL READS 1, READ-AHEAD READS 0.

    This line tells you the actual CPU time execution on the server, ideally is 0 when SQL server choose to neglect reporting the time because it is very small fraction of a second

    SQL SERVER EXECUTION TIMES:

    CPU TIME = 0 MS, ELAPSED TIME = 26 MS.

    Whichever query gives less CPU time and PHYSICAL READS wins the dispute of perfromance.

  • If you run a query twice in a row, the second will (very likely) have 0 physical reads because the data is cached.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In regards to "The dreaded BIT field problem", what are your throughts on using an indexed filtered view instead of a table? IE

    CREATE VIEW DocumentList WITH SchemaBinding

    AS

    SELECT DocumentID

    FROM ProcessingQueue

    WHERE IsProcessed = 0 AND IgnoreForProcessing = 0

    CREATE CLUSTERED INDEX [DocumentID] ON [dbo].[DocumentList] (DocumentID ASC)

  • You got to clear the procedure cach every time if you do repeatable executions. Run DBCC FREEPROCCACHE when you want to clear all execution plans from the cach. There are more about on SQLMAG.COM by Kalen Delaney well known perfromance expert.

    But, for start it is a good for developers that start learning the query coding for performance.:-P However, if you just execute the querires in separte query window the first time will show you clean stats.

Viewing 15 posts - 16 through 30 (of 73 total)

You must be logged in to reply to this topic. Login to reply