null or not exists

  • I need to re-write this query for better performance as the table dbo.tblDataLogtraceData contains 17385758 records and is taking 45+seconds to run.

    Would a not exists be the better option?

    SELECTrdt.DataLoggerTraceID

    , rdt.ReadingDateTime

    , rdt.Tonnes

    , rdt.TonneRate

    , @loadKey

    FROMdbo.tblDataLogTraceRawData as rd

    RIGHT OUTER JOIN dbo.tblDataLogTraceRawDataTemp as rdt ON rd.ReadingDateTime = rdt.ReadingDateTime

    AND rd.DataLoggerTraceID = rdt.DataLoggerTraceID

    WHERE rd.DataLoggerTraceID IS NULL

  • It might be. Hard to say for sure. Give it a try and see.

    What indexes do you have on those tables? How many rows does the query return?

    Can you post the excution plan? (saved as a .sqlplan file, zipped and attached to the thread)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • How many rows are being output by the query? Also, are you outputing the query to the screen? If so, why?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ok, i have re-written the proc as follows.

    SELECTrdt.DataLoggerTraceID

    , rdt.ReadingDateTime

    , rdt.Tonnes

    , rdt.TonneRate

    FROMdbo.tblDataLogTraceRawDataTemp as rdt

    WHERE NOT EXISTS(

    SELECT 'y'

    FROM dbo.tblDataLogTraceRawData rd

    WHERE rd.ReadingDateTime = rdt.ReadingDateTime

    AND rd.DataLoggerTraceID = rdt.DataLoggerTraceID

    )

    It is still taking longer than i want to process. How do i attach an sql exec plan for all you experts to review?

  • For such simple SQL, as the execution plan will be very short, this is the simpliest method - using SQL Server Management Studio, open a query window and paste in the SQL statment. Before your SQL have this statement:

    set showplan_text on

    go

    Then set output to text format by pressing CTRL-T, and finally, execute the SQL by pressing F5

    Cut the output and paste into a forum reply.

    Also run the below SQL to get the date that data value statistics were last run:

    SELECT o.nameas TableNAme

    ,i.name as IndexName

    ,STATS_DATE(i.object_id, i.index_id) as StatsDate

    FROM sys.objects o

    JOIN sys.indexes i

    ON o.object_id = i.object_id

    whereo.name in ('tblDataLogTraceRawData','tblDataLogTraceRawDataTemp')

    SQL = Scarcely Qualifies as a Language

  • last stats updated on

    TableNAmeIndexNameStatsDate

    tblDataLogTraceRawDataPK_tblDataLoggerTrace2008-05-09 17:29:54.317

    tblDataLogTraceRawDataTempPK_tblDataLoggerTraceTemp2008-05-12 16:39:55.070

    Execution plan

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    |--Merge Join(Left Anti Semi Join, MERGE: ([rdt].[DataLoggerTraceID], [rdt].[ReadingDateTime])=([rd].[DataLoggerTraceID], [rd].[ReadingDateTime]), RESIDUAL: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[ReadingDateTime] as [rd].[ReadingDateTime]=[dbQRSPer

    |--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawDataTemp].[PK_tblDataLoggerTraceTemp] AS [rdt]), ORDERED FORWARD)

    |--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[PK_tblDataLoggerTrace] AS [rd]), ORDERED FORWARD)

    (3 row(s) affected)

  • Jeff Moden (5/12/2008)


    How many rows are being output by the query? Also, are you outputing the query to the screen? If so, why?

    Still haven't answered the question above...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Paul Fechner (5/12/2008)


    How do i attach an sql exec plan for all you experts to review?

    As I said above

    (saved as a .sqlplan file, zipped and attached to the thread)

    If you're using SQL 2005, that is. I assume you are, seeing as you're posing in a SQL 2005 forum.

    If not, run the query with showplan as Carl suggested, output to grid and copy the entire result into excel, save, zip and attach. There's a lot of necessary info in the other columns of the exec plan.

    Please also post the following:

    The table schema

    The index definitions

    The number of rows affected by the query.

    Without all of those it's going to be near impossible to offer meaningful suggestions.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • How many rows are in the temp table?

    How many average duplicates are there of each ID, roughly, in each table? That is, what is the value of:

    SELECT COUNT(DISTINCT ID) * 100.0 / COUNT(*) AS [% Dupl Ids]

    FROM tablename

    How many average duplicates are there of each ID and datetime, roughly, in the 17M row table?

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • All in All, I do not see anything significantly wrong with the SQL or the plan. For a dbo.tblDataLogtraceData contains 17,385,758 rows, 45+seconds is not unreasonable.

    From your post on 5-12 and reformatting the information provided:

    Last stats updated on:

    TableNAme IndexName StatsDate

    tblDataLogTraceRawData PK_tblDataLoggerTrace 2008-05-09 17:29:54.317

    tblDataLogTraceRawDataTemp PK_tblDataLoggerTraceTemp 2008-05-12 16:39:55.070

    The statistics on tblDataLogTraceRawDataTemp are up-to date

    and on tblDataLogTraceRawData are four days old, so the stats will only be inaccurate

    if a significant change in the # of rows ( more than 10% ). Let us know if the data has significantly changed.

    Execution plan

    |--Merge Join(Left Anti Semi Join,

    MERGE: ([rdt].[DataLoggerTraceID], [rdt].[ReadingDateTime])

    = ([rd].[DataLoggerTraceID], [rd].[ReadingDateTime])

    , RESIDUAL: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[ReadingDateTime] as [rd].[ReadingDateTime]

    =[dbQRSPer

    |--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawDataTemp].[PK_tblDataLoggerTraceTemp] AS [rdt])

    , ORDERED FORWARD)

    |--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[PK_tblDataLoggerTrace] AS [rd])

    , ORDERED FORWARD)

    Based on the plan, both tables have a clustered index on ( DataLoggerTraceID, ReadingDateTime). Are both indexes unique ?

    The datatype of the columns be compared are the same as there are no "convert" functions shown in the plan.

    Take a look at "Understanding Merge Joins" at http://technet.microsoft.com/en-us/library/ms190967(SQL.100).aspx

    Some more research could be done:

    1. Fragmentation can cause poor performance and the degree can be retrieved using the system function sys.dm_db_index_physical_stats

    SELECT * FROM sys.dm_db_index_physical_stats

    (DB_ID(), OBJECT_ID(N'tblDataLogTraceRawData'), NULL, NULL , 'DETAILED');

    SELECT * FROM sys.dm_db_index_physical_stats

    (DB_ID(), OBJECT_ID(N'tblDataLogTraceRawDataTemp'), NULL, NULL , 'DETAILED');

    2. Significant data changes that make the statistics out of date and will generate a plan against the old data, so run update statistics.

    Configuration changes:

    1. The plan is not using parallelism, which would reduce run time but would increase cpu time and increase disk i/o. Is parallelism enables ? ( sp_configure 'max degree of parallelism' show either 0, meaning all available cpus) or a number greater than 1, specifying the max # of cpus that can be used.

    2. Is partitioning an option ?

    3. Is there a disk, memory or network resource shortage ? If the results are being displayed on a different computer, then perhaps the problem is in the network.

    SQL = Scarcely Qualifies as a Language

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply