null or not exists

Question

Post reply

null or not exists

fec

SSC Eights!

Points: 941
More actions
May 12, 2008 at 12:44 am

#190413

I need to re-write this query for better performance as the table dbo.tblDataLogtraceData contains 17385758 records and is taking 45+seconds to run.
Would a not exists be the better option?
^{SELECTrdt.DataLoggerTraceID}
, rdt.ReadingDateTime
, rdt.Tonnes
, rdt.TonneRate
, @loadKey
FROMdbo.tblDataLogTraceRawData as rd
RIGHT OUTER JOIN dbo.tblDataLogTraceRawDataTemp as rdt ON rd.ReadingDateTime = rdt.ReadingDateTime
AND rd.DataLoggerTraceID = rdt.DataLoggerTraceID
WHERE rd.DataLoggerTraceID IS NULL

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 1

It might be. Hard to say for sure. Give it a try and see.

What indexes do you have on those tables? How many rows does the query return?

Can you post the excution plan? (saved as a .sqlplan file, zipped and attached to the thread)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

How many rows are being output by the query? Also, are you outputing the query to the screen? If so, why?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

fec SSC Eights! Points: 941 More actions · Answer 3

ok, i have re-written the proc as follows.

^{SELECTrdt.DataLoggerTraceID}

, rdt.ReadingDateTime

, rdt.Tonnes

, rdt.TonneRate

FROMdbo.tblDataLogTraceRawDataTemp as rdt

WHERE NOT EXISTS(

SELECT 'y'

FROM dbo.tblDataLogTraceRawData rd

WHERE rd.ReadingDateTime = rdt.ReadingDateTime

AND rd.DataLoggerTraceID = rdt.DataLoggerTraceID

)

It is still taking longer than i want to process. How do i attach an sql exec plan for all you experts to review?

Carl Federl One Orange Chip Points: 25384 More actions · Answer 4

For such simple SQL, as the execution plan will be very short, this is the simpliest method - using SQL Server Management Studio, open a query window and paste in the SQL statment. Before your SQL have this statement:

set showplan_text on

go

Then set output to text format by pressing CTRL-T, and finally, execute the SQL by pressing F5

Cut the output and paste into a forum reply.

Also run the below SQL to get the date that data value statistics were last run:

SELECT o.nameas TableNAme

,i.name as IndexName

,STATS_DATE(i.object_id, i.index_id) as StatsDate

FROM sys.objects o

JOIN sys.indexes i

ON o.object_id = i.object_id

whereo.name in ('tblDataLogTraceRawData','tblDataLogTraceRawDataTemp')

SQL = Scarcely Qualifies as a Language

fec SSC Eights! Points: 941 More actions · Answer 5

last stats updated on

^{TableNAmeIndexNameStatsDate}

tblDataLogTraceRawDataPK_tblDataLoggerTrace2008-05-09 17:29:54.317

tblDataLogTraceRawDataTempPK_tblDataLoggerTraceTemp2008-05-12 16:39:55.070

Execution plan

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Merge Join(Left Anti Semi Join, MERGE: ([rdt].[DataLoggerTraceID], [rdt].[ReadingDateTime])=([rd].[DataLoggerTraceID], [rd].[ReadingDateTime]), RESIDUAL: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[ReadingDateTime] as [rd].[ReadingDateTime]=[dbQRSPer

|--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawDataTemp].[PK_tblDataLoggerTraceTemp] AS [rdt]), ORDERED FORWARD)

|--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[PK_tblDataLoggerTrace] AS [rd]), ORDERED FORWARD)

(3 row(s) affected)

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 6

Jeff Moden (5/12/2008)
How many rows are being output by the query? Also, are you outputing the query to the screen? If so, why?

Still haven't answered the question above...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 7

Paul Fechner (5/12/2008)
How do i attach an sql exec plan for all you experts to review?

As I said above

(saved as a .sqlplan file, zipped and attached to the thread)

If you're using SQL 2005, that is. I assume you are, seeing as you're posing in a SQL 2005 forum.

If not, run the query with showplan as Carl suggested, output to grid and copy the entire result into excel, save, zip and attach. There's a lot of necessary info in the other columns of the exec plan.

Please also post the following:

The table schema

The index definitions

The number of rows affected by the query.

Without all of those it's going to be near impossible to offer meaningful suggestions.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ScottPletcher SSC Guru Points: 100262 More actions · Answer 8

How many rows are in the temp table?

How many average duplicates are there of each ID, roughly, in each table? That is, what is the value of:

SELECT COUNT(DISTINCT ID) * 100.0 / COUNT(*) AS [% Dupl Ids]

FROM tablename

How many average duplicates are there of each ID and datetime, roughly, in the 17M row table?

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Carl Federl One Orange Chip Points: 25384 More actions · Answer 9

All in All, I do not see anything significantly wrong with the SQL or the plan. For a dbo.tblDataLogtraceData contains 17,385,758 rows, 45+seconds is not unreasonable.

From your post on 5-12 and reformatting the information provided:

Last stats updated on:

TableNAme IndexName StatsDate

tblDataLogTraceRawData PK_tblDataLoggerTrace 2008-05-09 17:29:54.317

tblDataLogTraceRawDataTemp PK_tblDataLoggerTraceTemp 2008-05-12 16:39:55.070

The statistics on tblDataLogTraceRawDataTemp are up-to date

and on tblDataLogTraceRawData are four days old, so the stats will only be inaccurate

if a significant change in the # of rows ( more than 10% ). Let us know if the data has significantly changed.

Execution plan

|--Merge Join(Left Anti Semi Join,

MERGE: ([rdt].[DataLoggerTraceID], [rdt].[ReadingDateTime])

= ([rd].[DataLoggerTraceID], [rd].[ReadingDateTime])

, RESIDUAL: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[ReadingDateTime] as [rd].[ReadingDateTime]

=[dbQRSPer

|--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawDataTemp].[PK_tblDataLoggerTraceTemp] AS [rdt])

, ORDERED FORWARD)

|--Clustered Index Scan (OBJECT: ([dbQRSPeriod].[dbo].[tblDataLogTraceRawData].[PK_tblDataLoggerTrace] AS [rd])

, ORDERED FORWARD)

Based on the plan, both tables have a clustered index on ( DataLoggerTraceID, ReadingDateTime). Are both indexes unique ?

The datatype of the columns be compared are the same as there are no "convert" functions shown in the plan.

Take a look at "Understanding Merge Joins" at http://technet.microsoft.com/en-us/library/ms190967(SQL.100).aspx

Some more research could be done:

1. Fragmentation can cause poor performance and the degree can be retrieved using the system function sys.dm_db_index_physical_stats

SELECT * FROM sys.dm_db_index_physical_stats

(DB_ID(), OBJECT_ID(N'tblDataLogTraceRawData'), NULL, NULL , 'DETAILED');

SELECT * FROM sys.dm_db_index_physical_stats

(DB_ID(), OBJECT_ID(N'tblDataLogTraceRawDataTemp'), NULL, NULL , 'DETAILED');

2. Significant data changes that make the statistics out of date and will generate a plan against the old data, so run update statistics.

Configuration changes:

1. The plan is not using parallelism, which would reduce run time but would increase cpu time and increase disk i/o. Is parallelism enables ? ( sp_configure 'max degree of parallelism' show either 0, meaning all available cpus) or a number greater than 1, specifying the max # of cpus that can be used.

2. Is partitioning an option ?

3. Is there a disk, memory or network resource shortage ? If the results are being displayed on a different computer, then perhaps the problem is in the network.

SQL = Scarcely Qualifies as a Language