Updating 20M rows takes 5 hours

  • Dear Friends,

    I need to update 20 million rows as part of a migration. The script is:

    update productionaudit set
        assetname = a.assetname,
        assetdesc = a.description,
        assetruncostperhour = a.runcostperhour,
        assettype = a.assettype
    from productionaudit p, asset a
    where p.assetid = a.assetid

    This takes about 5 hours to run. Is there a way I can update 20M rows more quickly?

    Please also find the table definitions attached.

    Many thanks in advance.

  • You could try this:

    * Disable all indexes on productionaudit before performing the update, then re-enable them afterwards.

    * Don't update all 20M rows in one single transaction - use several smaller transactions instead. You could split the entries based on ranges of assetid, for example (see the sketch after this list).

    * If you are updating a large part of the table, you could create a new table with the correct content using SELECT INTO, then drop the old table, rename the new table, and recreate all indexes and constraints.
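
    A rough sketch of the range-based batching idea (assuming auditid is a convenient key to split on; the boundaries and batch size are placeholders to adjust to your data):

    DECLARE @lo int, @hi int, @step int
    SELECT @lo = 1, @step = 1000000
    SELECT @hi = MAX(auditid) FROM productionaudit

    WHILE @lo <= @hi
    BEGIN
        UPDATE p SET
            assetname = a.assetname,
            assetdesc = a.description,
            assetruncostperhour = a.runcostperhour,
            assettype = a.assettype
        FROM productionaudit p
        JOIN asset a ON p.assetid = a.assetid
        WHERE p.auditid >= @lo AND p.auditid < @lo + @step   -- one range per transaction

        SET @lo = @lo + @step
    END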

    /SG

  • Since you're doing updates, you do have to worry about two searches there: the search within the table for the correct records to update, and the select and join between that table and the other. I'd examine the execution plan. I suspect one or both of these is coming up with a table scan. You might need some different indexing?

    Have you checked for contention?
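
    If it helps, one quick way to look for blocking while the update runs (a sketch against the standard DMVs; adapt as needed):

    SELECT session_id, blocking_session_id, wait_type, wait_time, command
    FROM sys.dm_exec_requests
    WHERE blocking_session_id <> 0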

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • I've used Stefan's suggestions before successfully. Batch this into smaller transactions, say 10k at a time, and it will probably go quicker.

  • Every system has a "tipping point" and it's rather unpredictable. Try updating just a million rows... it'll probably take much less than a minute. Then try two million rows; that will probably take only twice as long. Then try 3 million rows... my machine at home has a tipping point somewhere between 2 and 3 million rows for most large tables, and instead of taking only 3 times longer than a million rows (which should be much less than 3 minutes), the update now takes more than 20 minutes.

    As already pointed out, the "batch" method Stefan suggested is the way to beat this problem, but I don't believe you need to knock it down to only 10K rows.
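
    For example, a TOP-based loop makes it easy to experiment with the batch size until you find the tipping point (just a sketch; it assumes the four target columns start out NULL so each pass only touches rows that haven't been updated yet - adjust the WHERE clause to whatever marks an unprocessed row in your data):

    DECLARE @BatchSize int
    SET @BatchSize = 1000000   -- tune this: bigger than 10K, but below the tipping point

    WHILE 1 = 1
    BEGIN
        UPDATE TOP (@BatchSize) p SET
            assetname = a.assetname,
            assetdesc = a.description,
            assetruncostperhour = a.runcostperhour,
            assettype = a.assettype
        FROM productionaudit p
        JOIN asset a ON p.assetid = a.assetid
        WHERE p.assetname IS NULL   -- assumption: identifies rows not yet updated

        IF @@ROWCOUNT = 0 BREAK
    END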

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thank you guys for your responses.

    I tried about 2.5 million rows in my test bed with the following method, as suggested by Stefan in the above post:

    1. Used select * into productionaudit_temp from productionaudit where auditid >= 1 and auditid <= 2500000

    2. Updated the temp table.

    This took roughly 2 minutes.

    When I run the update on the productionaudit table, where all the indexes are, it takes about 20 minutes.

    So why does the update on a temp table with no indexes take 2 minutes while the one with the indexes takes 20 minutes? Does this mean that the indexes on the table are wrong? Is my assumption correct, or does it need different indexes for updates?

    I suppose I can drop the indexes, run the update 2 million rows at a time, and then recreate the indexes. But is this the correct way to do it?

  • I would guess one of the columns in your update statement is being used as part of an index. If so, this forces the index to be rebuilt after the update, and that is why one part of the solution suggested by @stefan was to drop the indexes, do the update, and rebuild them.

    Rebuilding the indexes does not lose any information (I guess; there might be something I as a developer do not use but a DBA may), and rebuilding them afterwards will not take as much time as they are currently adding to your update.

  • ss-457805 (5/17/2010)


    Thank you guys for your responses.

    I tried about 2.5 million rows in my test bed with the following method, as suggested by Stefan in the above post:

    1. Used select * into productionaudit_temp from productionaudit where auditid >= 1 and auditid <= 2500000

    2. Updated the temp table.

    This took roughly 2 minutes.

    When I run the update on the productionaudit table, where all the indexes are, it takes about 20 minutes.

    So why does the update on a temp table with no indexes take 2 minutes while the one with the indexes takes 20 minutes? Does this mean that the indexes on the table are wrong? Is my assumption correct, or does it need different indexes for updates?

    I suppose I can drop the indexes, run the update 2 million rows at a time, and then recreate the indexes. But is this the correct way to do it?

    The table in TEMPDB is in the SIMPLE recovery model and had no indexes. The table in the production database has indexes that are probably not in the same order as your updates and are suffering huge extent splits. That causes a huge amount of unnecessary reads, and even if most of them are logical reads, guess what that's going to do to performance? You also have all of that being captured and kept in the log.

    You'll need to disable the indexes to get any speed out of this. Either that or just let it "crawl" through, because you've probably already spent more time trying to figure out what's going on than if you just let it run. Hopefully, this is a one-off, right?
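
    If you want to see how differently the two databases log, the recovery models are easy to compare (a small check; tempdb is always SIMPLE, while a FULL-recovery user database logs every row you touch):

    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name IN ('tempdb', DB_NAME())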

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks Jeff for the explanation.

    We do this every 3-4 months for the customers, so I would like to save time and do it properly. The developer who did the work has left the company.

    Please see the attached execution plan.

    Please let me know if adding proper indexes can resolve this.

  • ss-457805 (5/17/2010)


    I suppose I can drop the indexes, run the update 2 million rows at a time, and then recreate the indexes. But is this the correct way to do it?

    Yes, this is the way to do it if you want maximum performance.

    Note that you don't really have to drop and recreate the indexes. It is enough to disable the indexes and then rebuild them after the update.

    Disable and rebuild can be easier to script because you do not have to repeat the index definition when you rebuild the indexes.
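
    For example, something along these lines for each nonclustered index (a sketch using index names from the posted table definition; leave the clustered primary key alone, otherwise the table becomes unavailable):

    ALTER INDEX [ind_productionAudit] ON dbo.productionAudit DISABLE
    ALTER INDEX [ind_productionAudit_assetCounts] ON dbo.productionAudit DISABLE
    -- ...and so on for the remaining nonclustered indexes...

    -- run the update here, then rebuild (which also re-enables):

    ALTER INDEX [ind_productionAudit] ON dbo.productionAudit REBUILD
    ALTER INDEX [ind_productionAudit_assetCounts] ON dbo.productionAudit REBUILD
    -- ...and so on...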

    /SG

  • ss-457805 (5/18/2010)


    Please let me know if adding proper indexes can resolve this.

    Adding more indexes will not solve the problem.

  • Hi Stefan

    I just disabled all the nonclustered indexes and ran the update. It took 8 minutes. That is still more than the test I did using select * into..., which did not have any indexes at all.

    I can't disable the clustered index because then the table won't be available.

    Am I missing something?

  • ss-457805 (5/18/2010)


    Hi Stefan

    I just disabled all the nonclustered indexes and ran the update. It took 8 minutes. That is still more than the test I did using select * into..., which did not have any indexes at all.

    I can't disable the clustered index because then the table won't be available.

    Am I missing something?

    No, this sounds reasonable.

    SELECT INTO is the fastest way to create a table in SQL Server.

    And it is correct that you should only disable the nonclustered indexes.

    How many rows did you update in 8 minutes?

    How many total rows do you have in the productionaudit table?

  • Updated 2.6 million rows.

    The total number of rows in productionaudit is 2.6 million. This is the test environment; in live production the number of rows is >20 million.

  • ss-457805 (5/18/2010)


    Updated 2.6 million rows.

    The total number of rows in productionaudit is 2.6 million. This is the test environment; in live production the number of rows is >20 million.

    So, if I understand you correctly, you are updating all rows in this table.

    In that case you can use the third method from my original post. It is probably the most efficient in your situation.

    1) Use SELECT INTO to create a new table with the correct content by joining the two involved tables.

    2) Drop the old table.

    3) Rename the new table to the old name.

    4) Recreate all indexes and constraints.

    Something like the script below:

    Please make sure to review the script thoroughly before running it, since it will delete your original table.

    Also make sure that the table-creating part succeeds before continuing with the rest of the script.

    -- Create the new table
    SELECT  p.auditId,
            p.vFrom,
            p.vTo,
            p.assetId,
            p.availability,
            p.status,
            p.opMode,
            p.qtyIn,
            p.qtyOut,
            p.qtyProcessed,
            p.qtyRejected,
            p.countUnitId,
            p.rate,
            p.shiftId,
            p.runId,
            p.productId,
            p.crewId,
            p.crewSize,
            p.stopEventRefId,
            p.rejectEventRefId,
            p.xnCode,
            p.version,
            p.shiftAuditId,
            p.cellAssetId,
            --p.assetname,
            --p.assetdesc,
            --p.assetRunCostPerHour,
            --p.assettype,
            a.assetname,
            a.description AS assetdesc,
            a.runCostPerHour AS assetRunCostPerHour,
            a.assettype,
            p.countUnitDesc,
            p.shiftName,
            p.shiftDesc,
            p.runname,
            p.productName,
            p.productDesc,
            p.productCountUnitId,
            p.productCountUnitDesc,
            p.materialCost,
            p.crewName,
            p.crewCostPerHourPerHead,
            p.cellAssetName,
            p.cellAssetDesc,
            p.lastStatusChangeAuditId
    INTO    dbo.New
    FROM    dbo.productionAudit p
    JOIN    dbo.asset a ON p.assetId = a.assetId
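
    -- Optional safety check (a sketch, not part of the original script): stop this batch
    -- before the DROP if dbo.New was not actually created.
    IF OBJECT_ID('dbo.New', 'U') IS NULL
    BEGIN
        RAISERROR('dbo.New was not created - stopping before dropping dbo.productionAudit', 16, 1)
        RETURN
    END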

    -- Drop the old table
    DROP TABLE dbo.productionAudit

    -- Rename the new table to the old name (note: the new name must not include the schema prefix)
    EXEC sp_rename 'dbo.New', 'productionAudit'

    -- Recreate all indexes
    -- Please verify that this script is correct! It is based on the table definition posted on 2010-05-17
    SET ANSI_PADDING OFF
    GO
    ALTER TABLE dbo.productionAudit ADD CONSTRAINT [pk_productionAudit] PRIMARY KEY CLUSTERED ([auditId] ASC)
    SET ANSI_PADDING OFF
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit] ON [dbo].[productionAudit]
        ([assetId] ASC, [vFrom] ASC, [vTo] ASC, [xnCode] ASC)
        INCLUDE ([auditId], [availability], [status], [opMode], [qtyIn], [qtyOut], [qtyProcessed], [qtyRejected], [countUnitId], [rate], [shiftId], [runId], [productId], [crewId], [crewSize], [stopEventRefId], [rejectEventRefId], [version], [shiftAuditId])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_assetCounts] ON [dbo].[productionAudit]
        ([assetId] ASC, [availability] ASC, [status] ASC, [vFrom] ASC, [vTo] ASC)
        INCLUDE ([qtyIn], [qtyOut], [qtyRejected], [qtyProcessed])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_assetTime] ON [dbo].[productionAudit]
        ([assetId] ASC, [assetname] ASC, [availability] ASC, [status] ASC, [vFrom] ASC, [vTo] ASC)
        INCLUDE ([rate])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_findCountRows] ON [dbo].[productionAudit]
        ([vFrom] ASC, [vTo] ASC, [cellAssetId] ASC, [xnCode] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_productionRun] ON [dbo].[productionAudit]
        ([runId] ASC, [assetId] ASC, [vFrom] ASC, [vTo] ASC, [availability] ASC, [status] ASC)
        INCLUDE ([rate], [qtyIn], [qtyOut], [qtyRejected], [qtyProcessed])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_rejects] ON [dbo].[productionAudit]
        ([shiftAuditId] ASC, [qtyRejected] ASC, [auditId] ASC, [countUnitId] ASC, [rejectEventRefId] ASC, [vFrom] ASC, [assetId] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_shift] ON [dbo].[productionAudit]
        ([assetId] ASC, [productId] ASC, [auditId] ASC, [shiftAuditId] ASC, [availability] ASC, [status] ASC)
        INCLUDE ([rate], [qtyIn], [qtyOut], [qtyRejected])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_shiftAndRun] ON [dbo].[productionAudit]
        ([shiftAuditId] ASC, [runId] ASC, [assetId] ASC)
        INCLUDE ([rate])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_stoppages] ON [dbo].[productionAudit]
        ([runId] ASC, [stopEventRefId] ASC, [availability] ASC, [status] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE NONCLUSTERED INDEX [ind_productionAudit_vTo] ON [dbo].[productionAudit]
        ([vTo] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
