Hidden Tricks To SQL Server Table Cleanup

  • You could also select all the targeted EventIDs into a temp table (with a clustered index if you want to delete in order). That's important if you don't want to delete all types of rows: maybe you have to respect referential integrity and therefore can't delete some rows, or you need to filter the rows by some weird criteria first... 😉

    Then, on every loop pass, you can use a second temp table or a CTE to select the TOP(x) EventIDs from the first temp table and put them into the IN clause of your DELETE statement (see the sketch at the end of this post).

    This is more flexible (and the way we have done it for years), as we have a highly normalized data model with big transaction tables covering transactions of various types, and other tables referencing them. So we decided to archive and delete only the most frequent transaction types, and only the specific detail tables that depend on them.

    PS: Don't forget to put indexes on the foreign key columns of the tables referencing your "Event" table, as SQL Server will check those tables for referencing rows to maintain integrity during the DELETE.
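    A minimal sketch of that pattern, assuming the article's EventTracking.dbo.Event table (the extra filter, the #EventsToDelete/#Batch names, and the batch size of 2000 are placeholders):

    -- work list: everything we intend to delete, filtered however we like
    SELECT EventID
    INTO #EventsToDelete
    FROM EventTracking.dbo.Event
    WHERE EventCloseDate < GETDATE() - 6;   -- plus your extra criteria here

    CREATE CLUSTERED INDEX IX_Work ON #EventsToDelete (EventID);

    CREATE TABLE #Batch (EventID bigint PRIMARY KEY);

    WHILE 1 = 1
    BEGIN
        -- pull the next batch of IDs off the work list, lowest first
        ;WITH NextBatch AS
        (
            SELECT TOP (2000) EventID
            FROM #EventsToDelete
            ORDER BY EventID
        )
        DELETE FROM NextBatch
        OUTPUT deleted.EventID INTO #Batch;

        IF @@ROWCOUNT = 0 BREAK;

        -- the IN clause of the actual DELETE, as described above
        DELETE FROM EventTracking.dbo.Event
        WHERE EventID IN (SELECT EventID FROM #Batch);

        TRUNCATE TABLE #Batch;   -- clear for the next pass
    END

    DROP TABLE #Batch;
    DROP TABLE #EventsToDelete;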

  • Another way might be to partition the table - or index - then switch out the desired partition(s) to a staging table and drop the staging table (a sketch follows at the end of this post).

    I'm doing this on tables with 1BN+ rows, and it's instantaneous.

    Although you do have to plan for it when creating your table or index.
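    Roughly, the switch-and-drop looks like this (names and partition number are illustrative; the staging table must be empty, structurally identical to the source, and on the same filegroup as the partition being switched out):

    -- metadata-only operation, which is why it's effectively instantaneous
    ALTER TABLE dbo.Event SWITCH PARTITION 3 TO dbo.Event_Staging;

    DROP TABLE dbo.Event_Staging;   -- or TRUNCATE it and reuse it next time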

  • Peter,

    how do you deal with referential integrity, or don't you have any FK references to that table? That's the reason why we couldn't apply partitioning and chose to copy/delete old data instead.

  • That's the benefit of ignorance: It never even crossed my mind there might be a problem, I just went ahead and did it 🙂

  • This would also work:

    SET ROWCOUNT 2000 -- this will limit the number of rows in the delete

    DECLARE @Rowcount int
    SET @Rowcount = -1 -- Initialise

    WHILE @Rowcount <> 0
    BEGIN
        DELETE EventTracking.dbo.Event
        WHERE EventCloseDate < (GETDATE() - 6) -- remove rows older than 6 days

        SET @Rowcount = @@ROWCOUNT
    END

    SET ROWCOUNT 0 -- set it back
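    One caveat worth noting: SET ROWCOUNT is deprecated for limiting INSERT, UPDATE, and DELETE statements in a future SQL Server release; Microsoft recommends TOP instead. An equivalent loop (a sketch against the same table) would be:

    DECLARE @Rowcount int
    SET @Rowcount = -1 -- initialise so the loop runs at least once

    WHILE @Rowcount <> 0
    BEGIN
        DELETE TOP (2000)
        FROM EventTracking.dbo.Event
        WHERE EventCloseDate < (GETDATE() - 6) -- remove rows older than 6 days

        SET @Rowcount = @@ROWCOUNT
    END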

  • There is a SQL syntax error in the article:

    CREATE VIEW [dbo].[VEventCleanup]
    AS
    (SELECT TOP(2000)
    FROM EventTracking.dbo.Event
    ORDER BY Eventid
    )

    The above code should be corrected as follows:

    CREATE VIEW [dbo].[VEventCleanup]
    AS
    (SELECT TOP 2000 *
    FROM EventTracking.dbo.Event
    ORDER BY Eventid
    )

  • This seems like a good discussion to clear up my understanding of using "SELECT TOP xxxx ..."

    I've always read the description to be that when an ORDER BY is not used, the data returned may or may not be consistent from run to run [in that case, to say that the oldest rows are always returned would not be correct]. Does this depend on the existence of a clustered index, or is it determined by the primary key [clustered or not]?

    And when an ORDER BY is used, SQL Server has to look at all the data in the table to determine the TOP xxx; it seems like that could take extra time and use a lot of memory on a large table.

    We have had multiple discussions about using "SELECT TOP ...". I contend that defining the criteria for the data you really want is better than depending on getting the correct top 100, especially in cases where a delete or update is involved, or the results have to be repeatable.
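    For concreteness, the two forms we keep debating look like this (table and column names borrowed from the article); my understanding of the documented behavior is in the comments:

    -- no ORDER BY: the 100 rows returned are not guaranteed to be any
    -- particular 100, regardless of clustered index or primary key
    SELECT TOP (100) EventId
    FROM EventTracking.dbo.Event

    -- with ORDER BY: the result is well-defined (given a unique EventId),
    -- and when the ORDER BY column is the clustered index key the engine
    -- can stop after reading 100 rows in key order instead of sorting
    -- the whole table
    SELECT TOP (100) EventId
    FROM EventTracking.dbo.Event
    ORDER BY EventId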

    What is correct? What do you think?

  • I did something very similar which is working great. The commit after each small batch is the real secret. I also put a WAITFOR DELAY '00:00:02' into the loop, which seems to help keep the purge stored procedure from encroaching on overall server performance.
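    A rough sketch of that shape, assuming the article's table (batch size, age cutoff, and delay are whatever your system tolerates):

    DECLARE @Rowcount int
    SET @Rowcount = -1

    WHILE @Rowcount <> 0
    BEGIN
        BEGIN TRANSACTION

        DELETE TOP (2000)
        FROM EventTracking.dbo.Event
        WHERE EventCloseDate < GETDATE() - 6

        SET @Rowcount = @@ROWCOUNT

        COMMIT TRANSACTION -- commit each small batch so locks and log space are released

        WAITFOR DELAY '00:00:02' -- give other workloads room to breathe
    END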

  • This job is to be run manually, right? Because a schedule for it was not created in the article - but this is a minor thing. Nice idea to use views to do the delete!

  • Like some others here, I'm not in favor of the "count" method of doing these DELETEs. In particular, if the process has to be stopped and then resumed later, you have to scan the entire table again to get the count to use in the processing.

    Aside from that, there's definitely a potential for locking issues with this statement from the original code:

    SELECT @cnt = COUNT(*)
    FROM [EventTracking].[DBO].[Event]
    WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days

    since the entire table will have to be scanned. OK, a table scan may be unavoidable, since the only index is on id, and we need to check the date column as well, but at least we can use NOLOCK. Since we're only concerned with obsolete rows, that shouldn't be an issue.

    Since the id is the clustered key, I suggest just getting the max id that needs to be deleted. Also, personally I would strip the time from the date, so that multiple runs in the same day would all delete the same rows, rather than rows selected based on the time the command happened to run.

    Finally, I would use a variable for the days to keep, so that the code is self-documenting rather than relying on textual comments.

    So, putting it all together, something like this perhaps:

    DECLARE @days_of_history_to_keep int
    SET @days_of_history_to_keep = 6

    DECLARE @EventId bigint

    IF OBJECT_ID('EventTracking.dbo.Event_Delete_Control') IS NOT NULL
        DROP TABLE EventTracking.dbo.Event_Delete_Control

    SELECT @EventId = MAX(EventId)
    FROM [EventTracking].[DBO].[Event] WITH (NOLOCK)
    WHERE EventCloseDate < DATEADD(DAY, -@days_of_history_to_keep, DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0))

    -- store the controlling values in case this process has to be restarted later
    SELECT @EventId AS EventId
    INTO EventTracking.dbo.Event_Delete_Control

    ...

    --the DELETE statement inside the DELETE loop (naturally other code will enclose this)

    ...

    DELETE TOP (nnnn)
    FROM [EventTracking].[DBO].[Event]
    WHERE EventId <= @EventId

    ...
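    Purely as an illustration of how the "..." pieces might be filled in (the loop scaffolding is the poster's elision, and the batch size of 10000 is a placeholder for "nnnn"), the enclosing loop could re-read the control table so a restart picks up where it left off:

    DECLARE @EventId bigint
    SELECT @EventId = EventId
    FROM EventTracking.dbo.Event_Delete_Control

    DECLARE @Rowcount int
    SET @Rowcount = -1

    WHILE @Rowcount <> 0
    BEGIN
        DELETE TOP (10000) -- pick a batch size your log can absorb
        FROM [EventTracking].[DBO].[Event]
        WHERE EventId <= @EventId

        SET @Rowcount = @@ROWCOUNT
    END

    DROP TABLE EventTracking.dbo.Event_Delete_Control -- finished; clean up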


  • Thanks Lana - somehow missed that in editing. :-)

  • This technique has been running on our production systems for over a year - it reliably returns the oldest rows as long as the predicate is against the clustered primary key. That's part of the trick here; give it a try as described in the article and let me know your results.

  • Thanks for the comment. I also played with putting in a WAITFOR DELAY, and that is another approach. What I do now is run these deletes as jobs that run every x minutes, each job deleting a set number of rows.

  • Thanks for the comment. Sorry, these jobs do run on a schedule: initially, against huge tables, I run them every 20 minutes; once the table is cleaned up, I change the schedule to once a day. Prior to this technique I could not run table cleanup during the business day due to lock contention, but now I run these at any time of day with very minimal lock contention.

  • Agreed, the count method consumes resources, so initially, against huge tables, I hard-code the count. Say the table has 50 million rows to remove - there's no point using COUNT, just SET @cnt = 1000000. However, this is production, so by the time we get down to a few million rows to remove, the table is small enough that the count penalty is small. Thanks for the comment.
