Deleting a large amount of data

  • Hi Everyone,

    I have a rather large table (130,000,000 rows), and I want to get it down to a manageable 1,000,000 rows by removing old data based on a date field.  What is the best way to do this?  I don't want to use DELETE because I'm sure this will fill up the transaction log.

    In DB2, we would do a REORG WITH DISCARD so there is no log activity.  Does SQL Server have an equivalent?

    Thanks.


  • With that number of rows that you want to delete and the few that you need to keep, I'd suggest that you copy the rows you do want to keep to a temp table, truncate the massive table (logged as page deallocations, not row deletes), then copy the saved data back.
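
    A minimal sketch of that approach, assuming a table dbo.MyTable filtered on a RefDate column (both names are placeholders, not from the post):

    select * into #keep from dbo.MyTable where RefDate >= '20060101'   -- the ~1,000,000 rows to keep

    truncate table dbo.MyTable     -- logged as page deallocations, very light on the log

    -- if the table has an identity column, this step needs set identity_insert dbo.MyTable on and an explicit column list
    insert into dbo.MyTable select * from #keep

    drop table #keep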

    HTH

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Another trick is to delete in sections, so each transaction stays small and the log can be cleared between batches.  Say you want to delete all records before Jan 1, 2006.  Create a loop:

    declare @Date datetime

    set @Date = '1/1/1960'

    while @Date <= '1/1/2006'
    begin
        -- each pass removes everything older than @Date, i.e. at most one more day's worth of rows
        delete from MyTable where RefDate < @Date

        set @Date = dateadd(day, 1, @Date)
    end

    Russel Loski, MCSE Business Intelligence, Data Platform

  • I would go for Gail's advice.

    Before doing so, change your database recovery model to bulk-logged. Use bcp to copy the 1M rows out, truncate your table, then use BULK INSERT to bring the 1M rows back.
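
    A hedged sketch of that sequence, assuming a database MyDB and a table dbo.MyTable filtered on a RefDate column (all names and the file path are placeholders):

    -- 1. export the rows to keep (run bcp from a command prompt):
    --    bcp "select * from MyDB.dbo.MyTable where RefDate >= '20060101'" queryout "C:\keep.dat" -n -T -S MyServer

    -- 2. switch to bulk-logged and empty the table
    alter database MyDB set recovery bulk_logged
    truncate table dbo.MyTable

    -- 3. load the kept rows back; TABLOCK helps the load qualify for minimal logging
    bulk insert dbo.MyTable
    from 'C:\keep.dat'
    with (datafiletype = 'native', tablock)

    -- 4. back to full recovery, then take a full backup
    alter database MyDB set recovery full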


    Ronald San Juan | SQL DBA
    ID 710124 ~ Code "Northwind"

  • I tend to agree with Gail. Here are a couple of modifications and potential 'gotchas'. We have to start with the 'gotchas' first because they may alter your choice of actions ...

    • Are there triggers on the table - delete triggers specifically?
    • Are there foreign key constraints?
    • Is there an identity column in the table?

    Now a somewhat generic plan with a few modified steps (a T-SQL sketch of the core steps follows the list):

    • Script out the existing table and all dependencies
    • Do a select into from the existing 130,000,000 row original_table into a new_table, with a where clause of 1=2 only (this gives you the structure but no rows). Don't forget to drop all indexes except the primary key and its associated index.
    • Create an insert_into_view to get the 1,000,000 rows you want to keep from the original_table
    • Perform a full database backup
    • Set recovery to bulk-logged ('simple' recovery is better if you have downtime)
    • Perform an insert into new_table from the insert_into_view just created (inserting the 1,000,000 rows you want to keep is far easier on the log than deleting the 129,000,000 you don't)
    • Now you may have to disable constraints and possibly drop triggers ... depending ...
    • execute an sp_rename of original_table to old_table
    • execute an sp_rename of new_table to original_table
    • Now you may have to alter original_table to add back triggers removed earlier
    • Next you may have to recreate indexes removed earlier on original_table
    • Set your database recovery model back to 'full'
    • Perform a full database backup
    • Perform DBCC UPDATEUSAGE for the database
    • Perform UPDATE STATISTICS for original_table
    • Execute sp_recompile for original_table
    • Execute sp_refreshview for any views that reference original_table
    • The only potential painful gotcha concerns identity columns - I'll leave that up to you.
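
    A hedged T-SQL sketch of the core steps above, assuming the database is MyDB, the date column is RefDate, and the cutoff is Jan 1, 2006 (every object name here is a placeholder):

    select * into dbo.new_table from dbo.original_table where 1 = 2    -- structure only, no rows
    go

    create view dbo.insert_into_view as
        select * from dbo.original_table where RefDate >= '20060101'   -- the 1,000,000 rows to keep
    go

    alter database MyDB set recovery bulk_logged

    -- if original_table has an identity column, this needs set identity_insert on and an explicit column list
    insert into dbo.new_table select * from dbo.insert_into_view

    exec sp_rename 'dbo.original_table', 'old_table'
    exec sp_rename 'dbo.new_table', 'original_table'

    alter database MyDB set recovery full

    dbcc updateusage (0)                        -- 0 = current database
    update statistics dbo.original_table
    exec sp_recompile 'dbo.original_table'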

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

