deletion of 2 million records

  • I want to delete 2 million records from my table(based upon a condition)....what is the easiest way to do???

  • Well, the EASIEST way is to simply write a DELETE statment that does what you want. 😉

    Since you are asking here I assume you are really asking about the MOST EFFICIENT way to delete 2 million rows.

    There are multiple different methods depending on the size of the table.

    If you are deleting most of the rows, I recommend creating a new table with only the rows you want to keep (using SELECT INTO), and then drop the old table and rename the new table to the same name as the old. Note that this will require extra care if there are any constraints or indexes on the table.

    If the table is very big (more than 10 million rows) you can delete the data in several passes. Use SET ROWCOUNT to limit the amount of data deleted by each DELETE statement and run it several times.

  • sandip.vs (3/18/2010)


    I want to delete 2 million records from my table(based upon a condition)....what is the easiest way to do???

    Assuming that you read Stefan's post, and you need to perform a delete of these records (vs. copy into a new table as Stefan suggests), then this falls into one of those very few times where doing something in a loop is actually desirable... otherwise you'll end up with a table lock while deleting those records all at once.

    do it something like this:

    SET ROWCOUNT 10000

    while exists (select 1 from MyTable where condition = true)

    delete from MyTable where condition = true

    This will delete 10,000 records at a time until all 2 million have been deleted.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Stefan_G (3/18/2010)


    If the table is very big (more than 10 million rows) you can delete the data in several passes. Use SET ROWCOUNT to limit the amount of data deleted by each DELETE statement and run it several times.

    Much better to use TOP. The use of ROWCOUNT with anything other than SELECT is deprecated and will be removed from the next version of SQL Server.

    Books Online - SET ROWCOUNT


    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.

    Also see Limiting Result Sets By Using TOP and PERCENT for additional reasons to avoid SET ROWCOUNT with data modification statements.

    Be careful to take account of locking thresholds when deleting in batches. Lynn Pettis wrote a good article on exactly this process, see Deleting a Large Number of Records[/url]

    Paul

  • Paul White NZ (3/19/2010)


    Much better to use TOP. The use of ROWCOUNT with anything other than SELECT is deprecated and will be removed from the next version of SQL Server.

    Nice. I started using this technique with SQL 2000 and I had not noticed that it was deprecated.

    Thank you for that info.

    /SG

  • Stefan_G (3/19/2010)


    Nice. I started using this technique with SQL 2000 and I had not noticed that it was deprecated. Thank you for that info.

    You are welcome. It comes as a surprise to many! Lynn's article is worth a read too - even for those very familiar with the basic method.

  • WayneS (3/18/2010)


    sandip.vs (3/18/2010)


    I want to delete 2 million records from my table(based upon a condition)....what is the easiest way to do???

    Assuming that you read Stefan's post, and you need to perform a delete of these records (vs. copy into a new table as Stefan suggests), then this falls into one of those very few times where doing something in a loop is actually desirable... otherwise you'll end up with a table lock while deleting those records all at once.

    do it something like this:

    SET ROWCOUNT 10000

    while exists (select 1 from MyTable where condition = true)

    delete from MyTable where condition = true

    This will delete 10,000 records at a time until all 2 million have been deleted.

    Whether you use TOP or SET ROWCOUNT, this will still keep the table pretty busy especially when it gets to the area of the table that people are using the most. My recommendation is that you add a WAITFOR DELAY of at least a second or two to let other processes "in".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • and dont forget to set SET @@ROWCOUNT 0 after deletion

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I found it might be better to use TOP syntax in the end anyway. The ROWCOUNT setting appears to be not taken into consideration by the query optimizer and this may affect execution plans for the queries.

    I am not sure if it is OK to post link here to my blog, but I wrote a post about this some time ago.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • Bhuvnesh (3/19/2010)


    and dont forget to set **** after deletion

    No, no, no! Please read my prior post on why not to use SET ROWCOUNT for this!

    The command is SET ROWCOUNT anyway, not SET @@ROWCOUNT...:crazy:

  • Piotr.Rodak (3/19/2010)


    The ROWCOUNT setting appears to be not taken into consideration by the query optimizer and this may affect execution plans for the queries.

    Yes it mentions that in the second of the links I posted earlier. It's a bit more complex than it seems though - you might have noticed extra TOP operators in query plans with "IsRowCount = True". Craig Freedman covers it briefly here.

    The other thing I dislike about SET ROWCOUNT is that its value stays in effect for any triggers fired by the operation. Unless the trigger was coded with an explicit SET ROWCOUNT 0, the results can be unfortunate.

  • Thanks Paul, I didn't see this post before. I really enjoy Craig's posts about execution plans.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • @waynes - you're saying if I do the deletion in a loop there will be no locks on the table? Is this better than specifying 'no lock' in the query?

  • ncodner (3/19/2010)


    @WayneS - you're saying if I do the deletion in a loop there will be no locks on the table? Is this better than specifying 'no lock' in the query?

    I would encourage you to read Lynn's article referenced earlier in the thread.

  • First, I want to thank Paul for referencing my article and the nice comments regarding it as well.

    Second, Jeff makes a good point as well if this is a heavily used table. If you look at the code I wrote in my article, you will notice that if the database is not using the simple recovery model, I have inserted a transaction log backup during the delete process. This was done to help control the size of the databases' transaction log. You could use that same logic to include a WAITFOR DELAY as well or in place of the transaction log backup.

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply