I need to delete 20 million rows from a table...

  • Jeff Moden (3/12/2014)


    Siberian Khatru (3/12/2014)


    Shrinking my files down to a manageable level which should solve my disk space crunch a bit.

    Shrinking your files will also make a gnarly trainwreck out of your indexes. Shrinking causes nearly 100% fragmentation. If you do a shrink to recover disk space, you should also rebuild all of your indexes, which, by the way, will cause your file to grow by slightly more than your largest index, which is usually the clustered index of your largest table.

    If you don't need the space for anything else, just rebuild your indexes and call it a day.

    Understood. I did shrink the files, but then immediately integrity checked and rebuilt the indexes. Now understand, I don't regularly shrink files or databases. But when a database (in this case 3 near-identical databases) is radically oversized with 70% free space in it -- I think it's time to do the shrink routine. Long ago, when I first started doing this, I had an actual step in my maintenance plans to shrink the database daily. Dumb, dumb, dumb...but as I said -- I am an accidental DBA. I now try to estimate my sizes accurately and plan as best I can. The devs threw me a curve on this with a crapload of useless records to delete, so I decided to delete (or in this case, truncate) them and then resize the database, leaving 20% free space for any "real" growth. I was wondering why it was growing so much more steadily than I anticipated lol. In any event, this is a relatively slow time and nobody even noticed....
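
    A minimal sketch of the shrink-then-rebuild sequence described above, assuming a hypothetical database MyDb with a logical data file named MyDb_Data and a representative large table dbo.BigTable (check sys.database_files for your actual logical file names before running anything like this):

    USE MyDb;
    GO
    -- Shrink the data file to a target size in MB, leaving some headroom for growth.
    -- (DBCC SHRINKFILE shuffles pages around and heavily fragments the indexes.)
    DBCC SHRINKFILE (N'MyDb_Data', 20480);  -- target ~20 GB; pick a size that leaves ~20% free
    GO
    -- Rebuild the indexes the shrink just scrambled.  The rebuild needs free space
    -- roughly the size of the largest index, so the file grows back by at least that much.
    ALTER INDEX ALL ON dbo.BigTable REBUILD;
    GO

    In practice you would repeat the rebuild for every table (or lean on an existing index maintenance solution) rather than hitting a single table.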

  • Very cool. Thanks for the feedback. Just wanted to make sure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (3/12/2014)


    GilaMonster (3/12/2014)


    MyDoggieJessie (3/12/2014)


    TRUNCATE will be faster and be minimally logged.

    Fully logged. It's efficiently logged, but it is fully logged, not minimally logged. Minimally logged has a specific definition.

    I do wish MS would realize there are 3 levels of logging instead of just 2. It would solve the problem of having to explain this all the time.

    Fully Logged

    Partially Logged (page deallocation only, can be rolled back)

    Minimally Logged

    Fully logged and your 'partially logged' are the same thing.

    There are two logging levels, fully logged and minimally logged. The distinction is how they behave in bulk-logged/simple recovery, nothing to do with the amount of log records created. Some operations log per row, some log per page, some do either depending on circumstance.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
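
    As a rough way to see the difference Gail describes, you can count the log records each statement generates with the undocumented sys.fn_dblog function. This is a sketch only, for a scratch database with nothing else running; dbo.LogDemo is a throwaway table invented for the test, and fn_dblog's output is undocumented and can change between versions:

    -- Build a throwaway 10,000-row table:
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
    INTO dbo.LogDemo
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    CHECKPOINT;  -- in SIMPLE recovery this lets earlier log records be reused

    BEGIN TRAN;
        DELETE FROM dbo.LogDemo;  -- logs roughly one record per row (plus index entries)
        SELECT COUNT(*) AS approx_log_records_after_delete
        FROM sys.fn_dblog(NULL, NULL);
    ROLLBACK;  -- fully reversible

    CHECKPOINT;

    BEGIN TRAN;
        TRUNCATE TABLE dbo.LogDemo;  -- logs page/extent deallocations, so far fewer records
        SELECT COUNT(*) AS approx_log_records_after_truncate
        FROM sys.fn_dblog(NULL, NULL);
    ROLLBACK;  -- also fully reversible

    DROP TABLE dbo.LogDemo;

    Both counts include whatever else is in the active log, so treat them as rough relative numbers, not an exact per-statement tally.
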
  • GilaMonster (3/12/2014)


    Jeff Moden (3/12/2014)


    GilaMonster (3/12/2014)


    MyDoggieJessie (3/12/2014)


    TRUNCATE will be faster and be minimally logged.

    Fully logged. It's efficiently logged, but it is fully logged, not minimally logged. Minimally logged has a specific definition.

    I do wish MS would realize there are 3 levels of logging instead of just 2. It would solve the problem of having to explain this all the time.

    Fully Logged

    Partially Logged (page deallocation only, can be rolled back)

    Minimally Logged

    Fully logged and your 'partially logged' are the same thing.

    There are two logging levels, fully logged and minimally logged. The distinction is how they behave in bulk-logged/simple recovery, nothing to do with the amount of log records created. Some operations log per row, some log per page, some do either depending on circumstance.

    Yes... according to MS definitions... which I happen to disagree with and only because they cause so much confusion. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (3/12/2014)


    Yes... according to MS definitions... which I happen to disagree with and only because they cause so much confusion. 😉

    According to how things behave. Adding a third category would, I suspect, just add even more confusion.

    Where would you categorise operations which can log full page images and can be minimally logged in bulk-logged and simple recovery models?

    Minimally logged and fully logged are in different categories because their different logging behavior has an effect on recoverability. In bulk-logged recovery, a minimally logged operation means that a log backup interval containing it must be restored entirely or not at all.

    Whether an operation logs at the page level or row by row, however, has no such effect on behaviour. The only way one could tell whether an operation has logged its work at the page level or the row level is by reading the transaction log directly.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
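
    A sketch of the recoverability trade-off described above, with hypothetical names throughout (DemoDb, dbo.BigTable, dbo.BigCopy, the backup path) and assuming a full backup already exists so the log chain is intact:

    -- Switch to bulk-logged so eligible operations can be minimally logged:
    ALTER DATABASE DemoDb SET RECOVERY BULK_LOGGED;
    GO
    -- SELECT ... INTO a new table is one of the operations that can be
    -- minimally logged under the bulk-logged (or simple) recovery model:
    SELECT *
    INTO dbo.BigCopy
    FROM dbo.BigTable;
    GO
    ALTER DATABASE DemoDb SET RECOVERY FULL;
    GO
    -- The next log backup contains the minimally logged work.  That backup can
    -- only be restored in its entirety: a point-in-time restore (STOPAT) inside
    -- it is not allowed, which is the all-or-nothing behaviour described above.
    BACKUP LOG DemoDb TO DISK = N'C:\Backups\DemoDb_log.trn';
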
  • GilaMonster (3/13/2014)


    Jeff Moden (3/12/2014)


    Yes... according to MS definitions... which I happen to disagree with and only because they cause so much confusion. 😉

    According to how things behave. Adding a third category would, I suspect, just add even more confusion.

    Where would you categorise operations which can log full page images and can be minimally logged in bulk-logged and simple recovery models?

    Minimally logged and fully logged are in different categories because their different logging behavior has an effect on recoverability. In bulk-logged recovery, a minimally logged operation means that a log backup interval containing it must be restored entirely or not at all.

    Whether an operation logs at the page level or row by row, however, has no such effect on behaviour. The only way one could tell whether an operation has logged its work at the page level or the row level is by reading the transaction log directly.

    Well since I'm following along from home, and I want to learn whenever I can - what exactly is the difference between the two types of logging in SQL Server? Is it simply row by row logging versus page level logging, or is it more than that?

    If this is well explained in BOL, I can of course find it myself. You guys have already been very helpful!

  • Siberian Khatru (3/13/2014)


    what exactly is the difference between the two types of logging in SQL Server? Is it simply row by row logging versus page level logging, or is it more than that?

    No. Nothing to do with page vs row.

    In bulk-logged and simple recovery models, some operations can log only enough information for an undo operation to occur, not enough for a redo. This is called 'minimal logging'. Every operation always (no exceptions) logs undo information for transaction rollbacks.

    Have a read through this, if you have any questions after please ask. http://qa.sqlservercentral.com/articles/Recovery+Model/89664/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
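
    To make the 'undo but not redo' point concrete, here is a sketch of one operation that can qualify for minimal logging; dbo.StagingHeap, dbo.SourceTable and the column names are hypothetical, and minimal logging only happens when the conditions are met (simple or bulk-logged recovery, and a target that allows it, such as a heap loaded with a TABLOCK hint):

    -- Can be minimally logged: SQL Server logs the page allocations (enough to
    -- undo the load) rather than every inserted row (what a redo would need).
    INSERT INTO dbo.StagingHeap WITH (TABLOCK)
    SELECT col1, col2
    FROM dbo.SourceTable;
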
  • GilaMonster (3/13/2014)


    Siberian Khatru (3/13/2014)


    what exactly is the difference between the two types of logging in SQL Server? Is it simply row by row logging versus page level logging, or is it more than that?

    No. Nothing to do with page vs row.

    In bulk-logged and simple recovery models, some operations can log only enough information for an undo operation to occur, not enough for a redo. This is called 'minimal logging'. Every operation always (no exceptions) logs undo information for transaction rollbacks.

    Have a read through this, if you have any questions after please ask. http://qa.sqlservercentral.com/articles/Recovery+Model/89664/

    Excellent read and very informative. I always wondered about Bulk Logged Recovery, and even now I have more questions about Recovery Models in general relevant to disaster recovery scenarios. I am often placed in situations where my comfort level is less than good because of things I cannot control. Having seen enough of that, I am determined to control those things I can because I know nobody else here will bother to try and understand it anyway lol. Thanks for that Gail.

    Steve

  • GilaMonster (3/13/2014)


    Jeff Moden (3/12/2014)


    Yes... according to MS definitions... which I happen to disagree with and only because they cause so much confusion. 😉

    According to how things behave. Adding a third category would, I suspect, just add even more confusion.

    You're right of course. The problem is that a lot of folks in the field think that the term "fully logged" for TRUNCATE means the same as it does for a delete in that the actual rows of data would be logged in the log file. I wish there were an easier way to describe it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I have the same situation as the OP: we have a 'scratch' database that the devs write all kinds of stuff to, only to delete it or send its data on to another database. So, due to the occasional elephant in the pipe, the database is usually about 80% empty.

    Still not sure of the best way to manage this thing, except to keep enough free space for 'normal' processing without it eating its entire storage allocation when one of those elephants comes through.
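
    One thing that helps with the sizing guesswork is checking how much of each file is actually in use, so the free-space target tracks 'normal' load plus some elephant headroom. A sketch, run inside the scratch database itself (nothing here is specific to any particular setup):

    -- size and FILEPROPERTY(name, 'SpaceUsed') are both reported in 8 KB pages.
    SELECT name                                        AS logical_file,
           size * 8 / 1024                             AS file_size_mb,
           FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024  AS space_used_mb
    FROM sys.database_files;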

  • On shrinking databases, sometimes you see the advice "Never do it." My feeling is that there's virtually no feature in SQL that should NEVER be used, but there are features that are a good idea only in limited circumstances.

    From time to time in my environment I need to take a backup of a 1 TB production database, remove nearly all the user data, then restore it to a DEV server that hosts multiple versions of the database. After deleting the data (using a vendor-supplied procedure that probably uses TRUNCATE because it is relatively fast), I can shrink the DB to less than 10 GB. Since deleting or truncating data does not reduce the file size, and the DEV server does not have multiple TB of storage, shrinking the database is a no-brainer and worth the extra step of rebuilding the indexes of the tables that still contain data.

    So there are no "never use" features. It's just a question of understanding the implications and using them when they make sense.

  • Siberian Khatru (3/12/2014)


    I have a table with about 20 million rows of data I want to truncate or delete. I don't need any of the data in it anymore at all. I also need to do this on disk-space-challenged servers. Each row comprises 71 characters, split 35, 35 and 1 across the included columns. Would it be best to:

    - DELETE in batches (don't think so myself)

    - TRUNCATE the table (concerned how many pages get logged in this scenario)

    - Possibly? SCRIPT, DROP and RECREATE the offending table?

    The table itself is not foreign keyed anywhere, nor does it participate in an indexed view. It is not replicated anywhere, although it might be at some point. And yes, I know I need to get after the devs to create a process that limits growth here going forward -- that is not within my province to dictate, however. I can merely "suggest" and hope, which sucks, but I digress.

    I am an accidental, but long time and fairly capable DBA here where I work. In other words, I ask other experts before I leap on faith alone lol...

    What say you experts out there?

    Yes, attempting to approach this the wrong way can potentially hose your server for days, and rebooting half-way through a giant transaction will only make things worse, because the whole thing then has to roll back.

    I am confronted with the exact same scenario on occasion, backing out 10s or 100s of millions of rows from large tables in a data warehouse. You said that you're disk space challenged, so I'll tell you how to delete this 20-million-row table while minimizing transaction log growth. It works by deleting the rows in batches. Batching updates and deletes this way also completes a hell of a lot faster; in my case it's about 2 hours compared to 2 days. It uses RAISERROR to print status as informational messages between each batch.

    This technique will yield better performance regardless of recovery model (simple, full, or bulk-logged), but simple recovery will minimize log file growth. You can cancel and restart at any time, and any rows deleted up to that point stay committed, as long as you don't wrap this block in a transaction.

    Also, if possible, I'd suggest placing the database in SINGLE_USER or RESTRICTED_USER mode while this is running, to prevent anyone else from taking out locks on the table or opening their own transactions.

    set nocount on;
    declare @batch_rows int = 0, @total_rows int = 0;

    while 1 = 1
    begin
        -- to minimize transaction log growth, pausing and
        -- checkpointing after each batch will help re-use space:
        waitfor delay '00:00:05';
        checkpoint;

        -- delete a batch of rows:
        delete top (1000000) from SomeBigTable;
        select @batch_rows = @@rowcount;
        select @total_rows = @total_rows + @batch_rows;

        -- print status message:
        raiserror('Rows affected: %d', 0, 1, @total_rows) with nowait;

        -- if no rows were deleted, then break from loop:
        if @batch_rows = 0 break;
    end;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jeff Moden (3/13/2014)


    You're right of course. The problem is that a lot of folks in the field think that the term "fully logged" for TRUNCATE means the same as it does for a delete in that the actual rows of data would be logged in the log file. I wish there were an easier way to describe it.

    And a bunch of folks don't care and a small group have no idea what they mean at all but use the terms because they sound smart. 🙁

    Yes, I'm cynical, it's Friday.

    Education, as with anything. Those who care will learn, those who don't, well...

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Eric M Russell (3/13/2014)


    Siberian Khatru (3/12/2014)


    I have a table with about 20 million rows of data I want to truncate or delete. I don't need any of the data in it anymore at all. I also need to do this on disk-space-challenged servers. Each row comprises 71 characters, split 35, 35 and 1 across the included columns. Would it be best to:

    - DELETE in batches (don't think so myself)

    - TRUNCATE the table (concerned how many pages get logged in this scenario)

    - Possibly? SCRIPT, DROP and RECREATE the offending table?

    The table itself is not foreign keyed anywhere, nor does it participate in an indexed view. It is not replicated anywhere, although it might be at some point. And yes, I know I need to get after the devs to create a process that limits growth here going forward -- that is not within my province to dictate, however. I can merely "suggest" and hope, which sucks, but I digress.

    I am an accidental, but long time and fairly capable DBA here where I work. In other words, I ask other experts before I leap on faith alone lol...

    What say you experts out there?

    Yes, attempting to approach this the wrong way can potentially hose your server for days, and rebooting half-way through a giant transaction will only make things worse, because the whole thing then has to roll back.

    I am confronted with the exact same scenario on occasion, backing out 10s or 100s of millions of rows from large tables in a data warehouse. You said that you're disk space challenged, so I'll tell you how to delete this 20-million-row table while minimizing transaction log growth. It works by deleting the rows in batches. Batching updates and deletes this way also completes a hell of a lot faster; in my case it's about 2 hours compared to 2 days. It uses RAISERROR to print status as informational messages between each batch.

    This technique will yield better performance regardless of recovery model (simple, full, or bulk-logged), but simple recovery will minimize log file growth. You can cancel and restart at any time, and any rows deleted up to that point stay committed, as long as you don't wrap this block in a transaction.

    Also, if possible, I'd suggest placing the database in SINGLE_USER or RESTRICTED_USER mode while this is running, to prevent anyone else from taking out locks on the table or opening their own transactions.

    set nocount on;
    declare @batch_rows int = 0, @total_rows int = 0;

    while 1 = 1
    begin
        -- to minimize transaction log growth, pausing and
        -- checkpointing after each batch will help re-use space:
        waitfor delay '00:00:05';
        checkpoint;

        -- delete a batch of rows:
        delete top (1000000) from SomeBigTable;
        select @batch_rows = @@rowcount;
        select @total_rows = @total_rows + @batch_rows;

        -- print status message:
        raiserror('Rows affected: %d', 0, 1, @total_rows) with nowait;

        -- if no rows were deleted, then break from loop:
        if @batch_rows = 0 break;
    end;

    LOL...guess you haven't read the whole of the thread? Truncate did it in 1/2 a second :w00t:

    Sorry, couldn't resist :hehe:

    +100 to Gail, though. Great work and help. 🙂
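
    For completeness, the operation that actually resolved the thread boils down to a one-liner (reusing the hypothetical SomeBigTable name from the batch-delete example above):

    -- Only works when no foreign keys reference the table and it isn't part of an
    -- indexed view or replication; it also resets any IDENTITY column to its seed.
    TRUNCATE TABLE SomeBigTable;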

  • For me, the keys were not needing to worry about getting all of the rows back, plus the minimal logging aspect. Both Gail and Jeff were helpful beyond that, though, as I have a better understanding of how SQL Server logs things in general now. I just downloaded a free PDF, written in part by Gail, from the Stairway series on Transaction Log Management that is an excellent read. Many thanks to all, and I plan to be a far more active member of this community as my duties as DBA have become more challenging in the last couple of years - far beyond "part time". In other words, time to make myself an expert.

