Very large t-log with backups

  • I have a database that's 125GB, but the t-log is 190GB.  I do full backups each day and t-log backups every 15 minutes, and I've verified the log backups have been successful for the past week.  Generally when I see a large log like this it's because backups are failing.  The log isn't growing out of control, it's just... large.  I ran a shrink to get it down to 50GB about a month ago, but it climbed back to 180-190GB very quickly and then just stayed there.  Is this a problem, or should I just be at peace with "this is just what the database needs"?  And if it is an issue, what types of things should I look at to find out what is causing this?

  • Your database is write-heavy (UPDATE/INSERT/DELETE).  It should be fine.  Alternatively, change the t-log backups from every 15 minutes to every 5 minutes.

  • Are you doing index maintenance on this database?  And, if you are, are you making the mistake of following the supposed "Best Practice" recommendation of "REORGANIZE between 10% (or 5%) and 30%" Fragmentation?  If so, there's a strong possibility that's your problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I also suspect index maintenance.  If so, the associated LDF growth can be minimized by temporarily changing the t-log backup frequency to something very frequent (e.g., every 30 seconds) while index maintenance is running, then back to normal afterwards.
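
    A minimal sketch of that idea, assuming it runs from a SQLCMD session or an Agent job step during the maintenance window (the database name, backup path, and one-hour duration are placeholders):

    DECLARE @i    int = 1,
            @file nvarchar(260);

    WHILE @i <= 120   -- roughly one hour of 30-second iterations
    BEGIN
        -- Build a unique backup file name for each iteration.
        SET @file = N'X:\LogBackups\MyDb_'
                  + REPLACE(REPLACE(REPLACE(CONVERT(nvarchar(19), GETDATE(), 120), N'-', N''), N':', N''), N' ', N'_')
                  + N'.trn';

        BACKUP LOG MyDb TO DISK = @file WITH INIT;

        WAITFOR DELAY '00:00:30';   -- keep the active log small while maintenance runs
        SET @i += 1;
    END;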

  • To clarify what I said about index maintenance... it's not so much the fact that index maintenance is being done, although doing index maintenance in the FULL Recovery Model is fully logged whether you do an index REBUILD or REORGANIZE.

    A big and rather unknown-to-most problem with REORGANIZE is that it's not the tame little kitty that the world has made it out to be.  A lot of people that have AG, Mirroring, Log Shipping, or whatever use REORGANIZE because it's supposedly easier on the log file.  In fact, REORGANIZE can be MUCH worse than REBUILD when it comes to writing to the log file.

    My personal example is that I have a 146GB table (Clustered Index) that had only 12% fragmentation.  To be sure, most of the fragmentation was near the end of the index because the Clustered Index is based on an ever-increasing IDENTITY column; the table suffers NO UPDATEs, only DELETEs, and only on data that is less than a day old.  That means there's plenty of room for REORGANIZE to "compress the data" back up to the fill factor (which is appropriately 0 (100) in this case).

    To test what would actually happen in Prod, I did a restore of the database (that contains that table) onto a test system and did the REORGANIZE... The Log file grew from 20GB to 227GB!!!  Remember... most of this table was already defragmented and there was only 12% logical fragmentation!!!

    Resetting the test by doing another restore from the same backup, a REBUILD only caused the log file to grow to 146GB, which was totally expected in the FULL recovery model.

    Both the REORGANIZE and the REBUILD took about an hour and 21 minutes to complete.

    Now... we use "SAN Replication" rather than relying on SQL Server and its log file.  That means I don't have to remain in the FULL Recovery Model, so I repeated the same experiments in the BULK LOGGED Recovery Model.  As expected, REORGANIZE produced the same miserable performance and huge log file (and T-LOG backups were every 15 minutes in all cases, BTW).  The REBUILD completed in only a little over 12 minutes (not a misprint) and the log file didn't grow at all (in production it grew to 37GB because there's a lot of other stuff going on as well, but that's a whole lot less than 227GB!!!).

    To be sure, the REBUILD was OFFLINE.  Even in the BULK LOGGED Recovery Model, an ONLINE REBUILD was as bad (actually, it was worse for time, even on an otherwise inactive system) as doing the REBUILD in the FULL Recovery Model.

    Worse yet, REORGANIZE didn't actually defrag all of the pages.  About 20,000 pages were left with page densities of only 10 to 80% full.  The REBUILD left no such pages behind.

    REORGANIZE sucks... even on small stuff (death by ten thousand small cuts).  My recommendation is to not use it unless you need to compact some LOBs (and there are ways to avoid even that problem, because you may have to REORGANIZE 10 or 12 times to actually have it work fully).  It's actually better in most cases to not defrag most indexes at all rather than use REORGANIZE (and for a lot more reasons than I've stated here), and to use REBUILD when you have a little time, even if you have replication or other things that rely on the log file running.
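
    For anyone who wants to repeat that kind of comparison on their own restored copy, a rough harness looks something like this (the database, table, and index names are placeholders, not the actual objects from the test above):

    USE RestoredCopyOfMyDb;
    GO

    -- Snapshot log usage and physical log file size before the test.
    SELECT used_log_space_in_bytes / 1048576.0 AS used_log_mb
    FROM   sys.dm_db_log_space_usage;

    SELECT name, size * 8 / 1024 AS file_size_mb   -- size is in 8KB pages
    FROM   sys.database_files
    WHERE  type_desc = 'LOG';

    -- Run ONE of these per restore, then re-run the two queries above.
    ALTER INDEX PK_BigTable ON dbo.BigTable REORGANIZE;
    -- ALTER INDEX PK_BigTable ON dbo.BigTable REBUILD WITH (ONLINE = OFF);

    Comparing the physical file size before and after is what shows the growth described above; the percent-used figure on its own can look harmless right after a log backup.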

     

    --Jeff Moden

  • From what I've read here and what I'm seeing in my system, it's definitely the weekly index defrag that is the culprit.  I have been using the Hallengren reindex script and in general it works great for most of my databases.  But I'm going to be doing some experimenting with either not running a scheduled index defrag on some of my larger tables (they're very large and very sequential)... or tightening up the backup window during the defrag.  If skipping the regular index maintenance doesn't cause performance issues, I might just leave it at that, or do an offline rebuild as Jeff suggested if that works well in my test environment.

    Extra thanks to Jeff for the detailed examples.  It was very helpful to me!

  • Greg Goss wrote:

    From what I've read here and what I'm seeing in my system, it's definitely the weekly index defrag that is the culprit.  I have been using the Hallengren reindex script and in general it works great for most of my databases.  But I'm going to be doing some experimenting with either not running a scheduled index defrag on some of my larger tables (they're very large and very sequential)... or tightening up the backup window during the defrag.  If skipping the regular index maintenance doesn't cause performance issues, I might just leave it at that, or do an offline rebuild as Jeff suggested if that works well in my test environment.

    Extra thanks to Jeff for the detailed examples.  It was very helpful to me!

    "Very Sequential" indexes should not become fragmented to even a piddly 2% for years.  If they are becoming fragmented more rapidly, you're probably dealing with "ExpAnsive Updates" and lowering the FILL FACTOR on such indexes is just going to waste a shedload of both disk and memory space.  So is REBUILDing the indexes except to recover some of the space wasted by the bad page splits that are occurring (which also crush the log file with a shedload of totally unnecessary log file entries and can cause some pretty serious blocking).

    There are ways to fix them (important, especially on "very large and very sequential indexes").  It'll take a bit of work to fix them so they never fragment but, IMHO, it would be worth it.  You game?

    --Jeff Moden

  • Jeff, I'm always game to learn how to better finesse my databases into submission 🙂  If I could get this under control it would help my reporting and analysis processes also.

    Here's the scenario.  There are two tables in this database that I'm considering.  The database I'm referring to feeds our ERP system (which I don't have unrestricted access to change because it's a purchased system).  The first table is a log of all inventory transactions (production and usage) from the "beginning of time", such that adding all of them up for a given item would yield what is currently "on hand".  Think of it like an accounting General Ledger, but for inventory.  It's 18.5M rows, 13GB of data, and 16GB of indexes (a PK and maybe 13 non-clustered).

    I have a process that runs every 5 minutes which feeds that data (whatever has changed since the last time the process ran) into a reporting table that has all of the same rows, but fewer indexes (6GB worth of a PK and 6 non-clustered).  Based on what I've learned here, and through the little research I've done, I would guess these tables are the culprits behind my inflating log file.

    The rows that get put in this table come in relatively rapidly and are logged in order.  A couple of times a week accounting might go in and edit some of the transactions, but most rows remain untouched once they're written (from what I've been told about how the ERP system was designed... however, I'm willing to be skeptical about that).

    When I did an offline rebuild in DEV of the database in question (after changing DEV to full recovery), it took 1.5 hours (about the same as it does in production) and caused the log file to grow from 600MB to 20GB.  So I'm seeing the same behavior in DEV for the most part.  If I did an ONLINE rebuild or a REORGANIZE, I'd expect even larger growth based on your advice earlier.

    I'm willing to put in the effort, I just need some guidance on where to start and how to test effectively to get to the root cause.

    -G

  • @greg...

    You say it took 1.5 hours to rebuild the DATABASE.  How many GB does the database contain, and was that to REBUILD all of the indexes on all of the tables or just the ones that needed it?

    Also, is it absolutely necessary for you to be in the FULL RECOVERY model in prod at all times or can we slip into the BULK LOGGED mode during the index rebuilds?

    Shifting gears back to the two tables you're talking about...

    I would first kick accounting in the ass.  It sounds like this is supposed to be a transactional history table and, if it is, they should not be modifying rows to "make corrections".  They should be adding rows to make corrections or the history of their "tinkering" is lost.  If it's like I've described, it would fail an SEC or similar audit in the first 15 seconds of the audit.

    Also, I've never heard of such a thing as a reporting table having FEWER indexes than the original table.  Why does the original table have 13 indexes while the reporting table has only 6, and are they all actually being used?  Does this reporting table live in the same database as the original table, as well?

    And, is this the largest table in the database?

    --Jeff Moden

    The TEST database is about 130GB (roughly half data, half index).  I'll need to restore PROD to TEST again to test more reindexing scenarios, since TEST isn't used much and won't get fragmented enough for me to play with.  But to answer your question, the REBUILD I did was for the entire database, and the time included other databases.  Based on my estimate, the actual rebuild time for just that database was around 20-30 minutes.

    It is only necessary for the Production DB to be in FULL RECOVERY if something goes wrong while it's not in FULL RECOVERY 🙂  But I don't have any issues playing around with the recovery mode in prod for the time it would take me to run some tests.

    As far as the transactional nature of the table goes, I would agree that once a row is written, it ideally shouldn't be touched.  But because of how things run operationally, that's not always the most accurate way for accounting to correct inventory cost.  I've brought this up to our cost accountant before, and they were able to provide enough justification for me to let it go.  However, I would be willing to bring it up again if the view was worth the climb.

    I personally created the REPORT table and only placed the indexes on it that I needed for my purposes.  I didn't design or have any input on the ORIGINAL table nor the indexes on it.  This ERP system is not one of the standard big name ones and has been highly customized over the years.  I'm guessing the excessive indexes on the table are a result of things that have been added/removed and may not even be needed.

    And yes, both of these tables are in the same database.  I did that because there is a need to join the transaction table to items, customers, and other tables.  At the time, the reporting system hit the transaction table directly, which was causing issues.  So I created the REPORT table alongside the ORIGINAL and simply modified the reporting queries by changing the name ORIGINAL to REPORT.  This table could be moved somewhere else if there were sufficient need, but I would need to consult with the ERP vendor to significantly change the ORIGINAL table.

    And yes, other than the General Ledger table (37M rows), these tables (18M each) are the largest in the database.  The next largest table is 3M rows.  I would say roughly 15 tables out of 400 are over 1M rows.

  • Thanks for the detailed information, Greg.

    First, if the log file is getting too big for you on the database in question and it's only taking 20-30 minutes in the FULL Recovery Model using REBUILDs, you can get that all the way down on the large tables by temporarily shifting your database to the BULK LOGGED Recovery Model and using WITH (ONLINE = OFF).  You can protect the database a bit more than you think by doing a log file backup just before shifting to BULK LOGGED and then again right after you shift back to the FULL Recovery Model.  With all that, the REBUILDs of the indexes will be minimally logged and you might be able to get your REBUILD times down to just a handful of minutes, especially if you restrict your REBUILDs to only the indexes that actually need it.
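
    A rough sketch of that sequence (the database, index, and backup path names are placeholders; the real objects come from your own maintenance list):

    -- Protect the tail of the log before leaving FULL recovery.
    BACKUP LOG MyErpDb TO DISK = N'X:\LogBackups\MyErpDb_pre_rebuild.trn' WITH INIT;

    ALTER DATABASE MyErpDb SET RECOVERY BULK_LOGGED;

    -- OFFLINE rebuilds are minimally logged under BULK_LOGGED.
    ALTER INDEX PK_OriginalTable ON dbo.OriginalTable
        REBUILD WITH (ONLINE = OFF, SORT_IN_TEMPDB = ON);

    ALTER DATABASE MyErpDb SET RECOVERY FULL;

    -- Back up the log again immediately so the window without
    -- point-in-time recovery is as small as possible.
    BACKUP LOG MyErpDb TO DISK = N'X:\LogBackups\MyErpDb_post_rebuild.trn' WITH INIT;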

    The key on these larger indexes with "ever increasing sequential keys" is to make it so they don't suffer "ExpAnsive Updates".  If you can do that, it'll take years for the fragmentation to get to just 2-4%.

    That being said, can you post the CREATE TABLE statement with all of the indexes and constraints for the tables in question and also identify which variable width columns your accountants are updating with wider data so we can take a stab at fixing them so they never fragment?
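
    And, to restrict the REBUILDs to only the indexes that actually need attention, something like this will list the candidates (the page-count threshold is just an illustrative starting point):

    SELECT OBJECT_NAME(ips.object_id)           AS table_name,
           i.name                               AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.avg_page_space_used_in_percent,  -- page density
           ips.page_count
    FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
    JOIN   sys.indexes AS i
           ON  i.object_id = ips.object_id
           AND i.index_id  = ips.index_id
    WHERE  ips.page_count > 1000                -- ignore tiny indexes
    ORDER BY ips.avg_fragmentation_in_percent DESC;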

    --Jeff Moden

  • Before and after the log backup, can you run this command and post the result here?

    dbcc sqlperf(logspace)

    This command returns the physical size of the log and how much of it is actually used.  Sometimes the physical log file (.ldf) is huge but contains very little active log.

    If heavy DML is happening on your DB, I suggest not shrinking it, since it will just expand again soon.  Since you are already backing up the log every 15 minutes, the main option left is to increase the disk space that holds the log.

    (I don't recommend changing to the SIMPLE recovery model, since then you could not recover the DB from log backups.)
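
    If it helps, the same check can be done per database with the DMV below, together with the column that shows why SQL Server is currently keeping the log from truncating (LOG_BACKUP, ACTIVE_TRANSACTION, REPLICATION, and so on).  The database name is a placeholder:

    -- Size and percent used of the current log for this database.
    SELECT total_log_size_in_bytes / 1048576.0 AS total_log_mb,
           used_log_space_in_percent
    FROM   sys.dm_db_log_space_usage;

    -- Why the log can't be truncated right now.
    SELECT name, log_reuse_wait_desc
    FROM   sys.databases
    WHERE  name = N'MyDb';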

  • ufrufr wrote:

    Before and after the log backup, can you run this command and post the result here?

    dbcc sqlperf(logspace)

    This command returns the physical size of the log and how much of it is actually used.  Sometimes the physical log file (.ldf) is huge but contains very little active log.

    If heavy DML is happening on your DB, I suggest not shrinking it, since it will just expand again soon.  Since you are already backing up the log every 15 minutes, the main option left is to increase the disk space that holds the log.

    (I don't recommend changing to the SIMPLE recovery model, since then you could not recover the DB from log backups.)

    I think that your advice is pretty standard and that's good, but... a lot of people miss "the rest of the story"...

    If you have a large T-Log file and it's usually almost empty, then you have a serious problem with some code somewhere that needs to be found and fixed, and not just because it makes for an unnecessarily large log file that would slow down DR restore efforts.  Sometimes it's bad code, sometimes it's index maintenance, and sometimes the bad code actually IS the index maintenance (especially if folks are using REORGANIZE).
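
    One way to find out when that growth is actually happening (and tie it back to a maintenance job or a particular piece of code) is to pull the autogrow events out of the default trace, assuming the default trace is still enabled (it is by default); this just reads the current trace file:

    DECLARE @trace nvarchar(260) =
        (SELECT path FROM sys.traces WHERE is_default = 1);

    SELECT t.StartTime,
           t.DatabaseName,
           t.FileName,
           t.IntegerData * 8 / 1024 AS growth_mb,  -- IntegerData is in 8KB pages
           t.ApplicationName
    FROM   sys.fn_trace_gettable(@trace, DEFAULT) AS t
    WHERE  t.EventClass = 93                       -- Log File Auto Grow
    ORDER BY t.StartTime DESC;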

    --Jeff Moden

  • @JeffModen,

    Okay, I know it took longer than expected, but I finally got my hands on the code that updates that table.  Here's the process.  Once a record has been marked as interfaced (B2GRT = 'Y'), it can't be updated.  Up until that point, a cost reset can be run that updates the following fields on that table: B2WRT, B2NUM1, B2NUM5, B2NUM15, B2NUM16, B2NUM17, B2NUM18, B2NUM19, B2NUM20....all of which are decimal, no varchar.  I've attached the CREATE statement for the table to this post

  • Greg Goss wrote:

    @JeffModen,

    Okay, I know it took longer than expected, but I finally got my hands on the code that updates that table.  Here's the process.  Once a record has been marked as interfaced (B2GRT = 'Y'), it can't be updated.  Up until that point, a cost reset can be run that updates the following fields on that table: B2WRT, B2NUM1, B2NUM5, B2NUM15, B2NUM16, B2NUM17, B2NUM18, B2NUM19, B2NUM20....all of which are decimal, no varchar.  I've attached the CREATE statement for the table to this post

    Interesting that out of all of those columns, the single-character B2GRT might be the serious problem.  It's a VARCHAR(1).  If it starts off NULL (I didn't see a DEFAULT to make me think otherwise), then it's ExpAnsive when they update it to 'Y'.  VARCHAR(1) is also a huge waste because it actually takes 3 bytes, whereas a CHAR(1) would always take just 1.  Since it is ExpAnsive, the Clustered Index and 3 Non-Clustered Indexes will see page splits because of updates to that column.

    It's also interesting that the Clustered Index is 4 columns wide and the 4th column is actually an IDENTITY column.  This means the Clustered Index can be classified as a "Sequential Silo" index.  While these types of indexes fragment to 99% very quickly in the face of inserts, their Page Density will stay at nearly 100% provided there are no other causes of fragmentation.  Unfortunately, the B2GRT column might just be such a source of fragmentation, as I pointed out above.  If it were always filled during inserts and updates, that source of fragmentation would go away.

    Unless the 4 columns in the Clustered Index actually do make for a truly "ever-increasing key", the Clustered Index will fragment very quickly as a "Sequential Silo" index.  Reducing the Fill Factor on such a thing would be a complete waste because of the siloing that occurs.  The current Fill Factor is set to 90.  It should be changed to 98 if it is sequentially siloed, and 99 if the keys are truly ever-increasing.

    Someone has also created an index on the B2GRT column all by itself.  That could be (and usually is) a pretty useless thing to do because of the uber low cardinality of that column.

    You also have a fair number of indexes on this table that begin with the [B2BNR] and [B2PLT] columns.  Unless there is something holy about the column order of the Clustered Index, changing its order to start with those 2 columns (already in the Clustered Index) might allow you to delete a wad of NonClustered indexes (unless the current order was chosen to eliminate other indexes).
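
    If the vendor will allow schema changes, the kind of fix being described might look something like the sketch below.  The table, constraint, and index names are placeholders (the real ones are in the attached CREATE script), and any indexes that reference B2GRT would have to be dropped and recreated around the column change:

    -- Backfill so the column can be made NOT NULL.
    UPDATE dbo.OriginalTable SET B2GRT = 'N' WHERE B2GRT IS NULL;

    -- Fixed width and always populated: flipping it to 'Y' later can no
    -- longer grow the row, so that source of page splits goes away.
    ALTER TABLE dbo.OriginalTable ALTER COLUMN B2GRT char(1) NOT NULL;

    ALTER TABLE dbo.OriginalTable
        ADD CONSTRAINT DF_OriginalTable_B2GRT DEFAULT ('N') FOR B2GRT;

    -- Raise the fill factor on the "sequential silo" clustered index;
    -- 90 mostly wastes space if the keys are (nearly) ever-increasing.
    ALTER INDEX PK_OriginalTable ON dbo.OriginalTable
        REBUILD WITH (FILLFACTOR = 98, ONLINE = OFF);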

     

     

    --Jeff Moden
