Heap tables with lots of unused space

  • We are running into some situations with heap tables in our environment.  We have uncovered five different heap tables with anywhere form 35 MB to 100 MB of data in them, but the reserved space ranges from 1.8 GB to 5 GB.  Most of the space is unused or unallocated.

    We understand the basic reason for the problems, which is the heap table does not have a clustered index and thus there is no defragmentation.

    We also believe we understand that a heap table will always perform inserts and updates at the end of the table instead of in whitespace at the beginning of the table.

    We can add a clustered index to any of the five tables and reduce them to near the 35 - 100 MB size obviously depending  on the fill factor for the CI.

    The questions that we do not understand.

    1)  Some of the heap tables only do inserts.  No deletes or updates.  Why would these be filling up with whitespace?

    2)  On the tables that have a lot of whitespace, we tested one of our nightly jobs against it.  When it was 5 GB in size with 86 MB of data, it would take the job 35 minutes to complete.  After adding a CI with a 100% fill factor, the job took 15 minutes to complete.  For testing purposes, we then removed the CI again so it was a heap of 86 MB in size.  When we re-run the same job it now took 15 minutes to complete as well.  The CI did not help performance because this job runs a function that updates each record in the table so it does a full table scan anyway with or without the CI.  Our theory is that when the table is 5 GB in size, the SQL server still has to read the page information for 4.9GB of all the blank pages of data before it gets to the 86 MB of pages that it needs to read the actual data.  Does anyone have a better explanation for this?

    3)  We have been trying to read about ways to use a heap table effectively.  I guess we really don't need to use a heap table but in some of our tables, we don't need an index.  We are only using a CI to organize the data on disk.

    Thank You,

    Kevin

  • I'm no expert on how the server organize the data.  However I'd expect the server to be bright enough to reuse the available space (please correct me if I'm wrong on this one).

     

    I've heard of similar cases where a column was changed from blob (text, imagine, ntext...) to varchar, or dropped altogether.  In that case the server does NOT reuse the space because the server would have to rebuild the whole table to do so. 

    Is this something that could have started this whole thing in your case?

    What happens if you create and leave the CI on for a few days (weeks)?  Does the table keep growing without any apparent reason?

     

    Are you able to add the CI, then rerun the job like 10-20 times to see if it still grows each run without reclaiming the space (I normally wouldn't ask of this of production server but I'm afraid it may be hard to reproduce on a test server but I'd still propose to try this on QA box first to see if the bug is reproducable)?

  • It seems that with a heap table, SQL is not bright enough to reuse availble space.  We have read a few articles that claim that with a heap, all data is added to the end of the table and not re-used, but there is not a lot of information about heap tables, so it is hard to confirm that.

    We have not restructured the schema at all so that should not be a cause for our situation.

    We have test and dev servers availble so we have been doing various tests.  The first interesting test is that as we applied CI's with various fill factors and re-ran the job, we always got an expected size database based on the fill factor and page splits caused by the row updates being performed.  Also the speed seemed very consistent from one run to another run with a CI in place.  The disk space did not grow outside of what we would expect from page splits with 100% fill factor.

    We did then remove the CI and ran some repeated tests against the table as a heap.  With repetitive runs it generally got slower and slower.  8 minutes the first run, 9 minutes the second run, 10 minutes the 3rd run, but then the 4th run was also 10 minutes.  Also the space used and space reserved did not change at all after running these jobs.  We were expecting to see the size of the table increase with each run of the job.

    Unfortunately we have not been monitoring the table space over time so we don't know if this has been a slow growth or if this has been a one time growth and the space never got reclaimed.

  • I can't wait to see what others have to see about this. Right now I'm drawing blanks on this one.

     

    At least you got things straightened out to a point where you don't have any problems.  Let's see what the rest of this community has to offer as for jewels of wisdom.

  • Well, I totally agree with you assessment from your #2 in that since there is no CI and therefore no direct link to the data page, SQL Server scans through the IAM pages to determine which data extents contain pages for your heap.  This is causing SQL Server to have to scan over the entire range of allocated extents to perform the read which is a table scan to begin with.  For this reason alone, I would recommend creating a CI on all of your tables.  Even if you are always working with the table as a whole and thereby using table scans for everything, a clustered index can be helpful in allowing you to manage fragmentation levels. 

     

    According to BOL, “SQL Server allocates a new extent to an object only when it cannot quickly find a page in an existing extent with enough space to hold the row being inserted.”  This tells me that the possibility exists for SQL Server to not use the available blank space in extents and instead create a new extent.  This could explain why you have so much free space in your production DB when your test DB test showed the extents getting re-used.  Is your production system fairly busy?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Our server is pretty active, but these jobs run during our quiet times.

    Thank you for the reply regarding our 2nd question.  That does shine some more light on our issue and we are starting to understand the IAM and a heap vs a clustered index better.

    Thank You,

    Kevin

  • If this helps anyone we found one of our problems related to our heap issues:

    If you just ran DELETE statements

    without specifying "WITH (TABLOCK)", the empty pages could not be re-used

    since the pages are not deallocated.

    You may also refer to this KB article:

    Space that a table uses is not completely released after you use a DELETE

    statement to delete data from the table in SQL Server

    http://support.microsoft.com/kb/913399/en-us

    I hope this helps anyone else that might run into this.

    We are still researching our other heap tables that have inserts only that also have high levels of unused disk space.

    Kevin

  • Space is not re-used in the Heap without rebuilding the table (by selecting into another table, adding a clustered index temporarily -- like what you did -- or in 2008, using the ALTER TABLE command with REBUILD option )

    ALTER TABLE <tablename>

    REBUILD

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply