Very large table with a lot of deletions, yet little fragmentation...

  • I have a table of more than 1 billion rows. It does insertion and deletion (>=20,000 rows each time) many times a day.

    It has one NON-UNIQUE Clustered index, and that's it. I would think that with this much of data modification on this big table every day, its fragmentation would get pretty high, but its fragmentation is only 2%. I don't understand this. Do you have any idea or what could explain its fragmentation so low? The table has not been defrag since it was created. Any inputs would be greatly appreciated.

  • Without seeing things like what the indexes (clustered and nonclustered) are on, what ranges of data is inserted and deleted, it's near impossible to say.

    Do note that deletes do not cause fragmentation. Inserts can, depending on where the new rows are in the index, but deletes cannot.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail. The non-unique clustered is on the companyID(integer) key, and insertion and deletion is done at each of these companyid (at the where clause). I'm glad that it has very low fragmentation, but it just puzzles me.

  • Int Identity? Int something else? Always deleting the same number of records per companyID that you're inserting?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The int is the companyID (it's probably was an identity from the source we get the data from, but not when it is insert into our table), and the companyID would repeat for every record of thousands/hundred thousands for each insert and delete, this is why it we can't put a primary key (since PK is clustered and unique; so the companyid can't be repeated).

    the table is something like this (it has more columns than the example, but just to give you an idea, and the companyID

    was put in the middle (it's not the good design, but I don't think that should make a difference, do you?)).

    productId saleid url companyid inputdate

    1 1 http://www.somecompany.com 12345 2009-01-01

  • oh, and yes, it always delete the same number of records as the number were inserted. This is because once those records for that particular companyid have moved to the 2nd table, the process would delete those records from the first table. So, the number of inserts and delete for a particular company on the 1st table are always the same.

  • So the deletes make space and the inserts reuse that space. If the rows are all the same size, I'm not too surprised this causes minimal fragmentation.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks much. Your explanation makes sense. I feel better now :).

  • IIRC the way frag is presented/calculated on heaps is a bit different than for clustered tables. Bit early this morning so could be confuzled here. 🙂 You didn't specify what level of the index_physical_stats scan you were doing either, which can also cause reported differences.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/26/2010)


    IIRC the way frag is presented/calculated on heaps is a bit different than for clustered tables.

    Yup. The avg_fragmentation_in_percent in dm_db_index_physical_stats is extent fragmentation for a heap and logical fragmentation for a cluster/nonclustered index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • This table has one non-unique clustered, so it couldn't be a heap (a heap is a table has no index, correct?).

    I'm using the avg_fragmentation_in_percent in dm_db_index_physical_stats. I also compared it to the old way

    to do it in sql 2000, DBCC Showcontig, and the % is the same as the logical fragmentation %.

  • sqlblue (7/26/2010)


    This table has one non-unique clustered, so it couldn't be a heap (a heap is a table has no index, correct?).

    A heap is a table without a clustered index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply