Very large table with a lot of deletions, yet little fragmentation...

Question

Very large table with a lot of deletions, yet little fragmentation...

sqlblue

SSCrazy

Points: 2324
More actions
July 25, 2010 at 1:47 am

#235690

I have a table of more than 1 billion rows. It does insertion and deletion (>=20,000 rows each time) many times a day.
It has one NON-UNIQUE Clustered index, and that's it. I would think that with this much of data modification on this big table every day, its fragmentation would get pretty high, but its fragmentation is only 2%. I don't understand this. Do you have any idea or what could explain its fragmentation so low? The table has not been defrag since it was created. Any inputs would be greatly appreciated.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 1

Without seeing things like what the indexes (clustered and nonclustered) are on, what ranges of data is inserted and deleted, it's near impossible to say.

Do note that deletes do not cause fragmentation. Inserts can, depending on where the new rows are in the index, but deletes cannot.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sqlblue SSCrazy Points: 2324 More actions · Answer 2

Thanks Gail. The non-unique clustered is on the companyID(integer) key, and insertion and deletion is done at each of these companyid (at the where clause). I'm glad that it has very low fragmentation, but it just puzzles me.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 3

Int Identity? Int something else? Always deleting the same number of records per companyID that you're inserting?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sqlblue SSCrazy Points: 2324 More actions · Answer 4

The int is the companyID (it's probably was an identity from the source we get the data from, but not when it is insert into our table), and the companyID would repeat for every record of thousands/hundred thousands for each insert and delete, this is why it we can't put a primary key (since PK is clustered and unique; so the companyid can't be repeated).

the table is something like this (it has more columns than the example, but just to give you an idea, and the companyID

was put in the middle (it's not the good design, but I don't think that should make a difference, do you?)).

productId saleid url companyid inputdate

1 1 http://www.somecompany.com 12345 2009-01-01

sqlblue SSCrazy Points: 2324 More actions · Answer 5

oh, and yes, it always delete the same number of records as the number were inserted. This is because once those records for that particular companyid have moved to the 2nd table, the process would delete those records from the first table. So, the number of inserts and delete for a particular company on the 1st table are always the same.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 6

So the deletes make space and the inserts reuse that space. If the rows are all the same size, I'm not too surprised this causes minimal fragmentation.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sqlblue SSCrazy Points: 2324 More actions · Answer 7

Thanks much. Your explanation makes sense. I feel better now :).

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

IIRC the way frag is presented/calculated on heaps is a bit different than for clustered tables. Bit early this morning so could be confuzled here. 🙂 You didn't specify what level of the index_physical_stats scan you were doing either, which can also cause reported differences.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 9

TheSQLGuru (7/26/2010)
IIRC the way frag is presented/calculated on heaps is a bit different than for clustered tables.

Yup. The avg_fragmentation_in_percent in dm_db_index_physical_stats is extent fragmentation for a heap and logical fragmentation for a cluster/nonclustered index.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sqlblue SSCrazy Points: 2324 More actions · Answer 10

This table has one non-unique clustered, so it couldn't be a heap (a heap is a table has no index, correct?).

I'm using the avg_fragmentation_in_percent in dm_db_index_physical_stats. I also compared it to the old way

to do it in sql 2000, DBCC Showcontig, and the % is the same as the logical fragmentation %.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 11

sqlblue (7/26/2010)
This table has one non-unique clustered, so it couldn't be a heap (a heap is a table has no index, correct?).

A heap is a table without a clustered index.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass