Are Heaps Fragmented by Default

  • Hello,

    i've done some research in BOL and other websites, but no where can I find an answer to my question. I have a table that i use to archive some data. Each time I archive the data I use a timestamp. So the archive table is insert only. When users need data, they most likely need to query all the data by a particular date or some other single field. So initially I left a primary key off the table and just put in specific non clustered indexes that would be queried in the where clause, like archive_date. My reasoning for not having the clustered index was to limit the import process.

    Another DBA requested I put the PK on the table, which I obliged, but I was

    1. wondering what do you call a table with no PK, but it has non clustered indexes, is it still a heap?

    2. the other DBA said that heap is by definition fragmented. I thought a table would only be fragmented if inserts/updates/deletes happened, not to insert only tables. Which is correct?

    Thanks

  • A heap is any table that does not have a clustered index. A table without a clustered index can become very fragmented very quickly even if it is just insert only.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • natarshia.peck (4/5/2011)


    Hello,

    i've done some research in BOL and other websites, but no where can I find an answer to my question. I have a table that i use to archive some data. Each time I archive the data I use a timestamp. So the archive table is insert only. When users need data, they most likely need to query all the data by a particular date or some other single field. So initially I left a primary key off the table and just put in specific non clustered indexes that would be queried in the where clause, like archive_date. My reasoning for not having the clustered index was to limit the import process.

    Another DBA requested I put the PK on the table, which I obliged, but I was

    1. wondering what do you call a table with no PK, but it has non clustered indexes, is it still a heap?

    2. the other DBA said that heap is by definition fragmented. I thought a table would only be fragmented if inserts/updates/deletes happened, not to insert only tables. Which is correct?

    Thanks

    Don't get confused between a primary key constraint and a clustered index - they're not the same thing.

    A table with no clustered index is indeed a heap, whether or not it has any non clustered indexes.

    If your table only has inserts, with no deletes or updates, then you are not likely to experience a lot of fragmentation. So no, a heap is not fragmented by default, but will become fragmented if it undergoes operations that cause page splits in the middle of a page or leave gaps at random or arbitrary places in the pages. If you have no clustered index then you can't rebuild it and therefore you can't defragment. That's the danger with not having a clustered index.

    John

  • How can a heap be fragmented? We ask sql server to NOT keep any particular order, so it does just that.

    You can't have no index and then complain that there's no order... it's like asking for ice lava, doesn't make much sens!

    The day you need an order to be kept, put the clustered index and be done with it (will require more index maint than a "normal" table because the inserts will be all over the place).

  • Maybe fragmentation isn't the right word - I don't know. What I'm referring to are the empty spaces left on the data pages after page splits occur. Over time, this means that more reads are necessary to return data for a query than would be the case if there were no gaps.

    John

  • Thanks for responding... So most of the time we say put a clustered index on a table for the sake of having one. So I used a 'natural key' to cluster the table. But sometimes I see people using an Identity Key (with no FK) just to put an clustered index on a table. What good that does me? If I have heap and then put a non clustered index on it, does it still require a table scan. Say my non clustered is a 'load_date', and I select were load_date = '01/01/11' . How does this affect my performance overall.

    Again. Initially I left it as a heap because the inserts were massive and then just put the non clustered for the queries.

  • natarshia.peck (4/5/2011)


    Thanks for responding... So most of the time we say put a clustered index on a table for the sake of having one. So I used a 'natural key' to cluster the table. But sometimes I see people using an Identity Key (with no FK) just to put an clustered index on a table. What good that does me? If I have heap and then put a non clustered index on it, does it still require a table scan. Say my non clustered is a 'load_date', and I select were load_date = '01/01/11' . How does this affect my performance overall.

    Again. Initially I left it as a heap because the inserts were massive and then just put the non clustered for the queries.

    Without a clustered index, that nonclustered index will do nothing for your table in preventing fragmentation. Your queries may produce table scans or index scans (even if you create a clustered index you may still see index scans) depending on how your queries are written. A clustered index can be defragged but a non-clustered index cannot be defragged without the presence of a clustered index.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (4/5/2011)


    a non-clustered index cannot be defragged without the presence of a clustered index.

    Jason, are you sure about this? My understanding is that you can rebuild the non clustered index, but that this will not have any effect on the actual rows of the table - in other words it won't get rid of the gaps caused by page splits where updates and deletes have occurred.

    natarshia.peck (4/5/2011)


    Thanks for responding... So most of the time we say put a clustered index on a table for the sake of having one. So I used a 'natural key' to cluster the table. But sometimes I see people using an Identity Key (with no FK) just to put an clustered index on a table. What good that does me? If I have heap and then put a non clustered index on it, does it still require a table scan. Say my non clustered is a 'load_date', and I select were load_date = '01/01/11' . How does this affect my performance overall.

    Again. Initially I left it as a heap because the inserts were massive and then just put the non clustered for the queries.

    It all depends on your circumstances. One reason you may wish to create an identity column to use as your clustering key is that your natural key is very wide. Since the clustering key is used in all non clustered indexes as a pointer to the row, the narrower your clustering key, the fewer pages your non clustered indexes will be and the better your performance will (or rather, may) be.

    John

  • Thanks John that makes sense.

    On the fragmentation issue - if i never ever do updates or deletes (this is an archive table and loaded incrementally daily), what type of fragementation am I looking to come across with the table?

  • As I said before, you shouldn't get too many problems if you only do inserts, although I can't guarantee that. One problem you will have is that your database will get bigger and bigger and sooner or later you may choose to delete some of the older data. As soon as you do that, fragmentation becomes a possibility.

    John

  • Depends on the clustered index of the base table. If the first column is identity of getdate() you'll get very little fragmentation.

    However if you're pk is anything else like ssn, item_id, <anything>_id. Then the insert would have to insert "everywhere" in the clustered index of the history table.

    You could choose to avoid table problem by adding getdate() as the leading column, but then searching for the most recent or all history for x item will product a table scan so you're still screwed.

    The 1 time I had to do this I used the base PK as leading column(s) and dateinsert as the final key. I had time to rebuild indexes daily so fragmentation wasn't an issue for me at then end of the day.

  • i've googled bol and it's not helpful. any books i could buy on this topic that you guys recommend?

    thanks

  • John Mitchell-245523 (4/5/2011)


    CirquedeSQLeil (4/5/2011)


    a non-clustered index cannot be defragged without the presence of a clustered index.

    Jason, are you sure about this? My understanding is that you can rebuild the non clustered index, but that this will not have any effect on the actual rows of the table - in other words it won't get rid of the gaps caused by page splits where updates and deletes have occurred.

    You can rebuild the NC but fragmentation will persist. A clustered index is required to defrag the NCs.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • natarshia.peck (4/5/2011)


    i've googled bol and it's not helpful. any books i could buy on this topic that you guys recommend?

    thanks

    You can try the Internals books. Also lookup Kimberly Tripp and Kalen Delaney and Gail Shaw. They are excellent resources.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • What books do yall recommend on this topic?

    Also, fragmentation aside. I have two tables that are the same in structure and data (imported exactly the same).

    One has a clustered index and the other has a non clustered index, both on the same field.

    If I query both using the where clause on the index field, is the clustered index gauranteed to be faster.

    I guess I could run some tests in my spare time or better yet the server's spare time (nil 🙂

    thanks

    --ps -- i meant on a field that is inserted sequentially, not id fields... i think that answer has been already been posted

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply