adding a clustered index increased the number of rows in a large heap table

  • Dear all,

    I'm new to SQL Server and also newly in charge of my company's SQL Servers. The problem is that for some years... about 6 or 7 years... no one was taking care of the DB servers. Just developers putting whatever they wanted into the databases, with no one administering them.

    We are still using SQL Server 2000, and I have a big problem with a big table in a big database.

    The largest table is about 130GB, with 90 million rows, and, for business reasons, they don't want to clean out its data for now. The database is more than 300GB, all in one datafile, with every table more fragmented than my bank account :crying:, so backup/restore has become a problem, taking more than 6 hours each.

    I'm looking for a solution and have already tried the following in a test environment:

    dbcc indexdefrag / dbcc dbreindex

    dbcc shrinkdatabase / dbcc shrinkfile

    All the combinations of the commands above took more than 450 hours, running 24 hours a day non-stop. I'm not crazy... I waited because they asked me to report the result of everything.

    The result was more than 100GB released to the OS. Sounds good, but impossible to do in the production environment.
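
    For reference, the commands I ran were along these lines (the database, table, index, and file names here are just placeholders from my test environment):

    -- defragment one index at a time (can be stopped and restarted)
    DBCC INDEXDEFRAG (MyBigDB, 'BigTable', 'IX_BigTable_SomeColumn')

    -- rebuild all the indexes on a table with a fill factor of 90
    DBCC DBREINDEX ('dbo.BigTable', '', 90)

    -- shrink the whole database, leaving 10% free space
    DBCC SHRINKDATABASE (MyBigDB, 10)

    -- or shrink a single datafile down to a target size in MB
    DBCC SHRINKFILE (MyBigDB_Data1, 250000)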

    After that, I tried creating a clustered index on the table described above, in a new datafile. The clustered index took 11 hours to create, but the resulting size was better than reindexing the table: I ended up with a 50GB table versus 70GB after the reindex. The weird thing is that after the clustered index was created, the number of rows increased by about 5,000. Also, the space wasn't released to the OS; it just stayed as "unused space". Is there any way to release this without a shrink? The shrink test took more than 200 hours to compact the data and then release the free space...
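
    What I did was roughly this (the filegroup, file path, size, and key column are made-up examples from my test, not the real names):

    -- add a new filegroup and datafile to receive the rebuilt table
    ALTER DATABASE MyBigDB ADD FILEGROUP FG_BigTable
    ALTER DATABASE MyBigDB ADD FILE
        (NAME = MyBigDB_BigTable1,
         FILENAME = 'E:\MSSQL\Data\MyBigDB_BigTable1.ndf',
         SIZE = 60000MB, FILEGROWTH = 1000MB)
    TO FILEGROUP FG_BigTable

    -- building the clustered index moves the whole heap onto the new filegroup
    CREATE CLUSTERED INDEX IX_BigTable_LogDate
        ON dbo.BigTable (LogDate)
        ON FG_BigTable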

    Yes, I'm in trouble hehehehe. Any help will be welcome.

    Thank you very much.

    Shin-Iti.

  • The number of rows probably didn't increase... they're just being reported more accurately now that the new clustered index is the source of the reported rowcount.
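
    You can see it for yourself with something like this (database and table names are just examples)... on 2000 the count that sp_spaceused and the GUI show comes from sysindexes, and heaps are notorious for letting it drift:

    -- what the metadata says (indid 0 = heap, 1 = clustered index)
    SELECT indid, rows, rowcnt
    FROM sysindexes
    WHERE id = OBJECT_ID('dbo.BigTable')
      AND indid IN (0, 1)

    -- what is actually in the table
    SELECT COUNT(*) FROM dbo.BigTable

    -- correct stale counts in sysindexes (COUNT_ROWS forces a real recount)
    DBCC UPDATEUSAGE ('MyBigDB', 'BigTable') WITH COUNT_ROWS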

    I'd peel one potato at a time... I'd build a crawler to rebuild the indexes on the smaller tables one index at a time during off hours. I'd also figure out a way to split the 90-million-row table up into smaller tables, perhaps by year, and build new indexes on the smaller tables so they can support a partitioned view (you did say SQL Server 2000, right?).
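
    Just to give you the idea, the crawler doesn't have to be anything fancy... a scheduled job running something like this during the quiet hours would do (the table filter, fill factor, and names are only examples):

    -- walk the smaller user tables and rebuild one index at a time
    DECLARE @tbl sysname, @idx sysname, @sql varchar(1000)

    DECLARE idx_list CURSOR LOCAL FAST_FORWARD FOR
        SELECT o.name, i.name
        FROM sysobjects o
        JOIN sysindexes i ON i.id = o.id
        WHERE o.xtype = 'U'                              -- user tables only
          AND i.indid BETWEEN 1 AND 254                  -- skip heaps (0) and text/image (255)
          AND INDEXPROPERTY(i.id, i.name, 'IsStatistics') = 0
          AND INDEXPROPERTY(i.id, i.name, 'IsAutoStatistics') = 0
          AND INDEXPROPERTY(i.id, i.name, 'IsHypothetical') = 0
          AND o.name <> 'BigTable'                       -- leave the 90-million-row monster alone

    OPEN idx_list
    FETCH NEXT FROM idx_list INTO @tbl, @idx
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = 'DBCC DBREINDEX (''' + @tbl + ''', ''' + @idx + ''', 90)'
        EXEC (@sql)
        FETCH NEXT FROM idx_list INTO @tbl, @idx
    END
    CLOSE idx_list
    DEALLOCATE idx_list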

    I'd also get something like DiskKeeper and start a disk defrag... if the DB is in that bad a shape, you just know that the underlying hard disks are going to be pretty well hammered, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I'd go with Jeff's approach and go a bit farther. At a place I worked in the past, we had a UserLog table whose data we had to keep for 7 years. I don't remember the exact number of records in the table, but it was a huge table that gave us lots of problems. The table couldn't be maintained at all: we couldn't defrag it and couldn't rebuild it, and I think we also couldn't create a new index on it (it was a few years ago, so I don't remember all the details). In the end we decided to use partitioned views and divide it by quarter.

    We first created empty tables in a different database (but on the same server). Each table had a constraint and could only store data for one quarter (and of course each table covered a different quarter). Then we started to fill the empty tables with data from the main table. When we finished filling the history tables, we created the table for the quarter we were in (in the original database) and started filling it with data up to a known point in time. During those steps the original table was up and there was no downtime at all.

    The next step involved a small downtime. We renamed the original table so no one would use it, finished transferring the data from the original table into the new table, and then created a partitioned view over all the tables, giving it the original table's name. At that point everything was up and running again. We kept the original table (under its new name) for about a month, until we were absolutely sure we wouldn't need it, and then we dropped it.

    This helped us in a few ways. The history data didn't have to be reorganized all the time because it was in separate tables from the present data. We could also run indexdefrag on the present data because that table was much smaller than the original. Another benefit was that we didn't have to back up the history data every time we backed up the database, because that part of the data was in its own database. The only thing we had to add was a job that ran before each new quarter, created a new table, and added it to the partitioned view. A tiny price to pay for the benefits :-)
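
    Just to illustrate the idea (the table and column names and the dates here are made up, and I'm simplifying a lot), the quarterly member tables and the view looked something like this:

    USE HistoryDB
    GO
    -- one member table per quarter, each with a CHECK constraint on the date column
    CREATE TABLE dbo.UserLog_Q1 (
        LogID   int      NOT NULL,
        LogDate datetime NOT NULL
            CONSTRAINT CK_UserLog_Q1
            CHECK (LogDate >= '20070101' AND LogDate < '20070401'),
        LogText varchar(1000) NULL,
        CONSTRAINT PK_UserLog_Q1 PRIMARY KEY (LogID, LogDate)
    )
    GO
    -- ...and so on, one table per quarter, plus a "current quarter" table
    -- that stays in the original database...

    USE ProductionDB
    GO
    -- the view gets the original table's name, so the applications never notice
    CREATE VIEW dbo.UserLog
    AS
    SELECT LogID, LogDate, LogText FROM HistoryDB.dbo.UserLog_Q1
    UNION ALL
    SELECT LogID, LogDate, LogText FROM HistoryDB.dbo.UserLog_Q2
    UNION ALL
    SELECT LogID, LogDate, LogText FROM dbo.UserLog_Current
    GO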

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting answers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hummm... you are right. I'm gonna do it.

    Anyway, I won't escape the shrink to release the unused space after spreading the rows into smaller tables. I was checking the current growth of the database and it's about 2GB/month, so I think I won't need all the released space after this operation. After the indexdefrag test, more than 120GB were released as "unused space". Almost half of the current size of the database.

    I'm also thinking of putting the new smaller tables in a separate datafile before shrinking the current big one. Maybe with less data the shrink will be faster, right? I don't have much time to keep the database offline, less than one day. I will check how much time it takes in the test environment.
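
    Something like this is what I have in mind for the test (the file name and target size are just guesses for now):

    -- after moving the smaller tables onto the new filegroup (by building their
    -- clustered indexes there, same trick as with the big table), shrink only
    -- the original datafile down to a target size in MB
    DBCC SHRINKFILE (MyBigDB_Data1, 250000)

    -- TRUNCATEONLY just gives back the free space at the end of the file
    -- without moving any pages, so it's much faster when it applies
    DBCC SHRINKFILE (MyBigDB_Data1, TRUNCATEONLY)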

    Jeff and Adi, thank you very much for the attention!!

    Knowledge is power, but knowing how to teach and share that knowledge is even more powerful.

    Thank you.

    Shin-Iti.
