Database corruption / steps to fix issues

  • Hey there everyone,

    After last nights maintance plans, I got the following error messages appearing on one of the DBS - and it's in a state where it can't be accessed (see below error message).

    Can anyone provide me with the trouble shooting steps to trouble shoot what is causing the error, and how to fix it.. This is the first time I have come across database corruption.

    ------------------

    Msg 824, Sev 24, State 2, Line 102 : SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 4:1878146; actual 24415:1599996549). It occurred during a read of page (4:1878146) in database ID 18 at offset 0x00000395104000 in file 'd:\sqldata\mydatabase.ndf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consisMsg 15653, Sev 24, State 1, Line 153 : , update is not necessary... [SQLSTATE 01000]

    (4:1878146) in database ID 18 at offset 0x00000395104000

    ------------------

    I have enabled:

    1. Enabled below trace to output more detail

    DBCC TRACEON (3604)

    dbcc page ('mydatabase',4,1878146,1)

    I get the following error message:

    Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

    Thought it was a corrupt page, so tried restoring the above page ; but got the above error message.

    2. I did run the DBCC checkdb('db',repair_rebuild, repair_allow_data_loss).

    Cheers

  • Running Repair without knowing what the problem was was probably a mistake. See Paul Randal's blog post here.

    Normally with corruptions you will want to restore from good backup. I'd restore in a test/dev environment, run CHECKDB, and then if that is okay restore in production.

    You definitely need to check your event logs for mentions about disk errors and check your disk subsystem as that is usually what causes corruption and if it happens once it will probably happen again.

  • If this is production, call MS. Don't mess around with this.

  • Steve Jones - Editor (10/12/2009)


    If this is production, call MS. Don't mess around with this.

    Definitionally agree with that, running the allow_data_loss would not have helped. I would get expert advice now if it is a production box.

    --------------------------------------------------------------------------------------
    [highlight]Recommended Articles on How to help us help you and[/highlight]
    [highlight]solve commonly asked questions[/highlight]

    Forum Etiquette: How to post data/code on a forum to get the best help by Jeff Moden[/url]
    Managing Transaction Logs by Gail Shaw[/url]
    How to post Performance problems by Gail Shaw[/url]
    Help, my database is corrupt. Now what? by Gail Shaw[/url]

  • First read this http://qa.sqlservercentral.com/articles/65804/

    Once you have read that...

    Run the following and post the full output.

    DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

    Do you have a backup from before running repair?

    Do you know how much data was lost by running repair with the allow data loss option? Is the amount of data lost acceptable?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • thanks for the replies guys.

    Lucky enough the db that is corrupt is still in prod/test stage, so it's not a prod database yet.. But it's seems like a good chance to trouble-shoot database corruption.

    I ran the dbcc checkdb('mydb', with no_infomsg, all_errormsgs) and got the following message.:

    Msg 8921, Level 16, State 1, Line 1

    Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

    Thanks to one of the links; this message sorta points to one of the system tables being corrupt.. Strange that the update stats on mydb, gave the initial error message.

    I did restore from a backup 2 days ago, and that had corruption also.. I went 2 weeks back and the db is ok..

    The above message after running checkdb seems confusing.. is it the system table that is corrupt or the db..???

  • Free-292871 (10/12/2009)


    I ran the dbcc checkdb('mydb', with no_infomsg, all_errormsgs) and got the following message.:

    Msg 8921, Level 16, State 1, Line 1

    Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

    This is irreparable corruption. CheckDB requires that the system tables are consistent so that it can use them to tell what should be in the database, where the tables should be, etc. If there's any corruption within the critical system tables then checkDB will fail with this error. It means that checkDB can't even tell what the DB should look like, hence there's no way for it to tell what's ok and what's damaged.

    The only way to get past this error is to restore from a clean backup.

    Strange that the update stats on mydb, gave the initial error message.

    Not at all. The update stats was the first thing to read the affected pages, hence it is the thing that failed. Could be a table that isn't used much or an index that isn't used much. Basically anything can trigger the corruption errors if they try to read damaged pages

    The above message after running checkdb seems confusing.. is it the system table that is corrupt or the db..???

    Yes.

    The database has some corruption. This specific case has that corruption (or at least some of it) within the system tables.

    p.s. You're not done after restoring the backup. Do some root-cause analysis. Have a look through the windows event logs, see if there's anything out of the ordinary around the time that the corruption occurred (sometime between 2 weeks ago and 2 days ago). You need to find the cause to ensure this won't happen again.

    Furthermore, I would suggest scheduling regular integrity checks so that you can pick things like this up early.(doubly important for production databases)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply