Corrupt Disk Nightmares ... Help!

  • We run SQL Server 2000 Enterprise Manager on Windows 2000 and Windows 2003. The last month has been a nightmare as local and EMC remote disks have gone corrupt on us. We do data warehousing, so our DB is over 500 GB. When we load, we load MBs. Is there a virus or a known issue that could be causing our disk problems that I should know about?

    Thanks in advance

     

  • I'd contact EMC and work on this.

  • The questions are:

    1. You say your Enterprise Manager is on Windows 2000 and Windows 2003. But where is your SQL Server installed? Enterprise Manager is a front-end client tool. Where is SQL Server?

    2. Did you back up your databases? SQL Server keeps database files open, so a Windows disk backup normally will not pick them up. You need to use the SQL Server backup utility, or BACKUP DATABASE statements in scripts, to write database backup files to storage; Windows backup can then put those files on tape. I have not worked with EMC storage; I am speaking from my experience with regular drives.

    Another option is to stop SQL Server and back up the database files with Windows Backup, since the files are closed while SQL Server is stopped. Then restart SQL Server.

    Could it be that your database files never went to the backup tape at all because of #2?
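
    As a rough illustration of the BACKUP DATABASE route, here is a minimal Python sketch that only builds the T-SQL text for a backup striped across several files, plus a matching RESTORE VERIFYONLY check. The database name and file paths are hypothetical placeholders, and the helper names are made up for this example; running the generated statements against a server is left to whatever tool you already use.

```python
def _disk_targets(backup_files):
    """Format backup file paths as a comma-separated list of DISK = N'...' devices."""
    if not backup_files:
        raise ValueError("at least one backup file is required")
    # Double any single quotes so each path stays a valid T-SQL string literal.
    return ",\n   ".join(
        "DISK = N'{}'".format(path.replace("'", "''")) for path in backup_files
    )


def build_backup_statement(database, backup_files):
    """Return a BACKUP DATABASE statement striped across backup_files."""
    # Escape ] so the database name stays a valid bracketed identifier.
    name = database.replace("]", "]]")
    return "BACKUP DATABASE [{}]\nTO {}\nWITH INIT".format(
        name, _disk_targets(backup_files)
    )


def build_verify_statement(backup_files):
    """Return a RESTORE VERIFYONLY statement for the same set of backup files."""
    return "RESTORE VERIFYONLY\nFROM {}".format(_disk_targets(backup_files))


if __name__ == "__main__":
    # Hypothetical database name and paths, for illustration only.
    files = ["E:\\backup\\SalesDW_1.bak", "F:\\backup\\SalesDW_2.bak"]
    print(build_backup_statement("SalesDW", files))
    print(build_verify_statement(files))
```

    Once the .bak files exist on disk, they are ordinary closed files, so a Windows backup job can copy them to tape.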

    Regards, Yelena Varsha

  • Are your databases corrupt or your disks?

    If your disks are corrupt, it could be your controller (I assume this is a SAN?) ... I agree, contact EMC if this is the issue.

    If your database is corrupt, that is a different ball of wax, but it could still be related to the disks. Also, SQL Server can have issues where it writes too slowly for high-speed SANs, and there is a hotfix for this.

  • Questions 1-3 were a bit off the mark. Question/statement 4 hit the mark. It is disk corruption where SQL Server hits EMC.

    1. Where is your SQL Server installed? Enterprise Manager is installed on the servers. We are not permitted to manage from our desk PCs. In most cases we telnet or Citrix into one server and manage all seven servers from there.

    2. Did you back up your databases? We back up with SQL Server's BACKUP and RESTORE. We have a large DB, so we stripe it across drives.

    3. Could it be that your database files did not go to the backup tape at all because of #2? No. We are getting DISK CORRUPTION errors. They put new disks in and we restore. I'm just tired of the corruption and the need to restore.

    4. It could be your controller (I assume this is a SAN?) We do use a SAN.

    Thanks. We've recovered, so all is well, though we never know when a disk issue will arise again, as we do not know what caused the many disk issues we've hit.

    Thanks Again

  • I'm very surprised you have problems with EMC.

    I'm assuming you are using RAID 0+1 if a SAN is in place. If not, you may want to consider it.

    If 0+1 is causing the problem, you need to talk to EMC to find out what went wrong.

    Here is what I have found problematic with SQL Server when a SAN solution is used:

    1. The controller may have tons of cache, but remember that this cache is external to the disk enclosure. If your controller has a problem, think about what happens to the data still sitting in the cache that has not yet been committed to the database.

    2. Is your controller also mirrored?

    3. What's the stripe size on your EMC? What's the stripe size for your OS?

    Today's DBA is not a pure DBA anymore. You have to know everything from network, system, hardware, and middleware to whatever programming languages your company uses.

  • I've been running my DBs on a Hitachi SAN for about 2 years and have had 2 corruptions. The first was caused by 1 disk failing and the controller failing to rebuild the stripe onto a failover disk. The second wasn't actually a corruption: the drive became dismounted from the system, and after numerous hours with Microsoft support (Hitachi did the blame-another-vendor thing), I was able to reattach the disks. It appears that having 2 mount points in succession (I used O: and P:) can cause issues, and it is better to mount with some skips in case of virtual disk mounting issues.

    Anyway, I never found a real cause for the issues, and Hitachi has been very hush-hush about the problems. I do not believe they are related to SQL Server. In my case we registered a network spike at one point before the disks dismounted, so we now have a network line conditioner installed, but other than that I just hope and pray.

    So I guess it just goes to show the value of tape backups.
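
    For anyone wondering what "rebuilding the stripe" involves, here is a toy Python sketch of the XOR parity idea behind RAID 5. It is only a model under simplifying assumptions (one parity block per stripe, no parity rotation across disks) and says nothing about how Hitachi or EMC controllers are actually implemented; the function names are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    # zip(*blocks) groups the bytes at the same offset from every block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the single missing block from every surviving block, parity included."""
    # XOR-ing the parity with the surviving data cancels them out,
    # leaving only the contents of the lost block.
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # three data disks, one stripe
    parity = xor_blocks(data)           # what the parity disk would hold
    # Pretend the second disk failed and rebuild it from the rest.
    rebuilt = rebuild_missing([data[0], data[2], parity])
    print(rebuilt == data[1])
```

    The scheme tolerates exactly one lost block per stripe. If the rebuild itself fails, or a second disk drops out before it finishes, the parity can no longer recover the data and a restore from backup is the only way back.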

  • Thanks. I can tell you we are RAID 5, but that is all I know. I'll check the cache. Thanks again.
