The DBA Whoops

  • I've made this mistake before as well. I've thought about using a template like the one Chopeen showed, but I haven't really brought myself up to speed on transactions. Is there a crash course article on transactions on this site?

  • I don't use the word whoops -- I use many more colorful words under my breath.

    But my biggest savior has always been: before I run the update, I run a

    SELECT *
    INTO TableName_YYYYMMDD
    FROM TableName

    Then I do the update. I can then just drop TableName_YYYYMMDD at YYYYMMDD + 10 days.
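
    A rough sketch of the whole pattern (the Customers table, the update itself and the date are only placeholders to show the shape):

    -- Keep a dated copy of the table before touching anything
    SELECT *
    INTO Customers_20060315
    FROM Customers;

    -- Run the intended change
    UPDATE Customers
    SET Region = 'EMEA'
    WHERE Country = 'DE';

    -- Ten or so days later, once nothing has come back broken
    DROP TABLE Customers_20060315;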

    But my bosses finally gave me a test SQL Server (so it's only a Dell PE750) after I broke an app trying to install them into production after testing on a dual-333 test box. It took two days and living on the vendor's helpdesk to get it fixed.

    But now they appreciate testing with modern equipment.



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • You are so right, Chopeen...

    Been there, done that. Our setup around here is better than it once was. The developers have full control of both test and production databases. We had a developer do some major damage to our data four weeks in a row, and every time she did, the first question I asked was: did you wrap the query in a transaction? Of course, the answer was always "no". The first thing I do now when doing any data updates or deletes is type "begin transaction".
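
    Something like this, with a made-up table and predicate just to show the shape of the habit (the commit or rollback is run by hand afterwards):

    BEGIN TRANSACTION;

    UPDATE Orders
    SET Status = 'Cancelled'
    WHERE OrderDate < '20050101';

    -- Eyeball the rows-affected count, then run exactly one of these by hand:
    -- COMMIT TRANSACTION;
    -- ROLLBACK TRANSACTION;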

    But, when things do go wrong, the best thing to do is

    1.) Stop doing anything else

    2.) Assess the damage (record counts, what it will take to fix it)

    3.) Swallow your pride and admit to both your IS/IT boss and end user management what happened.

    Great Article...

  • Great article. Good advice. Thanks for sharing.

    The only thing worse than noticing the issue right after you hit the Enter key is not noticing the issue. We've had a few of those in my company where the DBA doing the work just doesn't realize that they blew up data or an entire database and we have to find it and recover from it the hard way.

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • The only question is: how easy was it, after all? I would've loved to be there to see the face of the guy who executed it.

  • Been there, done that, got the chewed tissue to prove it (since we can't scream around here). Excellent plan of attack, not only for DBAs, but I can see several other areas where this philosophy would pay off at work and at home, too!

  • I do test some things on the test database, but copying the query over and running it on the production database would still have caused the issue. It wasn't that the query wasn't correct, but that the highlighting didn't go as planned, so the correct query was only half executed.

    Now if I just had Ctrl-Z working...

    Let's all send THAT one to sqlwish@microsoft.com

  • Can you imagine how much work would go into that one?

    I vote we learn to be careful and have BACKUPS.

  • I have done far worse. I am not proud of that, but I highly recommend the careful review process you outline. In my case, the additional step of notifying someone would likely have prevented me from making the second mistake. I felt pressed to make the correction quickly, and while my problem was related to running a DTS package against the wrong connection, the hurried recovery process made the situation worse. I'll spare you the details, but it is key to keep your cool. The steps outlined in this excellent testimonial are sure to help us all.

    Thanks for sharing your experience.

  • Restored with the wrong backup??

  • I've done a DTS table transfer from a staging server to production and misinterpreted what the "Include all dependent objects" checkbox meant. In this case the production server was at the client's site and they hadn't taken our advice on backups, i.e. they had no backup at all since the original rollout several months earlier. We very quickly invested in a handy log explorer tool and were eventually able to retrieve about 90% of the data, after many heated negotiations and heart palpitations.

  • That would qualify as a Whoops .

  • No, I didn't restore the wrong backup. The backup from the previous night was good, but this database is for a high-volume, transaction-based application and I needed to restore the data up to the time of the data loss. My procedure included backing up the current database even though it had just been messed up by the DTS package (just in case you don't have a good backup that can be restored and you overwrite what you have left). I was in a hurry and missed checking the box to NOT commit the transaction log, i.e. I had no transaction data that I could use to restore the database to the point in time of the loss. My heart sank. We basically lost 12 hours of transactions. Fortunately, the 12 hours were from 10pm to 10am (a low-volume period).

    Once I shot myself in the other foot, I basically went through the steps outlined in this article. I restored from the previous night's backup, contacted all our clients and management, and made all the necessary provisions to correct our procedures to prevent future occurrences. We now have an HA cluster, multiple TL snapshots throughout the day, and a hot site in another city, so even if this happened again we would, in the absolute worst-case scenario, lose one hour of data transactions (you would have to lose the other site at the same time and screw up the restoration process at the remaining site again to hit the worst case).
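
    For the record, the point-in-time restore I was aiming for looks roughly like this in T-SQL (the database name, file paths and STOPAT time here are only placeholders, and it assumes a usable log backup covering the period of the loss):

    -- Restore the last full backup, leaving the database ready for log restores
    RESTORE DATABASE Sales
    FROM DISK = 'D:\Backups\Sales_Full.bak'
    WITH NORECOVERY;

    -- Roll forward through the transaction log, stopping just before the bad DTS run
    RESTORE LOG Sales
    FROM DISK = 'D:\Backups\Sales_Log.trn'
    WITH STOPAT = '2005-06-01 09:55:00', RECOVERY;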

  • Steve,

    Good job and good advice. I read somewhere that if something of major proportions happens, a DBA should sit back and relax for a couple of hours before he even starts thinking....

    I once had to restore a database under a different name and retrieve an individual table for application support when they realized, after several days, that the table was corrupted. That worked as well; they went record by record.

    I too have a comment about testing. In a regulated environment we usually have development and QA databases. Not all apps are regulated.

    For the same reason, I try not to copy and paste, but to save a query and then open it in the production connection and run it. This way you don't really have to click inside the query window.

    But you may test 1,000 times and still have cases where something works fine in the development environment, even better in the QA environment, and it just does not work in production. That is where you have to relax and get a nice cup of tea.

    If there is anything I can add to the article, it is the following:

    - I would code a procedure if you have to do something now and then (a rough sketch follows this list)

    - I would not notify my manager if I just corrupted a table and knew how to fix it. He is not expecting me to complain about something that I can fix. I should notify my manager if the allowed downtime is exceeded beyond reasonable expectations.
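
    A minimal sketch of what I mean by coding a procedure for a recurring fix (the procedure, table and column names are made up):

    -- Hypothetical recurring clean-up wrapped in a procedure with an explicit transaction
    CREATE PROCEDURE dbo.usp_ArchiveOldSessions
        @CutoffDate datetime
    AS
    BEGIN
        BEGIN TRANSACTION;

        DELETE FROM dbo.UserSessions
        WHERE LastActivity < @CutoffDate;

        COMMIT TRANSACTION;
    END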

    Yelena

    Regards,
    Yelena Varsha

  • Well, it seems that there are enough sensible replies to this topic already, so here's something that the article reminded me of: the truth about the extinction of the dinosaurs.

    Jon.

     

    Doesn't make any sense without the picture!

