Deleting with a Foreign Key

  • I have a table (tbl_Parent) that I currently delete from based on a pretty complex where clause.  tbl_Parent holds about 50M rows so I need to be sure that my delete statement is as optimized as possible.  I need to add records into my table that will match the criteria for the delete but I don't want them deleted.  I do need to store a little extra information on these records so I need to create a table that references these records (tbl_Child with a Foreign Key to tbl_Parent).

    I don't want my delete statement to do this:

    DELETE FROM tbl_Parent WHERE ParentID NOT IN (SELECT ParentID FROM tbl_Child)

    due to efficiencies.

    I was hoping that I could use SET XACT_ABORT and ANSI_WARNINGS to simply delete from the parent table and when there is a record in the child table, ignore it for the delete.  Is there any way that I can do this?  Or is there any other way that someone can think of to set this up. 

    I'd prefer not to add any more columns to my table and if I have to I can use another column that references a batch and set these override records as Batch -100 and then use the clause

    WHERE BatchID > -100

    but I'm not sure which is most efficient. 

  • SET XACT_ABORT will rollback all your previous deletes as well, even rows that you want to delete - so that won't work.

    I don't think ANSI_WARNINGS will work either.

    I would expect using BatchID will be more efficient, as SQL will have less IO's to perform.  If you want to know for sure, set up both conditions on your development server and time the deletes for both conditions.

     


    When in doubt - test, test, test!

    Wayne

  • In the past I have found a quicker way is to use exists:

    DELETE FROM tbl_Parent

    WHERE NOT EXISTS (

    SELECT ParentID

    FROM tbl_Child

    WHERE tbl_Parent.ParentID = tbl_Child.ParentID )


    ------------------------------
    The Users are always right - when I'm not wrong!

  • Thanks Wayne!

    We don't really have a development server so it's all on the fly and the process runs overnight and the business doesn't take too well when it goes down because they rely heavily on it.

    Thanks for your help.  I'll see what I can figure out.

  • Sounds like the perfect reason to lobby for a dev box! How are you supposed to make sure you don't cause down time if you don't have the ability to test out your code first? This is a VERY bad business practice if you ask me!

    At the very least have them make a copy of the database on your server so that you can do the testing in a different DB!




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • Maybe You could split the problem somehow.

    DECLARE @ToBeDeleted TABLE( Id int not null primary key clustered&nbsp

    INSERT @ToBeDeleted( Id )

    SELECT ... Query to Resolve the rows to be deleted ....

    DELETE tbl_Parent

    FROM @ToBeDeleted

    WHERE tbl_Parent.Id = @ToBeDeleted.Id

    ... ... ...

    /rockmoose


    You must unlearn what You have learnt

  • Actually 50M rows is not the end of the world for SQL Server. No doubt there is PK on ID column in tbl_Parent.

    May be problem is in your "pretty complex where clause".

    So you need create table ToDelete (ParentId int) and insert into this table IDs from parent table which match the criteria.

    Than you delete from ToDelete IDs from tbl_Child.

    And than delete from tbl_Parent where ID in (select ParentId from ToDelete)

    If you still afraid to start delete from tbl_Parent you can do this:

    delete from ToDelete

    where ParentId not in (select TOP 1000 ParentId from ToDelete).

    Or even better:

    create table ToDeleteToday (ParentId int) ,

    INSERT ToDeleteToday select ParentId from ToDelete

    where ParentId in (select TOP 1000 ParentId from ToDelete)

    delete from ToDelete

    where ParentId in (select TOP 1000 ParentId from ToDelete).

    It will leave only 1000 rows to delete and it will take definitely less than 8 hours to go. And you can see how long it actually takes and correct the number of rows.

    Next night you'll repeat this query and delete next 1000 (10k, 100k) rows.

    _____________
    Code for TallyGenerator

  • Thanks everyone for their input, I am going to try a bunch of options.  Since this is an overnight process that crunches and rebuilds all the data, it's going to take me a week of change and analysis to see what works best.  It's all part of the fun.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply