Insert is taking hours to complete

  • I am running an insert where I'm performing an inner join between two large tables (13 million rows in each). I looked in the Activity Monitor and the process shows my insert as suspended and there is a select on the tempdb table that is running. Is this normal? I'm guessing that the select on the temp table may have something to do with the join? I have indexes on both tables for the join fields. Below is my query:

    Insert into MedParSpecial( CaseID, CoInsuranceAmt, PPSIndicator, DayAdmit, DischargeDate, DischargeDestination, OutlierCodeID,

    FiscalIntermediary,AdmissionToDateOfDeath,ProposedDRGForNextYear,BloodDeductable,CapitolTotalAmt,DisproportionateShareAmt,

    IndirectMedicalEducationAmt,InpatientDeductable,OutlierAmt,PrimaryPayorAmount,WardDays)

    Select p.CaseID,t.coins_amt,t.pps_ind,t.admitdate,t.dischdate,t.dischdest,t.outlier_co,t.inter_nbr,t.adm_dod,t.prop_drg,t.blood_ded,

    t.cap_tot_am,t.dis_sh_amt,t.ime_amt,t.inpat_ded, t.outl_amt, t.pri_pay_am, t.warddays

    from dbo.temp_medpar2007_hosp_costedcsv t

    inner join CASES p on t.ID = p.tmp_OldPrimeKey

  • dndaughtery (3/26/2009)


    I am running an insert where I'm performing an inner join between two large tables (13 million rows in each). I looked in the Activity Monitor and the process shows my insert as suspended and there is a select on the tempdb table that is running. Is this normal? I'm guessing that the select on the temp table may have something to do with the join? I have indexes on both tables for the join fields. Below is my query:

    Insert into MedParSpecial( CaseID, CoInsuranceAmt, PPSIndicator, DayAdmit, DischargeDate, DischargeDestination, OutlierCodeID,

    FiscalIntermediary,AdmissionToDateOfDeath,ProposedDRGForNextYear,BloodDeductable,CapitolTotalAmt,DisproportionateShareAmt,

    IndirectMedicalEducationAmt,InpatientDeductable,OutlierAmt,PrimaryPayorAmount,WardDays)

    Select p.CaseID,t.coins_amt,t.pps_ind,t.admitdate,t.dischdate,t.dischdest,t.outlier_co,t.inter_nbr,t.adm_dod,t.prop_drg,t.blood_ded,

    t.cap_tot_am,t.dis_sh_amt,t.ime_amt,t.inpat_ded, t.outl_amt, t.pri_pay_am, t.warddays

    from dbo.temp_medpar2007_hosp_costedcsv t

    inner join CASES p on t.ID = p.tmp_OldPrimeKey

    Hi,

    First, the tempdb select that is occuring is probably because the server does not have enough memory, so it had to swap to the tempDB to do the join. This happens when you do huge queries, with too low memory on the machine.

    Why do you need to insert 13 M rows in another table? (I consider the inner join will work between those 2 tables)

    What are the indexes on these tables (please tell me both clustered indexes match for the join)

    I'll try to find a piece of code I saw to insert in batches in another table, it will prevent from swapping on the tempDB, and will increase the performance you are experiencing, but for now, try to post your index definitions of both tables joined.

    Cheers,

    J-F

  • Edited : Added Florian's worship!

    Try this, see what it does for you, tell me if it's any better.

    Couldn't find my piece of code so I "Stole" Florian's, I hope he does not go mad, 😉

    DECLARE @batch INT

    DECLARE @all INT

    DECLARE @current INT

    SELECT @batch = 100,

    @all = 0,

    @current = 0

    WHILE (1 = 1)

    BEGIN

    INSERT INTO MedParSpecial

    (CaseID,

    CoInsuranceAmt,

    PPSIndicator,

    DayAdmit,

    DischargeDate,

    DischargeDestination,

    OutlierCodeID,

    FiscalIntermediary,

    AdmissionToDateOfDeath,

    ProposedDRGForNextYear,

    BloodDeductable,

    CapitolTotalAmt,

    DisproportionateShareAmt,

    IndirectMedicalEducationAmt,

    InpatientDeductable,

    OutlierAmt,

    PrimaryPayorAmount,

    WardDays)

    SELECT TOP ( @Batch ) p.CaseID,

    t.coins_amt,

    t.pps_ind,

    t.admitdate,

    t.dischdate,

    t.dischdest,

    t.outlier_co,

    t.inter_nbr,

    t.adm_dod,

    t.prop_drg,

    t.blood_ded,

    t.cap_tot_am,

    t.dis_sh_amt,

    t.ime_amt,

    t.inpat_ded,

    t.outl_amt,

    t.pri_pay_am,

    t.warddays

    FROM dbo.temp_medpar2007_hosp_costedcsv t

    INNER JOIN CASES p

    ON t.ID = p.tmp_OldPrimeKey

    SELECT @current = @@ROWCOUNT,

    @all = @all + @@ROWCOUNT

    IF (@current = 0)

    BREAK

    END

    PRINT 'Did: '

    + CONVERT(VARCHAR(10),@all) + ' Rows'

    Cheers,

    J-F

  • Well, the database file is sitting on a 1.5 terabyte drive all by itself and it hasn't taken up 10% of the space available.

    Why do you need to insert 13 M rows in another table? (I consider the inner join will work between those 2 tables)

    I had to import a large txt file so I loaded it into a table that I will remove later.

    What are the indexes on these tables (please tell me both clustered indexes match for the join)

    Yes, the one for the table that I will remove has a clustered index but the other tabler I'm joining to has a nonclustered index on the old ID (Matches the other ID). I am of the understanding that I can only have one clustered index per table?

    I'll try to find a piece of code I saw to insert in batches in another table, it will prevent from swapping on the tempDB, and will increase the performance you are experiencing, but for now, try to post your index definitions of both tables joined.

  • Thanks for the code but I'm curious as to how it works. Will it not just keep importing the top 100 everytime? I don't see how it will go past the first 100 rows.

  • dndaughtery (3/26/2009)


    Thanks for the code but I'm curious as to how it works. Will it not just keep importing the top 100 everytime? I don't see how it will go past the first 100 rows.

    My bad, you are so right. Lemme get a coffee and fix this. I'll add a where not exists.

    I shouldn't post stuff early in the morning...

    Cheers,

    J-F

  • DECLARE @batch INT

    DECLARE @all INT

    DECLARE @current INT

    SELECT @batch = 100,

    @all = 0,

    @current = 0

    WHILE (1 = 1)

    BEGIN

    INSERT INTO MedParSpecial

    (CaseID,

    CoInsuranceAmt,

    PPSIndicator,

    DayAdmit,

    DischargeDate,

    DischargeDestination,

    OutlierCodeID,

    FiscalIntermediary,

    AdmissionToDateOfDeath,

    ProposedDRGForNextYear,

    BloodDeductable,

    CapitolTotalAmt,

    DisproportionateShareAmt,

    IndirectMedicalEducationAmt,

    InpatientDeductable,

    OutlierAmt,

    PrimaryPayorAmount,

    WardDays)

    SELECT TOP ( @Batch ) p.CaseID,

    t.coins_amt,

    t.pps_ind,

    t.admitdate,

    t.dischdate,

    t.dischdest,

    t.outlier_co,

    t.inter_nbr,

    t.adm_dod,

    t.prop_drg,

    t.blood_ded,

    t.cap_tot_am,

    t.dis_sh_amt,

    t.ime_amt,

    t.inpat_ded,

    t.outl_amt,

    t.pri_pay_am,

    t.warddays

    FROM dbo.temp_medpar2007_hosp_costedcsv t

    INNER JOIN CASES p

    ON t.ID = p.tmp_OldPrimeKey

    LEFT JOIN MedParSpecial s on p.CaseID = s.CaseID

    WHERE s.CaseID is null

    SELECT @current = @@ROWCOUNT,

    @all = @all + @@ROWCOUNT

    IF (@current = 0)

    BREAK

    END

    PRINT 'Did: '

    + CONVERT(VARCHAR(10),@all) + ' Rows'

    What kind of performance do you get with this query? Run it just once (for let's say 1000 rows), and post the plan here, see if the left join did not ruin the performance of the query.

    Cheers,

    J-F

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply