INSERT INTO x SELECT y FROM z WHERE...

  • Hi!

    I'm developing time tracking and project management software.

    For the time tracking feature all tasks are stored in a table called 'Task'.

    A record in this table is on average 150 bytes.

    The growth is estimated at 100 000 - 200 000 records per year.

    Currently I'm implementing a function that makes it possible for project managers to create working copies. That is, you can create a working copy of the real project plan.

    When a new working copy is created, approximately 50 000 - 100 000 records from the real project plan are copied, and a field in the 'Task' table is set to the ID of the new working copy.

    My concern is the stored procedure that performs this action.

    The current construction is:

    CREATE PROCEDURE dbo.spI_Task_WorkingCopy
    (
        @WorkingCopyID int,
        @dtmDateCriteriaBegin datetime,
        @dtmDateCriteriaEnd datetime
    )
    AS BEGIN
        SET NOCOUNT ON

        DECLARE @Error int, @Return int

        BEGIN TRANSACTION

        -- Copy matching tasks from the real plan (WorkingCopyID = 0)
        -- into the new working copy
        INSERT INTO [dbo].[Task]
        (
            [Name], [ProjectID], [Description], [ProcentDone], [TaskType],
            [Start], [TimeEstimated], [TimeReal], [Deadline], [Priority],
            [Milestone], [RestrictionType], [RestrictionDate], [Notes],
            [OwnerPersonID], [MasterTaskID], [Status], [WorkingCopyID]
        )
        SELECT
            [Task].[Name], [Task].[ProjectID], [Task].[Description],
            [Task].[ProcentDone], [Task].[TaskType], [Task].[Start],
            [Task].[TimeEstimated], [Task].[TimeReal], [Task].[Deadline],
            [Task].[Priority], [Task].[Milestone], [Task].[RestrictionType],
            [Task].[RestrictionDate], [Task].[Notes], [Task].[OwnerPersonID],
            [Task].[MasterTaskID], [Task].[Status],
            @WorkingCopyID
        FROM [dbo].[Task]
        INNER JOIN [dbo].[Project]
            ON [dbo].[Project].[ProjectID] = [dbo].[Task].[ProjectID]
        INNER JOIN [dbo].[UserProfile]
            ON [dbo].[Task].[OwnerPersonID] = [dbo].[UserProfile].[PersonID]
        WHERE [dbo].[Task].[Start] BETWEEN @dtmDateCriteriaBegin AND @dtmDateCriteriaEnd
            AND [dbo].[Task].[WorkingCopyID] = 0
            AND [dbo].[Task].[Status] = 'Preplanned'
            AND [dbo].[Project].[Status] = 'Active'
            AND [dbo].[UserProfile].[UserStatus] IN ('Active', 'JohnDoe', 'Missing')

        -- Capture both values in one statement: @@ERROR and @@ROWCOUNT
        -- are each reset by the next statement executed
        SELECT @Error = @@ERROR, @Return = @@ROWCOUNT

        IF @Error <> 0 GOTO ERR_HANDLER

        COMMIT TRANSACTION

        SET NOCOUNT OFF
        RETURN(@Return)

    ERR_HANDLER:
        SELECT N'Unexpected error occurred creating working copy'
        ROLLBACK TRANSACTION
        SET NOCOUNT OFF
        RETURN 0
    END

    The real project plan has WorkingCopyID = 0.

    TaskID is the primary key. No other fields are indexed right now.

    I've read that problems can occur with the transaction log because everything runs as one transaction, which means the log can't be truncated until the whole operation has finished.

    I understand that it isn't possible to get minimal logging for INSERT INTO ... SELECT operations.

    The suggested solution is to break the operation down into smaller batches. How do I do this, and how large should the batches be? 1 000 records per batch? 10 000 records per batch?
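
    I imagine replacing the single INSERT with a loop along these lines, inside the same procedure, but I'm not sure it's the right approach. The column list is shortened here for readability - the real statement would use the full column list, joins and filters from the procedure above - and walking TaskID in fixed key ranges is just my guess at how to batch, so batch row counts will vary where the IDs are sparse:

    DECLARE @BatchSize int, @LastTaskID int, @MaxTaskID int
    SET @BatchSize = 10000

    SELECT @LastTaskID = 0, @MaxTaskID = MAX([TaskID])
    FROM [dbo].[Task]
    WHERE [WorkingCopyID] = 0

    WHILE @LastTaskID < @MaxTaskID
    BEGIN
        BEGIN TRANSACTION

        -- Shortened column list; the real statement uses the full
        -- column list and the joins/filters from the procedure above
        INSERT INTO [dbo].[Task] ([Name], [ProjectID], [Status], [WorkingCopyID])
        SELECT [Name], [ProjectID], [Status], @WorkingCopyID
        FROM [dbo].[Task]
        WHERE [TaskID] > @LastTaskID
            AND [TaskID] <= @LastTaskID + @BatchSize
            AND [WorkingCopyID] = 0

        IF @@ERROR <> 0
        BEGIN
            ROLLBACK TRANSACTION
            RETURN 0
        END

        -- Committing per batch lets the log be truncated between batches
        COMMIT TRANSACTION

        SET @LastTaskID = @LastTaskID + @BatchSize
    END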


    Best Wishes,
    Stefan Johansson

  • Do you expect to be running on good, contemporary hardware?

    If so, I think I'd just try a 100K row operation all in one go and see what happens. You could at least do that in test. If it works OK without batching, then don't worry about it.

    You may have concurrency issues with other updaters, too, so arrange to have some typical interactive workload going when you do a second test.

    If this is for an in-house application only, where you'll be in touch with the users, you could be a tad casual about the performance and impact of the copy create and just ask the users to run it when the system isn't busy.

    If this is for a commercial product, more rigorous standards should apply.  Even so, we've bought stuff that didn't work all that well and accepted that we had to be cautious with mass updates.

    If you do decide to do batching, you can't just roll back to undo the whole operation, so more programming work will be required. You'll probably have to set a flag somewhere that says a copy create is in progress, in case the system dies while you're in the middle of it, so it can be programmatically reversed - or carried forward - when the server comes back up.
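
    For example, something along these lines - the bookkeeping table and the recovery logic are just a sketch, and all the names are invented:

    -- One row per copy-create operation
    CREATE TABLE dbo.WorkingCopyCreation
    (
        WorkingCopyID int NOT NULL PRIMARY KEY,
        Status varchar(20) NOT NULL -- 'InProgress' or 'Complete'
    )

    -- At server startup: reverse any copy create that never completed
    DECLARE @WorkingCopyID int

    SELECT @WorkingCopyID = MIN(WorkingCopyID)
    FROM dbo.WorkingCopyCreation
    WHERE Status = 'InProgress'

    WHILE @WorkingCopyID IS NOT NULL
    BEGIN
        DELETE FROM dbo.Task WHERE WorkingCopyID = @WorkingCopyID
        DELETE FROM dbo.WorkingCopyCreation WHERE WorkingCopyID = @WorkingCopyID

        SELECT @WorkingCopyID = MIN(WorkingCopyID)
        FROM dbo.WorkingCopyCreation
        WHERE Status = 'InProgress'
    END

    The copy-create procedure would insert the 'InProgress' row before the first batch and set it to 'Complete' after the last one.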

  • Hey!

    Thanks for the reply, and sorry for the delay.

    Yep, the hardware has great performance.

    I have tried some scenarios and this "massive" INSERT INTO operation works pretty well. However, I have abandoned this solution. Instead of copying all the records, I'm using a "reference" table with three columns: ReferenceId, WorkingCopyId and TaskId. When a new working copy is created from the default working copy, this reference table is filled with references to the records of the default working copy.
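
    Roughly like this - the table name and exact types are simplified for the example:

    CREATE TABLE dbo.TaskReference
    (
        ReferenceID int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        WorkingCopyID int NOT NULL,
        TaskID int NOT NULL
    )

    -- Creating a working copy now only inserts references,
    -- not copies of the ~150-byte task records themselves
    DECLARE @WorkingCopyID int
    SET @WorkingCopyID = 2 -- ID of the new working copy

    INSERT INTO dbo.TaskReference (WorkingCopyID, TaskID)
    SELECT @WorkingCopyID, TaskID
    FROM dbo.Task
    WHERE WorkingCopyID = 0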

    When a record in a working copy is edited, a new record is inserted instead of modifying the shared one. The same goes the other way: if a record in the default working copy is edited while it is present in at least one working copy, a new record is also inserted.
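
    The edit case looks roughly like this - the procedure name and the shortened column list are made up for the example, and it assumes TaskID is an IDENTITY column:

    CREATE PROCEDURE dbo.spU_Task_WorkingCopyEdit
    (
        @WorkingCopyID int,
        @TaskID int,
        @NewName nvarchar(100)
    )
    AS BEGIN
        DECLARE @NewTaskID int

        BEGIN TRANSACTION

        -- Insert the edited version as a new task record
        INSERT INTO dbo.Task ([Name], [ProjectID], [Status])
        SELECT @NewName, [ProjectID], [Status]
        FROM dbo.Task
        WHERE [TaskID] = @TaskID

        SET @NewTaskID = SCOPE_IDENTITY()

        -- Repoint this working copy's reference to the new record;
        -- other working copies still see the old one
        UPDATE dbo.TaskReference
        SET [TaskID] = @NewTaskID
        WHERE [WorkingCopyID] = @WorkingCopyID
            AND [TaskID] = @TaskID

        COMMIT TRANSACTION
    END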

    This way, I think the amount of data that has to be handled is minimized.

    This is for an in-house application, but the users still shouldn't have to care about the system load.

    Thanks for the help!


    Best Wishes,
    Stefan Johansson
