Batch insert question - one transaction?

  • I'm trying to improve our web app's insert operations by switching to a batch insert.

    Searching this forum has been helpful as there are various threads that are very informative. A couple of questions remain in my mind, though. Will a batch insert as I've illustrated below result in a single transaction? Are there other ways to describe the performance benefit? And finally, can all these rows be removed through a single rollback?

    Old, inefficient multiple inserts:

    INSERT INTO sdj_test_table (ID, NAME) VALUES (-1006, ' Bad Debt [CARRIER]')
    INSERT INTO sdj_test_table (ID, NAME) VALUES (-1002, ' Blue Cross [CARRIER]')
    INSERT INTO sdj_test_table (ID, NAME) VALUES (-1003, ' Commercial [CARRIER]')

    And the new-and-improved batch insert:

    INSERT INTO sdj_test_table (ID, NAME)
    SELECT -1006, ' Bad Debt [CARRIER]'
    UNION ALL
    SELECT -1002, ' Blue Cross [CARRIER]'
    UNION ALL
    SELECT -1003, ' Commercial [CARRIER]'

    Thoughts?

  • seth 98372 (6/21/2011)


    ...

    And the new-and-improved batch insert:

    INSERT INTO sdj_test_table (ID, NAME)
    SELECT -1006, ' Bad Debt [CARRIER]'
    UNION ALL
    SELECT -1002, ' Blue Cross [CARRIER]'
    UNION ALL
    SELECT -1003, ' Commercial [CARRIER]'

    Will a batch insert as I've illustrated below result in a single transaction?

    Yes.

    Are there other ways to describe the performance benefit?

    One transaction versus three, one parse versus three, one optimization versus three... one will be faster.

    And finally, can all these rows be removed through a single rollback?

    Yes. A single INSERT is by nature transactional. If the INSERT fails, all work is rolled back.

    If you're on SQL Server 2008 or later, you may want to consider the row constructor:

    INSERT INTO sdj_test_table (ID, NAME)
    VALUES (-1006, ' Bad Debt [CARRIER]'),
           (-1002, ' Blue Cross [CARRIER]'),
           (-1003, ' Commercial [CARRIER]');
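
    The rollback question is easy to verify for yourself. Here's a small sketch using Python's sqlite3 module as a stand-in for SQL Server (statement-level atomicity works the same way in both): a multi-row insert where one row violates a constraint leaves none of its rows behind.

    ```python
    import sqlite3

    # Illustrative sketch, not SQL Server: SQLite via Python's sqlite3.
    # Table and values mirror the thread's example; the PRIMARY KEY is
    # added here so we can force a failure mid-statement.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sdj_test_table (ID INTEGER PRIMARY KEY, NAME TEXT)")
    conn.execute("INSERT INTO sdj_test_table VALUES (-1006, ' Bad Debt [CARRIER]')")
    conn.commit()

    try:
        # Multi-row insert; the last row collides with the existing key.
        conn.execute(
            "INSERT INTO sdj_test_table (ID, NAME) VALUES "
            "(-1002, ' Blue Cross [CARRIER]'), "
            "(-1003, ' Commercial [CARRIER]'), "
            "(-1006, ' Duplicate [CARRIER]')"
        )
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()

    # None of the three rows from the failed statement survive.
    count = conn.execute("SELECT COUNT(*) FROM sdj_test_table").fetchone()[0]
    print(count)  # 1
    ```

    The statement is all-or-nothing even before any explicit transaction handling: the failing INSERT backs out its own partial work.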

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • This sounds suspiciously like a homework assignment. Why don't you tell us your thoughts?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (6/21/2011)


    This sounds suspiciously like a homework assignment. Why don't you tell us your thoughts?

    Har-Har! I wish this were a homework assignment... This is me at my job trying to improve our app's performance. My boss prides himself on being a SQL expert, and he's put me in the position of validating the solution I've come up with. He's also the one who questioned me on the rollback-ability of this type of insert.

    We're using a proprietary development platform (created in-house), and it has all these mechanisms built in for doing selects, updates, and inserts. It has some nice benefits in that it is schema-aware, so when the tables change, the DML operations automagically adapt.

    But one of the traditional conventions is for developers to run an SQL query, loop through each returned row, then do something with the values from each row. Unfortunately, the "do something" frequently means INSERTs into other tables. This can be several tens of thousands of rows. Performance is pretty ugly in many aspects of our app, and I'm trying to clean that up. The looping inserts are low-hanging fruit from what I can tell.

    I've written a proof-of-concept demonstrating that the single bulk insert works, but I don't have a way to quantify the improvement for my skeptical boss. There's resistance in the organization because, to accomplish this, I've had to use the DB Query mechanism, which its creator intended only for querying rows from the DB. I've built up an SQL batch insert statement within a string variable, and I'm passing it to the DB Query mechanism with no rows to be returned. He (my boss, essentially) sees this usage of the DB Query mechanism as sacrilege and doesn't want it used that way. The other option would be for the developers to create an entirely separate mechanism where you can pass variables containing SQL statements to the DB. That would require hours of development.

    So, rather than this being a simple homework assignment, it's an internal political battle. I'm trying to load the cannon up with as much sense as I can possibly fire in the face of the 'resistance.'
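
    For readers landing here later, the "build a batch insert statement in a string variable" approach described above looks roughly like this sketch (Python for concision; the table name and row values come from the thread, and the quoting helper is illustrative only - a parameterized API is safer whenever the platform exposes one):

    ```python
    # Hypothetical sketch of assembling one batched INSERT ... SELECT ...
    # UNION ALL statement on the client side. The escaping helper below
    # only doubles embedded single quotes; it is NOT a substitute for
    # real parameterization against untrusted input.
    rows = [
        (-1006, " Bad Debt [CARRIER]"),
        (-1002, " Blue Cross [CARRIER]"),
        (-1003, " Commercial [CARRIER]"),
    ]

    def quote(value: str) -> str:
        """Minimal string-literal escaping: double embedded single quotes."""
        return "'" + value.replace("'", "''") + "'"

    selects = [f"SELECT {ident}, {quote(name)}" for ident, name in rows]
    sql = "INSERT INTO sdj_test_table (ID, NAME)\n" + "\nUNION ALL\n".join(selects)
    print(sql.splitlines()[0])  # INSERT INTO sdj_test_table (ID, NAME)
    ```

    The resulting string is then handed to the DB layer as a single statement, which is what makes it one transaction on the server.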

  • Thank you for your corroboration of what I suspected. I'm going to print out your response and slam it on my boss's desk and shout, "SEE! OPC.THREE says it will work!" (see my other post in this thread)

    But seriously, thanks for validating this suspicion of mine.

  • Heh, yeah, tell 'em I sent ya 😉

    Out of curiosity brought about by something you said about "bulk", what's the client language? There may be a better way than any of the T-SQL methods we've kicked around so far.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/21/2011)


    Heh, yeah, tell 'em I sent ya 😉

    Out of curiosity brought about by something you said about "bulk", what's the client language? There may be a better way than any of the T-SQL methods we've kicked around so far.

    The environment has scripting options that are similar to java. But we're not loading from a file or anything. It's dynamically pulling records from some tables, executing some logic, then performing inserts on other tables. What 'better way' did you have in mind?

    Appreciatively,

    Seth
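
    Since the client is Java-like, one 'better way' worth knowing is JDBC statement batching: prepare one parameterized INSERT, queue parameter rows with PreparedStatement.addBatch(), and send them together with executeBatch(). The sketch below shows the same shape in Python/sqlite3 (used here only because it is self-contained; the pattern is the point, not the driver):

    ```python
    import sqlite3

    # One prepared statement, many parameter rows, one execution pass,
    # one commit - the sqlite3 analogue of JDBC addBatch()/executeBatch().
    rows = [
        (-1006, " Bad Debt [CARRIER]"),
        (-1002, " Blue Cross [CARRIER]"),
        (-1003, " Commercial [CARRIER]"),
    ]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sdj_test_table (ID INTEGER, NAME TEXT)")
    conn.executemany("INSERT INTO sdj_test_table (ID, NAME) VALUES (?, ?)", rows)
    conn.commit()  # single transaction for the whole batch
    print(conn.execute("SELECT COUNT(*) FROM sdj_test_table").fetchone()[0])  # 3
    ```

    This avoids building SQL text in strings at all, which may sidestep the "sacrilege" objection about misusing the query mechanism.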

  • .NET class System.Data.SqlClient.SqlBulkCopy

    It allows one to load data directly from an ADO.NET data structure into a SQL Server database via the bulk copy API (same one used by bcp I think) so it's blazing fast.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/22/2011)


    .NET class System.Data.SqlClient.SqlBulkCopy

    It allows one to load data directly from an ADO.NET data structure into a SQL Server database via the bulk copy API (same one used by bcp I think) so it's blazing fast.

    Thanks for that info. We're using Java with no .NET stuff. It could be a good suggestion for a future explorer of this thread, though.

    Thanks again for validating my case here.

    Seth

  • Anytime... Fight the good fight 🙂

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

    Har-Har! I wish this were a homework assignment... This is me at my job trying to improve our app's performance. My boss prides himself on being a SQL expert, and he's put me in the position of validating the solution I've come up with. He's also the one who questioned me on the rollback-ability of this type of insert.

    ...

    But one of the traditional conventions is for developers to run an SQL query, loop through each returned row, then do something with the values from each row. Unfortunately, the "do something" frequently means INSERTs into other tables. This can be several tens of thousands of rows. Performance is pretty ugly in many aspects of our app, and I'm trying to clean that up. The looping inserts are low-hanging fruit from what I can tell.

    ...

    Processing row by row with an insert in between is the textbook definition of a poorly optimized database process. The optimal solution for selecting 10,000 rows from a table, doing stuff with the values, and then inserting into another table would be a single INSERT / SELECT like the example below. It's all one transaction, and if anything goes wrong, like a foreign key or check constraint violation, then the entire insert is rolled back.

    insert into TableC ( col1, col2, col3 )
    select (n + 2) col1, (b + 'XYZ') col2, (case when ... then ...) col3
    from TableA ...
    join TableB ...
    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • He's going from a Java data structure to a SQL table, not a SQL table to another SQL table.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Eric M Russell (6/23/2011)


    ...

    Processing row by row with an insert in between is the textbook definition of a poorly optimized database process. The optimal solution for selecting 10,000 rows from a table, doing stuff with the values, and then inserting into another table would be a single INSERT / SELECT like the example below. It's all one transaction, and if anything goes wrong, like a foreign key or check constraint violation, then the entire insert is rolled back.

    insert into TableC ( col1, col2, col3 )
    select (n + 2) col1, (b + 'XYZ') col2, (case when ... then ...) col3
    from TableA ...
    join TableB ...

    Eric,

    Thank you for your thoughtful response here. OPC.Three is right, though: we're doing the initial SQL query in Java, then processing the results (in Java), and doing an insert into separate tables for each result.

    Hopefully your insight will be helpful to others who come across this thread.

    Appreciatively,

    Seth

  • seth 98372 (6/23/2011)


    Eric,

    Thank you for your thoughtful response here. OPC.Three is right, though: we're doing the initial SQL query in Java, then processing the results (in Java), and doing an insert into separate tables for each result.

    Hopefully your insight will be helpful to others who come across this thread.

    Appreciatively,

    Seth

    In that case, your original idea of doing something like INSERT INTO ... SELECT ... UNION ALL SELECT ..., where all rows are contained within a single insert statement, would almost certainly be the best method. If that Java data structure has functionality to sync data changes back to the database, that would be worth looking into as well.

    As for convincing your boss of the performance merits of one approach versus the other, you can run a SQL Profiler trace, output it to a table, and compare the aggregate CPU, Read, and Write totals for all the individual insert statements against the same totals for the single consolidated insert.

    http://msdn.microsoft.com/en-us/library/ms175848.aspx
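
    If Profiler isn't available, a rough client-side timing comparison can make the same case. This sketch (Python/sqlite3 as a stand-in; absolute numbers depend entirely on the environment, so the ratio is what matters) times N one-row, one-commit inserts against a single batched insert:

    ```python
    import sqlite3
    import time

    # Compare: one transaction per row vs. one transaction for all rows.
    N = 1000
    rows = [(i, f"name {i}") for i in range(N)]

    def fresh_table():
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
        return conn

    conn_a = fresh_table()
    start = time.perf_counter()
    for row in rows:
        conn_a.execute("INSERT INTO t (id, name) VALUES (?, ?)", row)
        conn_a.commit()              # one transaction per row
    row_by_row_secs = time.perf_counter() - start

    conn_b = fresh_table()
    start = time.perf_counter()
    conn_b.executemany("INSERT INTO t (id, name) VALUES (?, ?)", rows)
    conn_b.commit()                  # one transaction for the whole batch
    batched_secs = time.perf_counter() - start

    # Both paths load identical data; print the row count to confirm.
    print(conn_b.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1000
    ```

    In practice the batched path is usually faster by a wide margin, which is exactly the parse/optimize/commit overhead argument made earlier in the thread.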

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

