There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • RBarryYoung (4/14/2009)


    Christian Buettner (4/14/2009)


    I'll just jump in and risk making a fool of myself.

    In one of my data import apps (Oracle to SQL Server), I use temp tables with varchar(500) fields to ensure all data can be imported successfully to the local server (via DTS). The data from the temp tables is then merged into the final tables using T-SQL (delete / insert) statements.

    Since the source data can sometimes contain "invalid data" (e.g. primary key violations or field size too big), the import sometimes fails, and this is where my cursor solution comes into play. It imports all rows from the temp table into the final table one by one. Each row that fails is imported into the corresponding error table. This allows correction or just validation of the invalid data later on, and ensures that correct data is available to all clients that were not affected. (Note: the first try is always set-based; the cursor-based approach only starts in case of an error.)

    Wondering if someone has a hint on a better way to do this.

    If you need more rationale for this, let me know.

    I use BULK INSERT, BCP or Integration Services (SSIS) for stuff like this. They have this capability built in.

    RBarryYoung, can you please explain how you would route the rows between the error and final tables using BCP or BULK INSERT? SSIS is not an option, which I forgot to mention. Still using SQL Server 2000 & DTS.

    Best Regards,

    Chris Büttner

  • Christian Buettner


    In one of my data import apps (Oracle to SQL Server), I use temp tables with varchar(500) fields to ensure all data can be imported successfully to the local server (via DTS). The data from the temp tables is then merged into the final tables using T-SQL (delete / insert) statements.

    Since the source data can sometimes contain "invalid data" (e.g. primary key violations or field size too big), the import sometimes fails, and this is where my cursor solution comes into play. It imports all rows from the temp table into the final table one by one. Each row that fails is imported into the corresponding error table. This allows correction or just validation of the invalid data later on, and ensures that correct data is available to all clients that were not affected. (Note: the first try is always set-based; the cursor-based approach only starts in case of an error.)

    Wondering if someone has a hint on a better way to do this.

    [...]

    Christian Buettner (4/14/2009)


    Can you please explain how you would route the rows between the error and final tables using BCP or BULK INSERT? SSIS is not an option, which I forgot to mention. Still using SQL Server 2000 & DTS.

    [font="Verdana"]Christian,

    We do similar stuff here on a variety of platforms, one of which is SQL Server 2000. Here's the approach I would recommend.

    1. Truncate (or create) your loading table and your error table. These tables should be able to hold all of the data without generating errors, but they need to split the data into the right fields. We generally add a line number field (using an identity) and a load date field (with a default of getdate()).

    2. Use BCP or BULK INSERT to load the raw data into the load table. Sadly you can't use the XML file format option in SQL Server 2000 (which I find much easier to work with).

    3. Select out the data that is problematic and insert it into an "error table" (probably the same format as the load table, but without the identity). We ended up doing this step with dynamic SQL to build up the list of rows that are errors, based on a set of stored rules that get concatenated.

    4. Insert the remaining data into the destination table (left join to the error data by the line number and keep only rows where the right-hand side is null).

    No cursors needed. 😀 For steps 3 and 4, you can also put loops around the statements and use TOP to keep the batch sizes reasonable (so you don't blow out the transaction log). A minimal sketch of steps 3 and 4 follows.
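
    Assuming the load table has a LineNo identity and a LoadDate default as in step 1, steps 3 and 4 might look something like this (table, column, and rule names are all illustrative):

        -- Step 3: route rows that break a rule into the error table.
        INSERT INTO dbo.ErrorTable (LineNo, Col1, Col2, LoadDate)
        SELECT LineNo, Col1, Col2, LoadDate
        FROM dbo.LoadTable
        WHERE LEN(Col1) > 30            -- example rule: field too long
           OR Col2 NOT LIKE '[0-9]%';   -- example rule: not numeric

        -- Step 4: insert everything that did NOT land in the error table.
        INSERT INTO dbo.DestTable (Col1, Col2)
        SELECT l.Col1, l.Col2
        FROM dbo.LoadTable l
        LEFT JOIN dbo.ErrorTable e ON e.LineNo = l.LineNo
        WHERE e.LineNo IS NULL;         -- right-hand side null => clean row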

    If you want some additional help on this, just create a new thread and I'm sure a bunch of people will be able to pitch in with specific examples.


  • I300I (4/14/2009)


    The author makes the claim several times that there is no reason to ever use a cursor again with SQL Server 2005, because everything can be done with set-based logic. When the very realistic question is posed about calling sp_send_dbmail, we see no answer that supports the claim. Notification Services? SQL CLR? RS? These are the solutions to avoid a simple looping process to send emails?

    I am interested in seeing this problem addressed without a looping process:

    I ship a packaged DB product to our customers. I do not use NS or SQL CLR. I am not on site to administer their systems.

    I have a set of 50 pre-defined status messages in a message table. Each row in the message table defines a message, with all the execution parameters for a call to sp_send_dbmail. A message execution schedule is also included, as well as the last execution date/time, and results of the last sp_send_dbmail procedure execution for each message.

    The stored procedure is kicked-off by an hourly scheduled job, but only those messages that are due to be sent are sent. The job executes in less than 5 seconds.

    Sure, I can avoid a cursor by creating a temp table, loading all the rows that are due to be sent, WHILE-LOOPING through each row to send a message and capture the proc result, then updating my message table with the results. This is a lot more code than using a basic fast_forward cursor, and there is no performance issue with such a small amount of work to do.

    I am making the claim that, as of SQL Server 2005 and greater, we do not need explicit Cursors or WHILE loops, because there are performant alternatives that accomplish the same things. Others and I have already offered multiple solutions for the general Email problem:

    1) Distribution Lists

    2) Notification Services

    3) Aggregated Execution (query built Dynamic SQL)

    4) SQL CLR

    5) External Activator (SQL Client executor as a Windows Service)

    Off-hand I can think of some more:

    6) SSIS

    7) Query built xp_CmdShell SMTP commands

    8) SQLCMD to SMTP batch

    Depending on the specifics of your functional specifications, there are other possibilities also.

    These are all solutions accessible in SQL Server 2005, and all perfectly valid solutions that do not involve Cursors. I also know for a fact that there are shipping SQL Server-based products that have used each of the first five techniques above. And their vendors were not on-site to manage their customers' systems either.

    If you want to provide us with the relevant CREATE TABLE definitions and an example of your cursor procedure, I am sure that we can demonstrate one or more of these to you.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Joe Celko (4/14/2009)


    LOL! I loved that intro!

    ...

    Thanks, Joe! It's really great to hear that from someone whose books played such a big part in my development as a SQL professional.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Bruce W Cassidy (4/14/2009)


    [font="Verdana"]An interesting start, and I'll look forward to reading more.

    Thanks Bruce, I look forward to your future reviews. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Christian Buettner (4/14/2009)


    RBarryYoung (4/14/2009)


    Christian Buettner (4/14/2009)


    I'll just jump in and risk making a fool of myself.

    In one of my data import apps (Oracle to SQL Server), I use temp tables with varchar(500) fields to ensure all data can be imported successfully to the local server (via DTS). The data from the temp tables is then merged into the final tables using T-SQL (delete / insert) statements.

    Since the source data can sometimes contain "invalid data" (e.g. primary key violations or field size too big), the import sometimes fails, and this is where my cursor solution comes into play. It imports all rows from the temp table into the final table one by one. Each row that fails is imported into the corresponding error table. This allows correction or just validation of the invalid data later on, and ensures that correct data is available to all clients that were not affected. (Note: the first try is always set-based; the cursor-based approach only starts in case of an error.)

    Wondering if someone has a hint on a better way to do this.

    If you need more rationale for this, let me know.

    I use BULK INSERT, BCP or Integration Services (SSIS) for stuff like this. They have this capability built in.

    RBarryYoung, can you please explain how you would route the rows between the error and final tables using BCP or BULK INSERT? SSIS is not an option, which I forgot to mention. Still using SQL Server 2000 & DTS.

    Check out the ERRORFILE argument of the BULK INSERT command; that should explain it.
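
    For illustration, a minimal sketch (paths and names are made up, and note that ERRORFILE was only introduced in SQL Server 2005, so it won't help on a 2000 box):

        -- Rows that fail to parse are written to the error file instead of
        -- aborting the load (up to MAXERRORS of them).
        BULK INSERT dbo.LoadTable
        FROM 'C:\feeds\extract.txt'
        WITH (
            FIELDTERMINATOR = '\t',
            ROWTERMINATOR   = '\n',
            ERRORFILE       = 'C:\feeds\extract.err',  -- must not already exist
            MAXERRORS       = 1000
        );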

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/14/2009)


    Check out the ERRORFILE argument of the BULK INSERT command; that should explain it.

    [font="Verdana"]Unfortunately that doesn't quite work as you might expect. For many errors, the load will be aborted. Also, generally it's best to capture the data that has errors within the database (to allow for reporting, correction, etc).

    I had a grand plan that involved using bulk insert, generating an error file, then bulk inserting the error file into another table. But from memory I ran into issues with the load aborting, some errors not being reported correctly, the data for the failed row not being copied to the error file correctly, bulk insert aborting a whole batch on an error, etc, etc.

    Shame. This is one area where I miss SQL*Loader from Oracle.


  • I agree with Brad. This article was condescending.

    While I do agree that we should avoid the use of cursors wherever possible, saying that developers who use them are lazy or don't care about the quality of their code, etc., is just poor form.

    A better way to have written the article would be something along the lines of this:

    "While cursors are common way to do X, there is a better way to accomplish the desired result. If we do Y, then we can arrive at X much quicker and simpler".

    There's no need to criticize developers or their code. That will only put them on the defensive, which leads to them being less receptive to the important part of the message... how to improve the code!

  • RBarryYoung (4/14/2009)


    I am making the claim that, as of SQL Server 2005 and greater, we do not need explicit Cursors or WHILE loops, because there are performant alternatives that accomplish the same things. Others and I have already offered multiple solutions for the general Email problem:

    1) Distribution Lists

    2) Notification Services

    3) Aggregated Execution (query built Dynamic SQL)

    4) SQL CLR

    5) External Activator (SQL Client executor as a Windows Service)

    Off-hand I can think of some more:

    6) SSIS

    7) Query built xp_CmdShell SMTP commands

    8) SQLCMD to SMTP batch

    Depending on the specifics of your functional specifications, there are other possibilities also.

    Just curious, has anyone benchmarked Aggregated Execution (query-built dynamic SQL) against a cursor loop? It seems to me there would be considerable overhead involved in building and then parsing the potentially large string, and possibly hitting a hidden internal looping mechanism anyway (buffer up the input statement until either end-of-string or a CRLF is reached, execute the buffer, repeat if more string is left). And when building the string, there's the aforementioned overhead needed to convert potential embedded apostrophes to the escape sequence ('') if an sp is being called with retrieved string data.
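
    (For concreteness, the pattern in question is something like the following minimal sketch, against a hypothetical message table:)

        -- Build one batch of EXEC calls from the set, then run it once.
        -- Table and column names are hypothetical; nvarchar(max) and the
        -- row-by-row string concatenation trick require SQL Server 2005.
        DECLARE @sql nvarchar(max);
        SET @sql = N'';

        SELECT @sql = @sql
            + N'EXEC msdb.dbo.sp_send_dbmail @recipients = N'''
            + REPLACE(Recipients, N'''', N'''''')   -- escape embedded apostrophes
            + N''', @subject = N'''
            + REPLACE(Subject, N'''', N'''''')
            + N''';' + NCHAR(13) + NCHAR(10)
        FROM dbo.MessageTable
        WHERE NextSendTime <= GETDATE();

        EXEC (@sql);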

    Also wondering if ownership chaining is still an issue with dynamic sql. Pretty sure I can write an sp that wraps sp_dont_touch_this(@someargument varchar(25)) in a cursor loop and grant execute on it to someuser without having to grant execute on sp_dont_touch_this to someuser. Is the same still not true for dynamic SQL in SQL 2005?
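
    (For reference: plain dynamic SQL does still break the ownership chain in SQL 2005, but the EXECUTE AS clause added in 2005 gives a way around it. A minimal sketch, using the hypothetical names above:)

        -- The wrapper runs under its owner's context, so the dynamic SQL can
        -- call the protected proc without granting rights to someuser.
        CREATE PROCEDURE dbo.wrap_dont_touch_this
            @someargument varchar(25)
        WITH EXECUTE AS OWNER
        AS
        BEGIN
            EXEC sp_executesql
                N'EXEC dbo.sp_dont_touch_this @arg;',
                N'@arg varchar(25)',
                @arg = @someargument;
        END
        GO
        GRANT EXECUTE ON dbo.wrap_dont_touch_this TO someuser;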

  • Barry is clearly not attempting to be condescending in the article. He lists reasons he has seen and experienced for why people use cursors. He also clearly states that pre-2005 there were reasons to use cursors.

    I look forward to the later articles where examples are provided as well.

  • I hate cursors...hence I like this article.

    I have stated my opinion a few times on SSC "Cursors are only useful if you don't know how to write SQL".

    to anyone who takes the author's comments personally: I'm sure he blurted out all of this profanity specifically with the intention of making your blood boil, as there is only ever one way to skin a cat.

    Carlton..

  • spawluk (4/14/2009)


    I agree with Brad. This article was condescending.

    I am sorry that you feel that way.

    ... saying that developers that use them are lazy...

    One correction here, I never said that developers were lazy.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Andy DBA (4/14/2009)


    Also wondering if ownership chaining is still an issue with dynamic sql. Pretty sure I can write an sp that wraps sp_dont_touch_this(@someargument varchar(25)) in a cursor loop and grant execute on it to someuser without having to grant execute on sp_dont_touch_this to someuser. Is the same still not true for dynamic SQL in SQL 2005?

    Not sure what you are driving at here or what the concern is. Do you have a runnable example?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • This type of discussion reminds me of heated arguments I have had with Mr. Celko about SQL standards. Yes, we should strive to use set-based SQL syntax (and SQL standards). However, what we as developers are building are actual solutions, not theoretical ones. On occasion, there are circumstances where the existing set-based implementations are simply insufficient to solve the problem and other solutions must be sought.

    A simple example of where set-based operations break down is very large data sets. I want to take a table of 10 million rows and copy it to another table in a production system. The obvious solution would seem to be something like:

        INSERT INTO DestTable
        SELECT ...
        FROM TableA;

    Yet, at the number of rows we are discussing, this could take hours to execute even on the best hardware, especially if the source table and the destination table are separated by a slower link. First we have to account for locks, which are implementation-based. Then we need to think about chunks. Set-based operations do not natively provide for chunking, and thus we have cursors. Of course there are other ways around this problem, but at the end of the day the only reason we are forced to consider alternate solutions is because the Hoyle set-based solution is insufficient. Even using SSIS (which is frankly a heinous beast) is simply having another procedural language loop through chunks of set-based statements.

    Another common use of cursors is to script a DDL change. For example, I want to write a script that will alter all columns in all tables called "Foo" to be nvarchar(10) instead of nvarchar(5). That is simply not possible using a set-based solution.

    Cursors are most definitely something to avoid if possible. However, there are times when a cursor is in fact the best solution available.

  • Thomas (4/14/2009)


    Set-based operations do not natively provide for chunking and thus we have cursors. Of course there are other ways around this problem, but at the end of day the only reason we are forced to consider alternate solutions is because the Hoyle set-based solution is insufficient. Even using SSIS (which is frankly a heinous beast) is simply having another procedural language loop through chunks of set-based statements.

    [font="Verdana"]Agreed (both on SSIS and on the necessity for "chunking", although I call this batching (same concept). I wouldn't use a cursor for that though -- I would use a combination of TOP and a loop.[/font]

    Thomas (4/14/2009)


    Another common use of cursors is to script a DDL change. For example, I want to write a script that will alter all columns in all tables call "Foo" to be nvarchar(10) instead of nvarchar(5). That is simply not possible using a set based solution.

    [font="Verdana"]Um... seriously? You commonly generate DDL scripts using cursors? I could see this as an infrequent thing, but not "common". And while a row-by-row approach might be necessary, that need not be cursor based. I think I will reserve judgement on this example until I've seen a more realistic (and common) case.[/font]

    Thomas (4/14/2009)


    Cursors are most definitely something to avoid if possible. However, there are times when a cursor is in fact the best solution available.

    [font="Verdana"]My issue with "avoid cursors if at all possible" is the same issue I have with "use cursors without evaluating the alternatives". Sometimes cursors areally are the best solution, although I'm finding those times fewer and fewer. I will consider using a cursor when I am doing something that is inherently sequential (procedural) and where data volumes are small. In that scenario, a cursor (or equivalent row-by-row method) is often the best approach.

    I prefer "use the best tool for the job", and learning that traditionally row-by-row approaches can be replaced with set-based approaches is part of selecting the best tool.


