The CLR

  • I think the main reason we have not adopted CLR functions is that our application runs on SQL 2000 and Oracle -- in addition to SQL 2005.

    In the interest of keeping a common codebase, we tend to stick to SQL solutions wherever possible. We do maintain separate SQL Server and Oracle scripts but those are primarily just for the syntactic differences.

  • I can see using CLR for some specialized functions, but probably NOT for performing DML operations on tables. Just because there is a new way of doing things doesn't mean it is always better. I.e., use the proper tool for the job...

    I saw similar issues and discussions with Oracle's support of Java, and later the CLR (on Windows), within the database. In a lot of cases there was no clear winner. E.g., a set of built-in XML DOM parsing functions provided by Oracle was rewritten from Java in C in order to improve performance. As for PL/SQL vs. Java, Oracle's own whitepapers stated that PL/SQL was the proper choice for DML operations and that performance testing was needed when given the choice for other things. They also added the ability to compile PL/SQL, thus optimizing sophisticated routines such as specialized string parsing.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Actually, I have seen one good example.

    In the (brilliant) book "Inside SQL Server 2005: T-SQL Querying", Itzik Ben-Gan uses two versions of a function that simply manipulates a string (in a complex way).

    The function tries to remove the literal values from T-SQL requests in order to extract a "template" query.

    Applied to Profiler traces, it makes it possible to identify the really important queries in a big trace.

    The T-SQL function works, but with limitations, and is quite slow.

    The CLR version is more sophisticated, much cleaner (it uses regular expressions in C#) and much faster (by a large factor). A rough sketch of that kind of function is at the end of this post.

    However, the key here is that it does not access data beyond the current row, which seems to be a limiting factor in a database environment...

    Eric

    PS: I highly recommend that book. I found it "fascinating".
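
    For anyone curious what such a function looks like, here is a minimal sketch of the idea (my own illustration, NOT the code from the book): a CLR scalar UDF that uses .NET regular expressions to replace string and numeric literals with placeholders, so that queries differing only in their constants collapse into one template.

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class QueryTemplate
    {
        // Pre-compiled patterns: quoted string literals first, then numeric literals.
        private static readonly Regex StringLiterals =
            new Regex(@"N?'([^']|'')*'", RegexOptions.Compiled);
        private static readonly Regex NumericLiterals =
            new Regex(@"(?<![\w@#$])-?\d+(\.\d+)?", RegexOptions.Compiled);

        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlString GetQueryTemplate(SqlString query)
        {
            if (query.IsNull) return SqlString.Null;

            // Replace every literal with a generic marker.
            string template = StringLiterals.Replace(query.Value, "'#'");
            template = NumericLiterals.Replace(template, "#");
            return new SqlString(template);
        }
    }

    You would then group a Profiler trace by something like dbo.GetQueryTemplate(TextData) to see which query "shapes" dominate. The real function in the book is considerably more thorough; this is only meant to show why regular expressions make the CLR version so much shorter than the T-SQL one.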

  • So far I have used the CLR to solve one problem: xp_cmdshell had been turned off for security reasons in SQL 2005, and the CLR let me work around that (something along the lines of the sketch below).

    My problem with the CLR is that it allows for the potential of unmanaged/lost code. With T-SQL, the source code for your stored procedures, functions, views, etc. is stored within SQL Server. With the CLR, you are attaching built code to SQL Server; the source for the CLR stored procedure or function is not stored there. Should something change so that you need to alter it, you have to find the source, alter it, build it and attach it again, whereas if it had been written in T-SQL you would know exactly where to find and modify it.

    In my organization I have not forbidden the programmers from using the CLR, but I do force them to prove to me that they cannot create the same functionality with T-SQL. So far we have one app, and I wrote that one. Just my 2 cents worth.
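
    To be concrete, the sketch below (hypothetical code, not my production routine) shows roughly what that workaround looks like: a CLR stored procedure that launches an OS command and hands back its output. Note that the assembly has to be catalogued with an elevated permission set (EXTERNAL_ACCESS or UNSAFE), so the security question that xp_cmdshell raises doesn't really go away -- it just moves.

    using System.Data.SqlTypes;
    using System.Diagnostics;
    using Microsoft.SqlServer.Server;

    public partial class ShellProcs
    {
        [SqlProcedure]
        public static void RunCommand(SqlString command, out SqlString output)
        {
            if (command.IsNull) { output = SqlString.Null; return; }

            // Run the command through the shell and capture its standard output.
            ProcessStartInfo psi = new ProcessStartInfo("cmd.exe", "/c " + command.Value);
            psi.RedirectStandardOutput = true;
            psi.UseShellExecute = false;
            psi.CreateNoWindow = true;

            using (Process p = Process.Start(psi))
            {
                output = new SqlString(p.StandardOutput.ReadToEnd());
                p.WaitForExit();
            }
        }
    }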

  • The class example is currently running in Access VBA. There is very little pruning in it.

    Basically we have a query to generate the bitmask for each class. There is a link between classes and courses. A student can request a course or a specific class, e.g. to get a specific teacher.

    There is an array of courses and a matching array of classes together with the class sizes and the class weights ordered alphabetically. The student requests are also ordered alphabetically.

    Pass 1 involves a (recursive) procedure to find all the working combinations for each student. We start with the first course and the first class for that course, compare that with the next course and its first class, and step through all the other choices to find working combinations. We produce a table with the student number and the number of working combinations. If there are no working combinations then we know we have problems with the timetable. (This is usually sorted out earlier.)

    Pass 2 involves doing more or less the same thing, but we start with those who have the fewest combinations and work through. This time we allocate students to classes using a temporary table. Each time we allocate a student to a class we keep a record of the class size in the class table. When we have a choice of classes we look at the difference between the number currently allocated and the preferred size and choose the class with the most "slack" (a small sketch of the clash test and the "slack" rule is at the end of this post).

    We can then examine the final result and decide whether or not to commit. If we don't like an allocation we can tune it by adjusting the class preferred size.

    This is a typical NP-hard packing problem.
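
    To make the two key rules concrete, here is a small illustrative sketch (invented names, and in C# rather than the actual VBA): the bitmask clash test between classes, and the pass-2 rule of picking the class with the most "slack".

    using System.Collections.Generic;

    public class ClassOption
    {
        public int ClassId;
        public long PeriodMask;          // one bit per timetable period the class occupies
        public int PreferredSize;
        public int CurrentlyAllocated;
    }

    public static class Allocator
    {
        // Two classes clash if they share any timetable period.
        public static bool Clashes(long maskA, long maskB)
        {
            return (maskA & maskB) != 0;
        }

        // Pass 2: among the options that do not clash with the student's
        // existing allocations, pick the class with the most spare capacity.
        public static ClassOption PickClass(long allocatedMask, List<ClassOption> options)
        {
            ClassOption best = null;
            int bestSlack = int.MinValue;

            foreach (ClassOption option in options)
            {
                if (Clashes(allocatedMask, option.PeriodMask))
                    continue;

                int slack = option.PreferredSize - option.CurrentlyAllocated;
                if (slack > bestSlack)
                {
                    bestSlack = slack;
                    best = option;
                }
            }
            return best;   // null means no workable class -- a timetable problem
        }
    }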

  • Thanks for starting up this topic Steve. I'm very interested in the pros and cons that others express on it.

    Let me put this one at the top: What happens to CLR functions when the server is upgraded to 64 bit?

    For the last year, I've told myself to avoid writing CLR routines because I haven't seen the answer to that question.

    Other cons:

    Writing applications that must also run on SQL 2000. Yes, it will be around for some time to come, and limiting your target audience to SQL Server 2005 or later isn't justified most of the time.

    Depending on who the end user may be, a CLR function could be a poor choice when it comes to maintainability.

    When CLR integration was first introduced, I was working on a custom application that required some special parsing of ASCII files (fixed-length records, but of differing lengths). That code clearly needed to run at the server, and a CLR function would have saved me at least a day of T-SQL development work (something like the sketch further down this post). Alas, the server was SQL 2000, and upgrading prior to the go-live date was out of the question.

    I was pretty sure at that point that it wouldn't be long before I saw another opportunity to write some, but I honestly just haven't come across one.

    If the 64 bit question wasn't out there, I would still need a clear and compelling reason to opt for a CLR function as part of a new solution.
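
    Since I never got to write that function, here is only the rough shape of what I had in mind (purely illustrative -- the record layout and names are made up): a streaming CLR table-valued function over a fixed-length file whose record length varies by record type. The assembly would need to be catalogued with EXTERNAL_ACCESS because it reads the file system.

    using System.Collections;
    using System.Data.SqlTypes;
    using System.IO;
    using Microsoft.SqlServer.Server;

    public partial class FixedWidthParser
    {
        [SqlFunction(FillRowMethodName = "FillRow",
                     TableDefinition = "LineNumber int, RecordType nvarchar(2), Payload nvarchar(4000)")]
        public static IEnumerable ParseFile(SqlString path)
        {
            int lineNumber = 0;
            foreach (string line in File.ReadAllLines(path.Value))
            {
                lineNumber++;
                // Assumed layout: the first 2 characters identify the record type;
                // the remainder is a payload whose length depends on that type.
                string recordType = line.Length >= 2 ? line.Substring(0, 2) : line;
                string payload = line.Length > 2 ? line.Substring(2) : string.Empty;
                yield return new object[] { lineNumber, recordType, payload };
            }
        }

        public static void FillRow(object row, out SqlInt32 lineNumber,
                                   out SqlString recordType, out SqlString payload)
        {
            object[] values = (object[])row;
            lineNumber = new SqlInt32((int)values[0]);
            recordType = new SqlString((string)values[1]);
            payload = new SqlString((string)values[2]);
        }
    }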

    Tom Garth
    Vertical Solutions

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • I'm not an expert DBA by any means; I know enough to be dangerous (and I'm sure you'd say the same about the developers working with you :D).

    In discussions here at the office about what our next generation of architecture is going to be, I've been against widespread use of CLR (yes, heresy coming from the developer).

    I think that moving forward, most of our flexibility can be accomplished with improvements to the schema (and we have little need for regex and other popular CLR solutions).

    The one place that I do staunchly support CLR is during data conversion. I spend up to 25% of my time a year converting data from legacy systems to our system for new clients. I cannot tell you how many hours and headaches I've saved using CLR.

    Our best database personnel spend an average of 24-72 hours converting things like site address information for any given new client (depending on size).

    This last time around, I wrote some CLR sp's to handle breaking a single address field out to the USPS standard (a toy sketch of the idea is at the end of this post). It took me approximately 1 hour to write and 1 hour to test, and I hit 100% conversion.

    There were only 45,000 addresses in this conversion so I didn't have the range of bad data larger conversions may have, but I still cut the amount of time required by a day or two at the least.

    And the best part is that I've started a code base that can be used by those who aren't that adept at SQL, thus saving us more time.

    So, in short, the best use-case I can offer up is nasty data conversion. It's a definite time-saver.
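
    To give a flavour of it, here is a toy sketch (invented names and rules -- nowhere near full USPS Publication 28 handling, and not the code I actually shipped): a CLR scalar function you can call inline from the conversion script to clean up a raw street line.

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class AddressScrub
    {
        [SqlFunction(IsDeterministic = true)]
        public static SqlString StandardizeStreet(SqlString rawAddress)
        {
            if (rawAddress.IsNull) return SqlString.Null;

            // Upper-case, strip stray punctuation, collapse whitespace.
            string s = Regex.Replace(rawAddress.Value.Trim().ToUpperInvariant(), @"[.,]", "");
            s = Regex.Replace(s, @"\s+", " ");

            // Standardize a few common suffixes (real code would table-drive this).
            s = Regex.Replace(s, @"\bSTREET\b", "ST");
            s = Regex.Replace(s, @"\bAVENUE\b", "AVE");
            s = Regex.Replace(s, @"\bBOULEVARD\b", "BLVD");

            return new SqlString(s);
        }
    }

    In the conversion script it gets used just like any other scalar function, e.g. SELECT dbo.StandardizeStreet(AddressLine1) FROM StagingAddresses, which is exactly why it beats a stand-alone console app for this kind of work.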

  • Grasshoper,

    Why did you need to use CLR integration?

    Could you not have written a "simple" C# console application?

    That can also run on the database server and could be more modular...

    Or use SSIS...

    Eric

  • I've tested it rather extensively, and I think it's got a way to go before it becomes really useful. Although the resource management has been getting better, it still rears its ugly little head often enough that I'm afraid to "lean" overly heavily on the CLR cane. Its ability to jam up its memory area (with seemingly no way to get it to release that memory nicely without restarting the server) makes me steer clear of it for anything hard-core.

    Some of the functions can be useful to tap into things that just plain aren't in T-SQL (the Regex functions like Jeff mentioned, some "better" encryption routines or ones that aren't tied in to SQL Server, extensive LDAP access), but the functions tend to suffer from various ugly issues:

    - its strongest suit IMO is in the string manipulation area (over T-SQL), and I've managed to get a fair amount of mileage out of CLR there (a generic sketch of this kind of function is further down this post). That being said, it's inexplicably not able to handle outputting varchar(MAX), being stuck instead at varchar(8000), so you can't play with the full spectrum should you need to.

    - the Table-valued functions need to materialize their entire recordset before being able to return anything, so they can quickly "swamp" the memory area.

    - Table-valued functions' performance tends to suffer a great deal when they are used as part of CROSS or OUTER APPLY. Again, it rarely seems to be an even trade-off; the derived table or even the correlated subquery (CSQ) method will almost always outperform a user-defined function in CROSS APPLY (any UDF, not just CLR).

    - The scalar functions tend to perform very well, and will often outperform T-SQL UDFs by a fair margin. Their usefulness is limited, however, by the same penalty EVERY function seems to incur in T-SQL: it is almost invariably faster to perform the calculation directly than to call the function.

    CLR stored procs tend to do a little better, but even then their performance disadvantage on heavy-duty data access limits the areas where they have a shot at outperforming T-SQL: dealing with "ugly" incoming data, strange parsing, and so on. But then you start running into SSIS territory, and SSIS will also outpace most home-grown CLR.

    Aggregates are pretty cool, and they do perform reasonably well. I just wish they offered a few more options for creating "odd-ball" aggregates (which is kind of the whole point), such as the ability to pass SEVERAL columns as part of the aggregation call (think weighted average, for example). By restricting us to the exact same structure as the "traditional" aggregates, we're now competing with the built-in aggregates, which quite honestly do a decent job of covering the spectrum of aggregates needed. The one that jumps out as useful (at first glance) is the one in the BOL example (a concat function), but it suffers from the 8000-character limit mentioned previously (so you run the risk of overflow-type errors).
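
    For readers who haven't built one, here is the generic shape of the scalar string functions being discussed (a bare-bones sketch with made-up names, not anyone's production code): thin wrappers over .NET's Regex class, which is the piece T-SQL simply doesn't have.

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class RegexFunctions
    {
        // Scalar pattern test, e.g. WHERE dbo.RegexIsMatch(Phone, '^\d{3}-\d{4}$') = 1
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
        {
            if (input.IsNull || pattern.IsNull) return SqlBoolean.Null;
            return new SqlBoolean(Regex.IsMatch(input.Value, pattern.Value));
        }

        // Scalar replace, e.g. for scrubbing non-numeric characters out of a column.
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlString RegexReplace(SqlString input, SqlString pattern, SqlString replacement)
        {
            if (input.IsNull || pattern.IsNull || replacement.IsNull) return SqlString.Null;
            return new SqlString(Regex.Replace(input.Value, pattern.Value, replacement.Value));
        }
    }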

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I have not used the CLR as yet, and have not really seen a compelling reason to use it. Through design, I try to avoid situations where strings need to be parsed, so that case is not compelling for me. I agree with others who have said it brings another layer of complexity. I also think most cases where the CLR would be useful are situations where the problem can be solved just as easily in the business layer of the application. I lean toward using SQL Server to "serve" the data to the application and letting the application manipulate it. The rules I enforce in the database are around data integrity and are done through constraints (PK, unique key, check, foreign key). Let SQL Server do what it is best at.

  • As soon as we deployed SQL Server 2005, I jumped on the CLR option. We have a large number of records with free-text fields for clinical notes. My task was to make sense out of these notes for statistical purposes. CLR allowed me to take my context-sensitive, pseudo-natural-language parser and move it to the SQL Server. This reduced the time and network resources needed to run the statistical queries and got the process off my desktop! Now my user community could easily run their own reports and I could get on to other projects.

  • inexplicably not able to handle outputting varchar(MAX), being stuck instead at varchar(8000), so you can't play with the full spectrum should you need to.

    That's a really bad limitation! Thanks for the heads-up. In our case it would be even worse, as we support Unicode (NVARCHAR), which would then be limited to 4,000 characters.

    An interesting limitation, as strings in .NET/C# are Unicode with no such length limit! Why the screwy limitation when embedded into SQL Server?


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • [font="Tahoma"]There is a problem that the SQLCLR can address - solving the issue of performance versus scalability in a single application installed at sites ranging from a few users to many concurrent users. Forgive this being a bit long-winded, but in case some of the readers are not as advanced as most of you, I figured more detail would help clarify things for them.

    When developing an application, ideally you want the application to have excellent performance and scalability (among other SDLC issues). The DB is just one part of that application. Unfortunately, some approaches that enhance performance with a few users can hurt performance with many users.

    For this example, let's say we have a lot of our SPs that have business rules in them, and the TSQL is doing more than INSERT, UPDATE, DELETE, and SELECT. Think loops, tally tables, temp tables, maybe even a dreaded cursor or two.

    For a few concurrent users, that approach offloads processing to the DB server, reducing the amount of network traffic and improving apparent performance from the user's perspective. As the number of users increases, the CPU and RAM load on the DB server increases significantly, which means either a lot more money to scale up the DB server, or the application appearing (to the user) to run slower because the DB is taking longer to process. In addition, the occurrence of deadlocks increases significantly.

    One answer is to do the processing of code (that is not INSERT, UPDATE, DELETE, and SELECT) on the client. That increases network traffic, but in a day of gigabit ethernet as a commodity, that does work up to a point.

    Another answer is to use middleware servers: restrict the SQL to INSERT, UPDATE, DELETE, and SELECT, and offload the processing to DLLs on a middleware server using MTS. For large sites that can afford middleware servers, that works fine. The middleware servers have their own high-speed backbone connection to the DB servers, so network traffic is not an issue.

    For those sites in between, there was a solution in SQL 2000 using extended stored procedures, where you wrote the code in your preferred language and compiled it to DLLs. That took advantage of the fact that the processing (of code that is not INSERT, UPDATE, DELETE, and SELECT) is much more efficient in a compiled DLL and uses fewer resources. Properly designed and written DLLs can work both as SQL Server extended SPs and on middleware servers using MTS, allowing the application to scale a lot higher while maintaining performance.

    SQLCLR replaces that "extended stored procedure" functionality. The same code (that is not INSERT, UPDATE, DELETE, and SELECT) can now be written in VB.NET or C# DLLs and called from SQL Server 2005 as SQLCLR (see the small sketch at the end of this post). In addition, if properly designed and written, the same DLLs can be used on middleware servers with MTS and/or remoting.

    This approach to overall application design allows the same application to be installed so that it is scalable from a few to a lot of concurrent users. SQLCLR fits in nicely for the portion of scalability between a few users and a lot of users. You can ensure you have the best combination of scalability and performance for the full range of concurrent users, and - a side benefit - the code for the application is easier to maintain with one codebase for that range of users, as well as each language (VB.NET or C# and TSQL) doing what it does best.[/font]
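
    To illustrate the "one codebase, two deployment shapes" point, here is a minimal sketch (all names invented): the business rule lives in a plain .NET class that a middleware DLL can call directly, and a thin SQLCLR wrapper exposes the very same method to SQL Server 2005 for the smaller sites.

    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public static class PricingRules
    {
        // Ordinary .NET code: callable from a middleware component, a service, or unit tests.
        public static decimal ApplyVolumeDiscount(decimal unitPrice, int quantity)
        {
            if (quantity >= 100) return unitPrice * 0.90m;
            if (quantity >= 10) return unitPrice * 0.95m;
            return unitPrice;
        }
    }

    public partial class PricingClr
    {
        // Thin SQLCLR wrapper so the same rule can run inside the database
        // when that is the right place for it (few users, no middleware tier).
        [SqlFunction(IsDeterministic = true)]
        public static SqlMoney DiscountedPrice(SqlMoney unitPrice, SqlInt32 quantity)
        {
            if (unitPrice.IsNull || quantity.IsNull) return SqlMoney.Null;
            return new SqlMoney(PricingRules.ApplyVolumeDiscount(unitPrice.Value, quantity.Value));
        }
    }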

  • emamet (5/13/2008)


    Grasshoper,

    Why did you need to use CLR integration?

    Could you not have written a "simple" C# console application?

    That can also run on the database server and could be more modular...

    Or use SSIS...

    Eric

    I'm guessing you meant me 😉

    I could have written a C# application; however, I don't think it would have been as "simple" as you might think.

    I would have had to code all the data access logic, all the code to iterate over the rows, etc. etc. It was much easier to do something like this:

    /* pseudo-code */
    INSERT INTO foobar (foo, bar)
    SELECT usp_ConvertData(c.foo),
           usp_ConvertData(c.bar)
    FROM   nasty_conversion_data c;

    within the context of the rest of the conversion script. In this way, re-running the conversion doesn't require me to stop running SQL, switch to a console app, and back again, etc. It makes the testing/re-run process much simpler and quicker.

    Does that answer your question?

  • Yes, I suppose it's a lot easier if the data is already in your database.

    I was assuming you were importing it, in which case SSIS could be an option too.

    Cheers

    🙂
