The Glue that Binds

  • jpratt (9/19/2008)


    I've always seen the distinct difference between front end/business app coding and database coding. They really are two separate disciplines, and any SQL against relational data can be hard to get your head around if you've been writing VB/C#-style app code.

    I wholeheartedly agree. I've worked as a developer in .NET, Access and SQL. Some of the people I've worked with have been amazing programmers in .NET who were absolutely incapable of effective design/coding in a SQL environment. I haven't worked with LINQ, so my point may be moot, but there is a distinctly different mindset and methodology for app-level and db-level programming.

    In one instance, I had to clean up a routine written by a C# programmer I had not worked directly with. She had used a loop approach in a database-side routine; I was able to convert it to use the PIVOT function and cut the runtime from about 3 hours to less than 20 seconds.

    Can any software layer be expected to translate from one distinct approach to another?

  • I am a developer, but I have also worked as a dba (e.g. "you're good at coding SQL - the database is yours") on a high-traffic site (1.2 million views a day). In my view, the way many developers write SQL is by concatenating strings together - a painful process fraught with problems, because the only way to check the code is to run it. They do it this way because it's a pain to fire up another IDE (SSMS), write the code and test it separately. This isn't the case for those who are "forced" to use stored procedures, but usually they end up working directly with DataSet, DataTable, DataView and DataReader. Except for the last one, these are all unwieldy and bloated and require a lot of work to get what you want, because you have to coordinate all three to get the job done. DataReader is light and fast, but it takes some effort every time you want to use one to set up the connection and command.
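
    For those who haven't felt that pain, here is a minimal sketch of the per-use ceremony (the connection string, table and column names are hypothetical):

    ```csharp
    using System;
    using System.Data;
    using System.Data.SqlClient;

    // A minimal sketch of the setup needed every time you use a DataReader.
    // The connection string, table and columns are hypothetical placeholders.
    public static class CustomerReader
    {
        public static void PrintActiveCustomers(string connectionString)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand(
                "SELECT CustomerID, Name FROM dbo.Customer WHERE Active = @active", conn))
            {
                cmd.Parameters.Add("@active", SqlDbType.Bit).Value = true;
                conn.Open();

                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Typed, ordinal-based access is what keeps the reader fast.
                        Console.WriteLine("{0}: {1}",
                            reader.GetInt32(0), reader.GetString(1));
                    }
                }
            }
        }
    }
    ```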

    IMHO, many of these issues are mitigated by a good DAL, which can be generated (with CodeSmith or custom scripts to regenerate the model after making changes to the database) and abstracts these issues away. But the nice thing about LINQ is you drag-n-drop your tables onto a design surface and you're off and running without any other setup. Also, almost anywhere you are working with a LINQ entity you can just grab related records on the fly, without explicitly hooking up to the database (of course this is also its biggest problem - see below).

    It was mentioned that the database (SQL 2005) is auto-parameterizing LINQ queries - this is not true. LINQ auto-parameterizes the queries before submitting them, so the queries will all end up in the procedure cache (SQL 2000 or later). I have read much of the code used to write LINQ (read: LINQ to SQL) and have been pretty impressed. In the past my biggest complaint about ORM-type libraries like LINQ has been that they don't parameterize. On the aforementioned high-traffic site, not parameterizing queries killed the CPU on our DB. The only shortcoming here with LINQ is they didn't go quite far enough, because they don't properly parameterize strings and numeric (decimal, etc.) types, which have a size and precision. When they parameterize the query they set the size of the parameter definition to the size of the value. So if your table has a VARCHAR column with a size of 255 you could conceptually (obviously not in practice) end up with at least 255 different queries in the procedure cache, because users won't be submitting values of all the same length. The same goes for numbers. The sad part is it would be a very easy thing to fix.
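
    To make the cache-pollution point concrete, here is a sketch of the difference (the table and column are hypothetical, and the first branch mirrors the behavior described above rather than actual LINQ source):

    ```csharp
    using System.Data;
    using System.Data.SqlClient;

    // Sketch of the parameter-sizing difference. dbo.Customer and its
    // VARCHAR(255) Name column are hypothetical.
    public static class ParameterSizing
    {
        public static SqlCommand BuildLookup(SqlConnection conn, string name, bool sizeToValue)
        {
            SqlCommand cmd = new SqlCommand(
                "SELECT CustomerID FROM dbo.Customer WHERE Name = @name", conn);

            int size = sizeToValue
                ? name.Length   // per value: VARCHAR(3) for "Bob", VARCHAR(5) for
                                // "Alice" -> one cached plan per distinct length
                : 255;          // fixed to the column size -> one cached plan total

            cmd.Parameters.Add("@name", SqlDbType.VarChar, size).Value = name;
            return cmd;
        }
    }
    ```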

    Another benefit of auto-parameterizing is that you get protection from SQL injection attacks. Other ORM libraries I've worked with don't do this; IMO they are dangerous and I won't touch them. Some use stored procedures, but then RBAR is about your only option for getting related data. If you're investigating an ORM (even LINQ), just fire up Profiler and look at what it's generating, then take a moment to show it to your developers to help them realize the impact.

    The worst part about LINQ is that the path of least resistance results in RBAR. Even if the developer properly writes a query to pull data as a batch, when another developer (or often the same one) goes back to modify the procedure, instead of going through the effort needed to ensure the data is all pulled as a batch, he will most likely just take the object and "walk" through the object model to get to the related data. This is SO easy in LINQ, and therein lies the problem. With some effort, and through the use of DataLoadOptions, it isn't difficult to load data properly in batches. But many developers just create a loop and then walk the object, and this is how LINQ (and other ORM libraries) kill your database.
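
    For reference, the batch loading looks roughly like this sketch, assuming a designer-generated context with hypothetical Customer and Order entities:

    ```csharp
    using System;
    using System.Data.Linq;
    using System.Linq;

    // Sketch: LoadWith tells LINQ to SQL to fetch the related rows in the
    // same round trip instead of issuing one query per parent row.
    // MyDataContext, Customer and Orders are hypothetical generated names.
    public static class BatchLoading
    {
        public static void PrintOrderCounts(MyDataContext db)
        {
            DataLoadOptions options = new DataLoadOptions();
            options.LoadWith<Customer>(c => c.Orders);
            db.LoadOptions = options;   // must be assigned before the first query runs

            foreach (Customer c in db.Customers)   // customers + orders in one batch
                Console.WriteLine("{0}: {1} orders", c.Name, c.Orders.Count);
        }
    }
    ```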

    IMHO, LINQ is great for small applications with few users. It is really RAD and works beautifully in that case, but unless your developers are disciplined it can quickly go south. I've seen a simple page with 10-15 records run over 250 queries - the good news is I was able to reduce it to about 25. I believe it can be optimized to run well in larger applications, but it requires effort and discipline.

    In short, LINQ really does speed development time and is worth consideration, but you're gonna have to lay down some guidelines to avoid the traps it creates when developers don't consider the effect of what they are doing on the database because it's so easy.

    As an aside, someone mentioned how much better RAD development was before .NET (which can only be referring to VB 6, the last version released). In the context of this discussion I just have to laugh. I worked with VB for 3 years and writing data access code was a pain. ADO.NET is far superior to writing ADO in VB. There were so many options you had to remember as part of your connection and recordset objects. The whole process was very problematic. And using the wizards and data controls was worse. I'm sorry, but once I started developing in .NET I never looked back.

  • Hi Y'all

    I'm a C# / T-SQL dev with about 25 years of experience. In that time, I have written some REALLY HUGE systems. For example, I am currently on a .NET 1.1 app that has 400+ stored procs, 10,000 lines of DAL code and about 1,000 ASP.NET pages. The app is for a state agency and is public facing.

    I can safely say that a great deal of the code is dedicated to building objects and data access. I have been learning about LINQ to SQL, and I must say I am very, very impressed. It would have saved MAN MONTHS on this system if it had been implemented in .NET 3.5 SP1 (the minimum required for LINQ to SQL).

    It's true that the LINQ syntax is roughly T-SQL backwards, but not having to write all that data access code is a tremendous time saver. Also, I've seen a lot of sloppy database connection handling / pooling (or lack thereof) from junior staff. Having the .NET framework handle all that is a great thing.
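
    For anyone who hasn't seen it, a sketch of what I mean, against a hypothetical generated context:

    ```csharp
    using System.Linq;

    // Sketch, assuming a generated context with a Customers table
    // (MyDataContext and its members are hypothetical). Note the clause
    // order relative to T-SQL:
    //   T-SQL: SELECT Name FROM dbo.Customer WHERE Active = 1 ORDER BY Name
    //   LINQ:  from ... where ... orderby ... select
    public static class QuerySyntax
    {
        public static IQueryable<string> ActiveCustomerNames(MyDataContext db)
        {
            return from c in db.Customers
                   where c.Active
                   orderby c.Name
                   select c.Name;
        }
    }
    ```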

    Can't speak to the performance issues because the only really huge systems I write / support are in .NET 1.1 or 2.0. I'll let you know when I've benched LINQ....

    Cheers, terry

  • The topic is not LINQ, right? I thought it was "How much of your time is spent writing data access or manipulation code?"

    Most of what I do is this type of code: research into BI, application security, and extensive work in enterprise architecture and web services as data points. I have had to spend considerable time getting to data, then migrating that data into the form or schema I need. The business rules and UI are done by the developers who actually create the applications.

    LINQ looks good for many things, but one has to be concerned with the security of an application that uses inline SQL generated by LINQ. I have been looking at it as a tool to write utilities with, but in the production arena I'm looking to use LINQ to execute stored procedures only. My ideas about and use of LINQ may change over time, but right now it looks flashy, quick, and potentially dangerous.
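
    For what it's worth, the stored-procedure-only route is straightforward in LINQ to SQL. Here is a sketch (the procedure and entity names are hypothetical) of the pattern the designer generates when you drag a proc onto the design surface - the provider issues an EXEC call instead of inline SQL:

    ```csharp
    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Reflection;

    // Hypothetical entity for the proc's result set.
    [Table(Name = "dbo.Orders")]
    public class Order
    {
        [Column(IsPrimaryKey = true)] public int OrderID;
        [Column] public decimal Amount;
    }

    public partial class MyDataContext : DataContext
    {
        public MyDataContext(string connection) : base(connection) { }

        // Sketch of a designer-generated stored procedure wrapper;
        // dbo.GetCustomerOrders is a hypothetical proc name.
        [Function(Name = "dbo.GetCustomerOrders")]
        public ISingleResult<Order> GetCustomerOrders(
            [Parameter(DbType = "Int")] int customerId)
        {
            IExecuteResult result = ExecuteMethodCall(this,
                (MethodInfo)MethodInfo.GetCurrentMethod(), customerId);
            return (ISingleResult<Order>)result.ReturnValue;
        }
    }
    ```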

    Then, as others have stated before, if you look at code generation for the data layer you have already solved the stored procedure issue, and you can LINQ into that nicely.

    I have used most every data technique Microsoft has had, from the early C++ days to now looking at LINQ. Before that I worked with ADABAS, ISAM, DAM, etc. files on the mainframe. I have coded simple ODBC connections using dynamically generated DSNs and, years back, inline DTS package generation using C# classes.

    So I have spent tons of time writing data access or manipulation code. The last little tool that I wrote was a simple one that reads two XML files into respective DataSets and then allows the user to review the data table by table while the sets are being compared. I had to do it since the schemas of the XML payloads were not the same; however, the table structures were, once the data was read into an in-memory DataSet. All data, all access, manipulation and presentation.
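
    The loading step at its core is just a couple of ReadXml calls; a sketch (with hypothetical file paths):

    ```csharp
    using System.Data;

    // Sketch of the loading step described above. ReadXml infers one
    // DataTable per element type, so two payloads with different schemas
    // can still yield comparable tables.
    public static class XmlCompareTool
    {
        public static void Load(string leftPath, string rightPath,
                                out DataSet left, out DataSet right)
        {
            left = new DataSet("Left");
            left.ReadXml(leftPath);

            right = new DataSet("Right");
            right.ReadXml(rightPath);

            // From here the UI can walk left.Tables / right.Tables by name
            // and compare the rows side by side.
        }
    }
    ```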

    Miles...

    Not all gray hairs are Dinosaurs!

  • When I first started with .NET 2.0 using DataSet, I found that sometimes things run outside our control... the error message is something like "the data does not fulfill the constraint". Sometimes it took me a few hours just to fix this kind of error. After a few days of monkeying around with DataSet, we decided to write our own DAL generator that mimics NHibernate plus some of the features implemented in DataSet & DataTable. The best part of our DAL generator is that it does not use Reflection, for better performance. Besides that, when retrieving data it uses a DataReader instead of a DataSet, since a DataReader can load data faster. This makes our programs run pretty fast. Once we heard that there is a new data access programming feature... LINQ... we were not buying the idea yet.
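
    To give a flavor, the code our generator emits looks something like this sketch (the Customer entity and its columns are hypothetical):

    ```csharp
    using System.Collections.Generic;
    using System.Data.SqlClient;

    // Hypothetical generated entity; a real generator would emit one
    // entity and one mapper per table from the schema.
    public class Customer
    {
        public int CustomerID;
        public string Name;
    }

    public static class CustomerMapper
    {
        // Assumes conn is already open.
        public static List<Customer> ReadAll(SqlConnection conn)
        {
            List<Customer> result = new List<Customer>();
            using (SqlCommand cmd = new SqlCommand(
                "SELECT CustomerID, Name FROM dbo.Customer", conn))
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Hard-coded ordinals and typed getters: no Reflection,
                    // no DataTable bookkeeping.
                    Customer c = new Customer();
                    c.CustomerID = reader.GetInt32(0);
                    c.Name = reader.GetString(1);
                    result.Add(c);
                }
            }
            return result;
        }
    }
    ```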

    We still want to have something more controllable and enhanceable by our own team. My own feeling is that T-SQL should always be the winner for data processing compared to LINQ, because T-SQL does not need to move the data outside of the database engine/memory for processing. I believe LINQ benefits those who need to consolidate data from different data sources (some from MSSQL, some from Oracle, and some from text or XML files).

  • Having paid many, many developer-months to produce a code generation framework to avoid the repetitive CRUD work in a previous life (with .NET 1.0), I'm a big fan of software factories, but if I can get a similar productivity boost from some other paradigm (like LINQ), I'm game to try...

    Speaking of which, I'm starting my first non-tutorial attempt with LINQ for a mobile project and SQL CE 3.5sp1 next week, so I'll find out shortly.

    INSERT both, feet INTO river

  • developmentalmadness (9/19/2008)


    The only shortcoming here with LINQ is they didn't go quite far enough, because they don't properly parameterize strings and numeric (decimal, etc.) types, which have a size and precision. When they parameterize the query they set the size of the parameter definition to the size of the value. So if your table has a VARCHAR column with a size of 255 you could conceptually (obviously not in practice) end up with at least 255 different queries in the procedure cache, because users won't be submitting values of all the same length. The same goes for numbers. The sad part is it would be a very easy thing to fix.

    Indeed. Please vote here to get this fixed:

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=363290

    --
    Adam Machanic
    whoisactive

  • Adam Machanic (9/19/2008)


    developmentalmadness (9/19/2008)


    The only shortcoming here with LINQ is they didn't go quite far enough, because they don't properly parameterize strings and numeric (decimal, etc.) types, which have a size and precision. When they parameterize the query they set the size of the parameter definition to the size of the value. So if your table has a VARCHAR column with a size of 255 you could conceptually (obviously not in practice) end up with at least 255 different queries in the procedure cache, because users won't be submitting values of all the same length. The same goes for numbers. The sad part is it would be a very easy thing to fix.

    Indeed. Please vote here to get this fixed:

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=363290

    Easy to fix? Good! Please apply here: http://www.microsoft.com/careers/

    😎

    I'm pretty sure that if it were "easy" they would've already fixed it. I suppose we could adopt whichever of the "easy" (but suboptimal) solutions you've considered: persisting metadata about each variable-length column in "hidden" and compiled .NET code (which is not guaranteed to remain in sync with the database) -or- roundtripping to the database every time we need to create a parameter, to fetch the appropriate length -or- providing yet another caching mechanism in a threadsafe local data store (to avoid duplication of data on a single node) so that we only have to roundtrip once per process lifetime (except that might be a problem on the web) -or- some other interesting and nefarious scheme.

    I'm in favor of the hard-coded approach, since if you change your database you need to change your code, too, but I can understand why the LINQ team didn't take that typical ORM/software factory approach since LINQ is neither... It's always a tradeoff.

    😉

  • Yes, EASY. And not at all suboptimal, either. Sorry, but I don't buy "if it were easy they already would have done it." I suspect the truth is more along the lines of "they didn't know it was a problem until customers discovered it." No dev shop is perfect, not even Microsoft (some would say "especially not"). Things fall through the cracks.

    By the way, regarding "persisting metadata about each variable-length column in 'hidden' and compiled .NET code (which is not guaranteed to remain in sync with the database)" -- this is already an issue with LINQ! Nothing would stop me from going in and renaming a column, or changing an INT to a DECIMAL, or any number of other things that would break the compiled code. LINQ gives us compile-time checking in the build environment, not the production environment.

    --
    Adam Machanic
    whoisactive

  • David Reed (9/19/2008)


    Adam Machanic (9/19/2008)


    developmentalmadness (9/19/2008)


    The only shortcoming here with LINQ is they didn't go quite far enough, because they don't properly parameterize strings and numeric (decimal, etc.) types, which have a size and precision. When they parameterize the query they set the size of the parameter definition to the size of the value. So if your table has a VARCHAR column with a size of 255 you could conceptually (obviously not in practice) end up with at least 255 different queries in the procedure cache, because users won't be submitting values of all the same length. The same goes for numbers. The sad part is it would be a very easy thing to fix.

    Indeed. Please vote here to get this fixed:

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=363290

    Easy to fix? Good! Please apply here: http://www.microsoft.com/careers/

    😎

    I'm pretty sure that if it were "easy" they would've already fixed it. I suppose we could adopt whichever of the "easy" (but suboptimal) solutions you've considered: persisting metadata about each variable-length column in "hidden" and compiled .NET code (which is not guaranteed to remain in sync with the database) -or- roundtripping to the database every time we need to create a parameter, to fetch the appropriate length -or- providing yet another caching mechanism in a threadsafe local data store (to avoid duplication of data on a single node) so that we only have to roundtrip once per process lifetime (except that might be a problem on the web) -or- some other interesting and nefarious scheme.

    I'm in favor of the hard-coded approach, since if you change your database you need to change your code, too, but I can understand why the LINQ team didn't take that typical ORM/software factory approach since LINQ is neither... It's always a tradeoff.

    😉

    Yes, easy. How do you think they are auto-parameterizing the values in the first place? Because they are using metadata. It's already there, along with the size and precision; they just aren't using those properties of the field. Crack open your .dbml file with Notepad and look at the column definitions. Then look at your .dbml.cs file - the DbType attribute in the .dbml file is also an attribute of the matching property. Then crack open .NET Reflector and look up MetaDataMember; you'll see it there too: "String DbType". MetaDataMember can be retrieved from the private MetaTable property of the DataContext.
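
    If you want to see it for yourself, the generated mapping looks roughly like this sketch for a hypothetical table - note that the declared size is right there in the DbType string:

    ```csharp
    using System.Data.Linq.Mapping;

    // Sketch of what SqlMetal emits in the generated designer.cs file for
    // a hypothetical Customer table. The DbType string already carries the
    // column's declared size; the provider simply doesn't use it when it
    // sizes query parameters.
    [Table(Name = "dbo.Customer")]
    public class Customer
    {
        private string _Name;

        [Column(Storage = "_Name", DbType = "NVarChar(255) NOT NULL", CanBeNull = false)]
        public string Name
        {
            get { return _Name; }
            set { _Name = value; }
        }
    }
    ```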

    BTW, this data is retrieved from the database ONCE by SqlMetal when the dbml file is created along with the designer.cs file. So no database overhead there either.

    And about the data schema getting out of sync with the application: when was the last time you updated your production schema like that? For this to break we're talking about dropping/renaming columns or changing data types completely - which is exactly what would happen with your hard-coded approach as well. I'm not talking about adding anything - just using what's already available. So yes, easy and fixable.

  • Thanks, I tried reporting this last November when it was released, but I didn't know how to report an issue. I tried emailing the team lead at the time to ask why this was ignored, but the team changed leads and my question got lost in the shuffle.

    I've now voted, confirmed, and noted in the comments that this applies to numeric types as well.

  • I actually didn't mention numeric types in the feedback item on purpose, because there are some deeper issues there and the fix isn't quite as straightforward. My suggested fix for the NVARCHAR/VARBINARY issue is to just bind strings with a length of less than 4000 to NVARCHAR(4000) and those with a length of greater than 4000 to NVARCHAR(MAX).
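
    As a sketch, that normalization would look something like the following (my proposal, not ADO.NET's current behavior; the helper name is made up):

    ```csharp
    using System;
    using System.Data;
    using System.Data.SqlClient;

    public static class ParameterNormalization
    {
        // Normalize string parameter sizes into two buckets so that every
        // value of a given type reuses the same cached plan.
        public static void AddNormalizedString(SqlCommand cmd, string name, string value)
        {
            int size = (value == null || value.Length <= 4000)
                ? 4000   // NVARCHAR(4000): one plan for all "short" strings
                : -1;    // size -1 maps to NVARCHAR(MAX): one plan for long ones

            cmd.Parameters.Add(name, SqlDbType.NVarChar, size).Value =
                (object)value ?? DBNull.Value;
        }
    }
    ```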

    Binding to the actual metadata/column length might be even nicer, but my suggested fix is simpler, has all of the same benefits, and the same fix is really needed in other parts of ADO.NET in addition to LINQ. See SqlParameterCollection.AddWithValue for another example of this problem... And here is a Connect item created for that particular issue, which the person from Microsoft who reviewed it totally misunderstood:

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=329832

    My suggested fix of defaulting for strings and binary won't work for decimal, because what do you bind to? You can't arbitrarily bind to (38,0) or (19,19), or any other "maximum" value; you need metadata. And with LINQ you have it, but I was thinking of the more generic problem, which is relevant across the board in ADO.NET. I also think that these kinds of issues with strings are quite a bit more common than with numerics. And there are a lot more potential string lengths than there are numeric precision/scale combinations.

    Alas, it's clear that the ADO.NET team just doesn't get it when it comes to these kinds of issues. So I found a PM from the SQL Server plan cache team to bring into the fray. He agrees that this is a significant issue and is going to try to apply some internal pressure. So I really hope that we will soon see a fix... In the meantime, no LINQ for me and hopefully not much for my customers. I've spent way too much time over the past few years battling with the plan cache and I would rather not continue.

    --
    Adam Machanic
    whoisactive

  • The thing that worries me with all RAD tools is that they produce a mechanically working thing very fast, but there is a difference between something that works and something that works well.

    In "Joel on Software", Joel argues that most business programming only requires you to write code that is "good enough" rather than best of breed. I've toyed with the MS Enterprise Library and liked the productivity boost it gave my programming, but I didn't like the way performance degraded when serious load was thrown at it.

    Again, 99% of software never really comes under serious stress so perhaps this isn't an issue in most real world applications.

    What I really want to learn more about is asynchronous programming.

  • 🙂 We do lose a lot of time in data access. I haven't used LINQ. If it makes our jobs faster then definitely cheers to it. I will try to hook into it now.

    In addition to the automatic parameterization issues mentioned earlier, I have seen cases where the SQL emitted by a LINQ expression contains a Cartesian product (CROSS JOIN) where one is not necessary. Consider the implications of running something like that against your order history table.

    I'm also not sure if someone has mentioned the direct table access (i.e., security) issue yet, but it's there.

    In my opinion, LINQ to SQL is not yet ready for prime time, at least as far as enterprise apps go. You could probably get away with using it for a small "utility" app or something, though. Now, using LINQ to derive information from in-memory programming objects is pretty cool. A nice way to take a step toward declarative programming in the app.
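
    For instance, a quick sketch with made-up data:

    ```csharp
    using System;
    using System.Linq;

    // Sketch of the in-memory (LINQ to Objects) usage mentioned above;
    // the data is hypothetical and no database is involved.
    class LinqToObjectsDemo
    {
        static void Main()
        {
            var orders = new[]
            {
                new { Customer = "Acme",   Amount = 120m },
                new { Customer = "Acme",   Amount = 80m  },
                new { Customer = "Globex", Amount = 42m  }
            };

            // Declarative grouping and aggregation over plain objects.
            var totals = from o in orders
                         group o by o.Customer into g
                         select new { Customer = g.Key, Total = g.Sum(x => x.Amount) };

            foreach (var t in totals)
                Console.WriteLine("{0}: {1}", t.Customer, t.Total);
        }
    }
    ```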

    And one last thing -- it seems that time and time again, MS comes out with the "next great thing" for improving data access. I've seen it in just about every rev of their development environment going all the way back to VB for DOS. These features demo really well at dev conferences, but never seem to gain the wide adoption that's expected. It will be interesting to see if LINQ is different in this regard.

    TroyK
