Do You Have Scary Code?

  • I don't think anyone "builds his business on millions of lines of code". What happens is that systems get written and then things change (the changes are frequently made by idiots). Where I work now we have reams of 'untouchable' code, mostly in COBOL or VB6. Over the years, some code has been deemed so difficult that when a new business condition is encountered, instead of amending what is already there, the coder has duplicated vast swathes with minor alterations and then called one set or the other based on the new condition. Trying to create a set of tests on this mess implies that someone knows what it is supposed to be doing in the first place!

  • I was lucky in that my first proper IT job (I had been doing mathematical modelling in Fortran for about 25% of my time in my previous scientific role) was a totally new project, and the director with responsibility insisted on full documentation from requirements analysis to testing, including coding and commenting standards. Unfortunately this was cast aside when he was engineered out and high-speed turnaround became the name of the game. In my next two jobs I inherited really scary code. One person had used bizarre and meaningless variable names, such as mouse for a small int and elephant for a large int, and tried to write code like "apples = oranges" in billing software. There was also one product that had over 800 GOTOs in the code, and it often went into a black hole! In my next job a huge amount of time was spent trying to move a product forward when it probably would have been more efficient to bite the bullet and start anew. Well, it has kept me occupied and reasonably well paid!

  • I had to rewrite an Assembler program back in the mid-'90s into COBOL. No one wanted to touch that Assembler program, and there were only a few people working here who even knew Assembler; none of us really wanted to touch it either.

    I've seen some nasty, deeply nested SQL code, but I still think the Assembler program was the worst.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • We had this thingy at work to submit a claim. For a big claim, it might make 30,000-40,000 database calls. So they ask me, "Do you think you might be able to make this thing run just a wee bit faster?" It uses a homemade ORM from years ago that is really just "sorta ORMy". I took out all the lazy-loading properties in a commonly used class and instead used the proc that loaded the data to compute all the properties beforehand, in a set-based fashion, and load them with the existing data. Oddly, that cut the 30,000-40,000 database calls down to 16. And for some reason I do not know and cannot possibly fathom, it cut the processing time for a claim from 5 minutes to 7 seconds. It just goes to show you, code is a mystery.
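
    A minimal sketch of that set-based idea, with hypothetical table, column, and parameter names throughout (this is not the actual system's schema):

        -- Before (conceptually): the ORM's lazy-loaded property fires one
        -- query per claim line, something like
        --   SELECT SUM(Amount) FROM dbo.ClaimLineDetail WHERE ClaimLineID = @id;

        -- After: one set-based query returns every line's total with the base rows.
        DECLARE @ClaimID int = 12345;       -- hypothetical claim

        SELECT  cl.ClaimLineID,
                cl.ClaimID,
                cl.ProcedureCode,
                d.LineTotal
        FROM    dbo.ClaimLine AS cl
        JOIN   (SELECT ClaimLineID, SUM(Amount) AS LineTotal
                FROM   dbo.ClaimLineDetail
                GROUP BY ClaimLineID) AS d
          ON    d.ClaimLineID = cl.ClaimLineID
        WHERE   cl.ClaimID = @ClaimID;      -- tens of thousands of round trips become one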

  • Jeff Moden (11/10/2015)

    It does, indeed. It sounds like it could become a full-time job just keeping such tests up to date, almost as if they were an application themselves. Do they also serve to expedite QA and UAT (I can't actually envision how they might be used by either, to be honest), or do they "just" make unit sanity checks easier and more consistent for developers?

    Jeff,

    These days there are test frameworks for just about everything, from server specifications through databases and, obviously, code, to frontend user interfaces.

    The tests are living artifacts that, when done properly, are a fine-grained specification of what the system is supposed to do and the behaviours it is supposed to exhibit.

    The idea is that if you change something, the test results tell you whether the system continues to function as intended. This can extend to testing mechanical aspects such as disk I/O and CPU, so you know early in the development cycle if you are going to overstress a system.

    The whole ethos is that a production fault is pursued to find out whether it could have been prevented by some form of test and, if so, that test is implemented.

    The next step is to design stuff so it can be covered by an automated test.

    It's a discipline and I like discipline. It's why I enjoyed being a DBA.

    It is no more work than deploying manually tested code and then wasting days diagnosing problems in production. In fact, in my experience, it is quicker, cheaper, and drives up quality.
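
    As a concrete illustration for database code, here is a minimal sketch of such a test using the open-source tSQLt framework; the function dbo.fn_LineTotal and the table dbo.ClaimLineDetail are hypothetical stand-ins for whatever module is under test:

        -- Group related tests into a tSQLt test class (a managed schema).
        EXEC tSQLt.NewTestClass 'ClaimTests';
        GO
        CREATE PROCEDURE ClaimTests.[test line total sums detail amounts]
        AS
        BEGIN
            -- Swap the real table for an empty fake so the test is isolated.
            EXEC tSQLt.FakeTable @TableName = N'dbo.ClaimLineDetail';

            INSERT INTO dbo.ClaimLineDetail (ClaimLineID, Amount)
            VALUES (1, 10.00), (1, 2.50);

            DECLARE @Expected money = 12.50,
                    @Actual   money = dbo.fn_LineTotal(1);

            EXEC tSQLt.AssertEquals @Expected, @Actual;
        END;
        GO
        -- Run on every change; a failure flags a regression immediately.
        EXEC tSQLt.Run 'ClaimTests';

    Run as part of the build, a suite of these is exactly the living specification described above.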

  • below86 (11/11/2015)


    I had to rewrite an Assembler program back in the mid-'90s into COBOL. No one wanted to touch that Assembler program...

    My only experience with Assembler was back in college, and it was not a good one. One of my jobs had a bug that caused the IBM 370 mainframe to shut down: the same machine that ran the university. Needless to say, the administration was not pleased. Still, Assembler is efficient.

    The more you are prepared, the less you need it.

  • Andrew..Peterson (11/11/2015)


    My only experience with Assembler was back in college, and it was not a good one. One of my jobs had a bug that caused the IBM 370 mainframe to shut down: the same machine that ran the university. Needless to say, the administration was not pleased. Still, Assembler is efficient.

    That one program was pretty nasty; the funny thing is that some of the other Assembler programs were really easy to follow and change. There were a few that were even easier than the COBOL programs we had.


  • Jeff Moden (11/10/2015)


    GilaMonster (11/10/2015)


    I've recently seen a system with many user-defined functions, each with thousands of lines of code, often calling each other. Tests or no tests, I'm not touching that.

    They've done the same with views where I work: views calling views calling views, all hitting the same set of tables. The only saving grace is that there are no aggregates for them to join on. Still, they're a joined nightmare that basically resolves to one of those huge single queries with multiple self-joins and the occasional many-to-many join whose duplicated results they've papered over with DISTINCT.

    Oddly enough, though, they're not the worst problem that I'm currently or have been concerned with. To give you an idea of the type of code I've been repairing (you've been there, for sure): the code was written by a revolving door of about 15-20 different "developers", based on a database designed and implemented by those same people, and they had about 4-5 years with no guidance other than "get 'er done". After 4 years, we're still cleaning up the mess.

    WOW, that's bad. I'm sorry for you.

    Kindest Regards, Rod

  • About six years ago I was brought into a large internet company to stabilize the SQL Servers they had running in production. Typical internet setup: 24/7, global activity, etc. They were all developed by Java developers without any input from a database expert: no DBA, no data architect, no SQL developer. All the SQL code was generated by Hibernate or by the developers. One view they had joined 192 tables (yes, 192). I also found that Hibernate, at that time, went a bit stupid when it was working with more than about four tables. One giant mess that took time to clean up, but most of the time went to political issues, trying to get the developers to change bad habits.


  • Jeff Moden (11/10/2015)


    ...they're a joined nightmare that basically resolves to one of those huge single queries with multiple self-joins and the occasional many-to-many join whose duplicated results they've papered over with DISTINCT...

    Using DISTINCT or GROUP BY to get rid of the duplicates is a lot more tolerable to me than using ROW_NUMBER() OVER logic to get rid of them. I'm seeing this ROW_NUMBER() OVER logic in almost every SQL statement the contractors we have working for us write. The bad thing is that I'm also starting to see this logic creep into SQL written by the developers I work with.


  • below86 (11/12/2015)


    Using DISTINCT or GROUP BY to get rid of the duplicates is a lot more tolerable to me than using ROW_NUMBER() OVER logic to get rid of them...

    Hmm... the big difference between DISTINCT and ROW_NUMBER() is that DISTINCT just discards rows that are exact duplicates. Using ROW_NUMBER() implies that at least some thought was given to which row becomes ROW_NUMBER() = 1, and that there is some understanding of why the original job was creating duplicates in the first place. It also implies that the partitioning and ordering in the OVER clause matter more than whether there is differing data in the other columns, a distinction DISTINCT cannot make.
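
    To make that difference concrete, here is a small sketch with hypothetical table and column names:

        -- Hypothetical: dbo.Claim contains exact duplicates and also
        -- near-duplicates that differ only in LoadDate.

        -- DISTINCT removes only rows identical in every selected column;
        -- near-duplicates all survive.
        SELECT DISTINCT ClaimID, MemberID, Amount, LoadDate
        FROM   dbo.Claim;

        -- ROW_NUMBER() keeps one row per chosen key, and the ORDER BY states
        -- deliberately which near-duplicate survives (here, the newest).
        WITH numbered AS
        (
            SELECT  ClaimID, MemberID, Amount, LoadDate,
                    ROW_NUMBER() OVER (PARTITION BY ClaimID
                                       ORDER BY LoadDate DESC) AS rn
            FROM    dbo.Claim
        )
        SELECT ClaimID, MemberID, Amount, LoadDate
        FROM   numbered
        WHERE  rn = 1;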

  • ZZartin (11/12/2015)


    Hmm... the big difference between DISTINCT and ROW_NUMBER() is that DISTINCT just discards rows that are exact duplicates. Using ROW_NUMBER() implies that at least some thought was given to which row becomes ROW_NUMBER() = 1...

    IMHO, having duplicate data in a table or report is usually not acceptable. If I have 10 columns and all of the data is the same for a couple of rows, and a GROUP BY will get rid of those, I'm happy. But if I have the same situation and I'm doing a ROW_NUMBER() OVER, and I say there are 5 'key' columns to put into it and then only take the row where rn = 1, then I don't know what data may have been lost that was, or could have been, different in the other 5 columns. Just my 2 cents.


  • below86 (11/12/2015)


    IMHO, having duplicate data in a table or report is usually not acceptable... if I'm doing a ROW_NUMBER() OVER, and I say there are 5 'key' columns to put into it and then only take the row where rn = 1, then I don't know what data may have been lost in the other 5 columns...

    I was just trying to say that using the ROW_NUMBER() method at least implies that you've put some thought into understanding the duplicates and have organized them to some degree in the ORDER BY of the ROW_NUMBER(). Using DISTINCT doesn't allow for that.
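
    One way to take the data-loss worry off the table before deduplicating on a partial key (hypothetical names again) is to check whether the non-key columns ever actually disagree:

        -- Surface groups whose rows share the 'key' columns but conflict
        -- elsewhere; these are exactly the rows where keeping only
        -- ROW_NUMBER() = 1 would silently discard data.
        -- Repeat the COUNT(DISTINCT ...) test for each non-key column.
        SELECT  ClaimID, MemberID             -- the chosen 'key' columns
        FROM    dbo.Claim
        GROUP BY ClaimID, MemberID
        HAVING  COUNT(*) > 1                  -- duplicates on the key...
            AND COUNT(DISTINCT Amount) > 1;   -- ...with conflicting non-key data

    If it returns nothing for every non-key column, the duplicates are exact, and DISTINCT and ROW_NUMBER() = 1 keep the same data.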

  • David.Poole (11/11/2015)


    ...The tests are living artifacts that, when done properly, are a fine-grained specification of what the system is supposed to do and the behaviours it is supposed to exhibit. ... It is no more work than deploying manually tested code and then wasting days diagnosing problems in production. In fact, in my experience, it is quicker, cheaper, and drives up quality.

    Yes. This!!!

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • call.copse (11/11/2015)


    cphite (11/10/2015)


    ...

    It was composed of a join between two other views; their names don't matter. What matters is that each of those views was composed of other views, as were the next views, and the next views, and the ones after that. All in all I think I counted 27 total views. I mapped the thing out and it took an entire whiteboard.

    ...

    Sounds like something I'm looking after. I can barely explain how much I dislike working on this system. It's virtually impossible to rewrite the views without a chunk of time I cannot take, as there are multiple procedures dependent on them. You start trying to work out what is happening, but my poor brain cannot keep the whole thing in RAM, so by the time I get to the bottom of the view stack I've forgotten what was happening at the top. Also, because I am not always clear on the requirements (they're not written down and there are no comments), I have to go by trial and error on matching results, and the aggregates, unions, and DISTINCTs seem to have a lot of different, weird consequences.

    It will be rewritten from scratch before too long, fortunately.

    I have found that mapping these sorts of things out on paper helps enormously; a catalog-driven version of the same idea is sketched after this post. Basically, I just make a tree and keep going until I get to the base tables. At each view I note any aggregation or filtering that's taking place.

    And then in most cases I just write something new from scratch, using the tree as at most a reference.

    In the example I described, it was 27 views that ultimately came from around eight tables. The final query that replaced the mess was actually pretty simple.

    Or, if the views you're looking at are more complex, consider taking it on in stages. You may not be able to rewrite the thing in one pass, but maybe you can combine a handful of views into one, and then later do another handful, repeating as you have time. Eventually you have a couple of queries instead of dozens, and it's a lot easier to wrap your head around the whole thing.
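
    For the mapping step, SQL Server's catalog can do much of the whiteboard work. Here is a minimal sketch using sys.sql_expression_dependencies; the view name dbo.vw_TopLevel is a hypothetical stand-in:

        -- Walk the dependency tree under one view down to the base tables.
        WITH deps AS
        (
            SELECT  d.referencing_id, d.referenced_id, 1 AS lvl
            FROM    sys.sql_expression_dependencies AS d
            WHERE   d.referencing_id = OBJECT_ID(N'dbo.vw_TopLevel')

            UNION ALL

            SELECT  d.referencing_id, d.referenced_id, deps.lvl + 1
            FROM    sys.sql_expression_dependencies AS d
            JOIN    deps
              ON    d.referencing_id = deps.referenced_id
        )
        SELECT  REPLICATE('    ', deps.lvl - 1) + o.name AS indented_name,
                o.type_desc,                -- VIEW vs. USER_TABLE
                deps.lvl
        FROM    deps
        JOIN    sys.objects AS o
          ON    o.object_id = deps.referenced_id
        ORDER BY deps.lvl, indented_name;

    The listing, grouped by level, is a reasonable starting point for the stage-by-stage consolidation described above.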
