Update query hang

  • Hi:

    I've been using an UPDATE query as part of a stored procedure for some time now, and I've never seen any problems with it. However, it's always been used with a relatively small set of data. I'm now using it with as large a data set as I have and I'm seeing behavior I don't understand.

    Background:

    I'm gathering characteristics of licenses of our products owned by our customers. Such characteristics include product, last version shipped, support expiration date, etc. For various reasons, but primarily due to replacing a hardware device for protecting the software, a single license can be historically represented by multiple serial numbers. However, only one serial number is valid at a given time. But there are attributes of non-current serial numbers that need to be associated with the current serial number. For instance, a support agreement purchased for the license represented by SN 1, but a few weeks later, SN 2 was sent as a replacement for a broken SN 1. So, I want to make certain that SN 1 support agreement is included with SN 2 characteristics. No single serial number should ever represent the same license.

    Given that, here is the SQL query:

    UPDATE A_AccountLicenses

    SET SupportExpireDate =

    CASE

    WHEN EXISTS (SELECT T2.MaintEnd

    FROM Prodkits AS T1 INNER JOIN Transact AS T2 ON T1.Product_ID = T2.Product_ID

    WHERE T2.Serial_No IN (SELECT T10.Serial_No

    FROM A_AccountLicenses AS T9 INNER JOIN A_AccountLicensesHistory AS T10 ON (T9.RequestID = T10.RequestID AND T9.RecordID = T10.AccountLicenseID)

    WHERE T9.Serial_No = T3.Serial_No) AND

    T1.Prodtype = 'Maintenance')

    THEN

    (SELECT MAX(T2.MaintEnd)

    FROM Prodkits AS T1 INNER JOIN Transact AS T2 ON T1.Product_ID = T2.Product_ID

    WHERE T2.Serial_No IN (SELECT T10.Serial_No

    FROM A_AccountLicenses AS T9 INNER JOIN A_AccountLicensesHistory AS T10 ON (T9.RequestID = T10.RequestID AND T9.RecordID = T10.AccountLicenseID)

    WHERE T9.Serial_No = T3.Serial_No) AND

    T1.Prodtype = 'Maintenance')

    ELSE '01/01/1991'

    END

    FROM A_AccountLicenses AS T3

    Prodkits - a table that defines specific product characteristics

    Transact - a table that contains line item details from an invoice, including SN and support dates.

    A_AccountLicenses - Temporary table of all active serial numbers (SNs)

    A_AccountLicensesHistory - Temporary table of all serial numbers related to active serial numbers

    The queries in the WHEN and THEN sections are essentially the same. In WHEN, all I'm looking to do is save time and identify (assign an expiration date of '01/01/1991') those licenses that have NEVER had a support agreement at all. For those that have had a least one support agreement, THEN is supposed to return the most recent expiration date (as VARCHAR) from the support agreements purchased for that license.

    The sub-query in both is there simply to insure that support agreements for ALL serial numbers that have ever represented a license are included in the search for support agreements.

    That all seems pretty straight-forward. I think. Please let me know if you disagree.

    But here's where it gets odd.

    Problem 1

    =========

    If I run this query, alone, in Query Analyzer, it goes off into never-never land. I've let it run more than two hours. This is true even if I restrict the UPDATE to use only the first record, or first few records:

    FROM A_AccountLicenses AS T3 WHERE T3.Serial_No IN (SELECT TOP x Serial_No FROM A_AccountLicenses)

    The only thing that I can find that changes behavior is to replace the THEN SELECT query with a text string, like 'yes' and run it on the first few records. I've tried making the THEN SELECT query part of a CAST and CONVERT function, but doing so made no difference. Only the hard-coded text string works.

    Results (NOTE: for sake of illustration, I converted the UPDATE query to a SELECT query):

    Serial_No Support

    ------------ ----------

    501012 yes

    501064 01/01/1991

    501016 yes

    501014 01/01/1991

    501061 yes

    501054 yes

    501032 01/01/1991

    Problem 2:

    ==========

    But if I run it on the first eight records, it again seems to go off into never-never land. Restricting the query to the serial number in the eighth position

    FROM A_AccountLicenses AS T3 WHERE T3.Serial_No = '501018'

    yields the correct answer:

    Serial_No Support

    ------------ ----------

    501018 01/01/1991

    So, the problem doesn't appear associated with the data for that SN. And I deleted the first eight records, and repeat the experiment using the first seven and then eight records, the exact same symptoms appear. Seven works, eight fails, the eighth serial number alone works.

    I'm sorry, I know this is a lot of information, but I'm not sure I can simplify it any further. And I've sure failed to determine where the problem is.

    Please, any help is greatly appreciated. I'm totally stumped.

    Thanks for your time and attention.

    Happy New Year!

    JK

  • Q1.When you say it goes off into never never land, do you mean that the process says it's running, there's no waits indicated, and query analyser shows the process is awaiting an answer?

    Q2. How large are the recordsets being selected on by the subqueries?

    Q3. Can you post the execution plan of the one which completes, and possibly reduce the resultsets to get an execution plan on the one which doesn't currently?

    Q4. What indexes are there on the fields being selected on by the sub-queries, Ie.. Clustered, non-clustered, what fields, etc...?

    In the query you post, there is a case statement checking for the existence of a selection from two tables joined dependent on finding a value in a subselect from two more tables with a where clause on a couple of fields (gasping for breath here) and on finding that this is true, grabs the maximum value from two tables joined dependent on finding a value in a subselect from two more tables with a where clause on a couple of fields (gasping for breath here) and if it's not true, then using a default value. In my opinion, a different approach should be taken, to reduce the level of nesting. If that's just not possible for some reason, the above information should allow some information to point you in a good direction in optimizing the query. If the tables involved, or the amount of updates is very large, I could see this query as getting exponentially larger with every row, as each subquery references the data multiple times for each decision, etc... and I would expect this structure of query to take a long time to execute.

    And on looking over it, I believe that if they don't already exist, covered indexes on the four tables T1, T2, T09, T10 should DEFINATELY be created, considering the joins and where's. I would bet that your query is currently using a clustered index, and retrieving a bookmark lookup for the value you want, where with a covered index, it eliminates the bookmarked lookup, and you never touch the actual data. Your not scanning ranges, but retrieving individual values, so the clustering will be offset easily by the covered index as well. It's a common scenario that's often overlooked. Again, just something to check.

    OK, one more thing. In the structure of the subqueries, your using the IN statement. On looking over the query, and seeing that your attempting verification of existence, and then delimination of a row for the update, I believe you could change this out for a join which would reduce the overhead of this query exponentially by removing a level of nesting and comparison.

    Edited by - scorpion_66 on 01/02/2003 5:50:30 PM

  • Hi:

    Thanks for taking the time to go into so much detail.

    Q1 Response:

    There is no indication of any kind from Query Analyzer aside from the Exec time continuing to increment. That's my basis for saying 'never never land'. Is there a more useful status indicator I should use?

    Q2 Response:

    The SN History subquery should return no more than half a dozen or so SNs (records) in the absolute worst case, and that should be so rare as to be no more than 2 licenses out of 18,000. More common, if it returns more than one SN, it will return two, maybe three records.

    Q3 Response:

    This will be new territory for me. I'm one hundred percent self-taught from books and occasional forays into this wonderful SQL Server Central resource. I've only been working with SQL maybe a year now and am always under the gun to simply get things done. I've barely been getting by, and I'm starting to see myself sink. I didn't start in database work, but I absolutely needed database resources to do my real work (tech support). Lo these many years later, I'm still the keeper of the flame.

    I know how to show the execution plan for a query in Query Analyzer, but how do I make that available to you in a useful format by way of this forum?

    Q4 Response:

    The subqueries work on tables that are temporary and for which I've defined no index. If I were to define an index, I'd simply define them on the serial number field, and probably on the two linking fields. I'm embarrassed to say that I haven't spent a great deal of time with indexes.

    I see you worked with Alpha 4. That's what I cut my database teeth on. That's also the last time I really spent any time on indexes.

    I'm sorry, I have some other tasks that absolutely have to be done. I'll look into the rest of your response in a little bit. I'm not sure what clustered index is vs. a covered index.

    I feel bad for having taken your time.

    John

  • You can get textual execution plans by using the showplan options.

    SET SHOWPLAN_ALL { ON | OFF }

    As far as my time, don't feel bad about it at all. I spend time in these forums to learn. If I can pass on something I know, I feel like I'm giving back for what I've leeched in knowledge. If I'm corrected, it's one less thing I'll get wrong next time, and again I win, in that my origional goal of learning has once again been accomplished. And sometimes, I just answer to get a conversation going to see if anyone drops any new ideas on how to go about things, or can point out issues with my solution that I can fix (hah, I win again.)

    In your particular case, I've had to deal with this type of transaction before in some of the projects I've worked in, and just recognized the format of the query. It's a rotten deal about the temp tables, but gives more weight to the join over the In clause.

    A covered index simply includes all fields neccessary for a particular query. They see the most benefit for things like look-up table query joins, and high selectivity columns which are selected by value rather than by a range or values. With the data in temp tables, I don't believe you would benefit from them enough to even look at. Let's say you had a table where f1 is an identity, and f2 is a textual description, and there are a lot of other fields containing other data about this row. Now in a query where you join on the identity and return the text description, I would expect it to use the clustered index on the identity field (most common scenario). In the execution plan you would see bookmarked lookups where SQL grabbed the neccessary identity value matched rows and then turned around and got the text in a seperate plan step. As an alternate scenario, you could create a non-clustered index on the identity and the description, I would expect the optimizer to immediatly start using it. It eliminates the bookmarked lookup, as the description your returning was returned with the identity from the index, and the data level access is never touched reducing IO by the amounts of the rest of the columns in the table. This is the covered index scenario. (All fields referenced are included in the index in the proper order or access by the query). Now obviously, there's a breakover point at which the clustered index will perform better, but I've shown 1000%+ increases in query performance by this technique, and small nested verification of existence queries like yours are generally prime candidates. Like anything, test thoughoughly before using as all enviroments are slightly different.

    Wow, I can't believe anyone else ever worked with Alpha 4. I had a specialized project out in the back woods where they already purchased the software, so got to design in it for a couple years. I wound up writing some VB 3.0 interfaces to it, and some customized vb apps, to get away from the terrible interface. I kept at it till version 5.0, and then SQL 4.2 with a sprinkling of DB2, An oldie called Filemaker 1.0 was about the same time. I was working for a bunch of city departments and sheriff's offices at the time as contract.

    Edited by - scorpion_66 on 01/02/2003 11:20:29 PM

  • I agree with Scorpion's 1st para, that is what I use this forum for. By supplying sql snippets you get other peoples ideas and can write better sql.

    As for your problem, I have not deleved deeply into it (as Scorpion has done a good job already), the only thing I can give is Indexes. I had problems a while ago with a database that ended up with million+ rows which I updated, collated, summarized and inserted data daily. When I first started the jobs ran well but as data grew the performance dropped and I suffered your never-never land syndrome (24+ hour runs). I tuned the database by re-evaluating primaries and adding extra indexes and bingo the job times dropped. Don't know if this helps in your case.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thanks guys. I appreciate the help. I did some reading about query optimization (fund covering indexes there) last night, and found SHOWPLAN, so I have some work do, but I've got someplace to start.

    As for taking your time, I understand what you're saying. I just feel embarrassed sometimes that I'm so marginal on so many things sometimes. Indexes are pretty fundamental to databases and I can't remember the last time I gave them any thought at all. I've had the fundamental database structure has been in place for many years, beginning in A4. The only table additions in the last few years have been for minor supporting data, lookup information and the like.

    I was having a small meltdown. I just saw more time burning away. I just have to remember that I'm doing the best that I can and keep going. Amazing what a night's sleep can do for perspective.

    Thanks for your help. I hope I can get good enough to do for others as you are doing.

    As for A4, yes, that was a trip. That scripting language made some tasks trivial, but it wasn't a programming language. But it wasn't intended to be. It was intended to make things as simple as possible for people who knew nothing about databases other than the fact they needed one, and didn't want to know much more than that.

    Again, thanks.

  • Hi:

    I've finally took time to revisit this situation yesterday. With your help, I was able to work it out, but I do have one additional question (at the end of this message).

    Scorpion66, you were absolutely correct about reducing the level of nesting and the benefit of adding an index.

    To simplify things, I removed the case statement and simply updated the fields, ignoring a default value for the time being.

    Original query:

    UPDATE A_AccountLicenses

    SET SupportExpireDate =

    SELECT MAX(T2.MaintEnd)

    FROM Prodkits AS T1 INNER JOIN Transact AS T2 ON T1.Product_ID = T2.Product_ID

    WHERE T2.Serial_No IN (SELECT T10.Serial_No

    FROM A_AccountLicenses AS T9 INNER JOIN A_AccountLicensesHistory AS T10 ON (T9.RequestID = T10.RequestID AND T9.RecordID = T10.AccountLicenseID)

    WHERE T9.Serial_No = T3.Serial_No) AND

    T1.Prodtype = 'Maintenance')

    FROM A_AccountLicenses AS T3

    This query completes in about 13.5 minutes

    New query:

    UPDATE A_AccountLicenses

    SET SupportExpireDate =

    (SELECT MAX(T3.MaintEnd)

    FROM A_AccountLicenses AS T1 INNER JOIN A_AccountLicensesHistory AS T2 ON (T1.RequestID = T2.RequestID AND T1.RecordID = T2.AccountLicenseID)

    INNER JOIN Transact AS T3 ON T2.Serial_No = T3.Serial_No

    INNER JOIN Prodkits AS T4 ON T3.Product_ID = T4.Product_ID

    WHERE T4.ProdType = 'Maintenance' AND T1.Serial_No = T0.Serial_No ANDNOT T3.MaintEnd IS NULL)

    The next thing I did was create an index using the join fields of RequestID AccountLicenseID in the table A_AccountLicensesHistory (T10)

    Doing so reduced the query run time to 8 seconds.

    So, thanks for pointing me in the right direction.

    However, I still have a question, but at least I have a specific question:

    I went back and tried to again use the CASE expression so I could provide a default value in case the specific license had never been under a support agreement. Since the test SQL statement is a less specific form of the update SQL statement, I thought this would perform reasonably well.

    I was wrong, in two ways. First, experimentally, I found that two straight update queries (first update with support agreements, second filling the remaining null fields with default values) will run MUCH faster than using CASE.

    Second, disregarding that fact, here is the SQL I used:

    UPDATE A_AccountLicenses

    SET SupportExpireDate =

    CASE

    WHEN EXISTS (SELECT T3.MaintEnd

    FROM A_AccountLicenses AS T1 INNER JOIN A_AccountLicensesHistory AS T2 ON (T1.RequestID = T2.RequestID AND T1.RecordID = T2.AccountLicenseID)

    INNER JOIN Transact AS T3 ON T2.Serial_No = T3.Serial_No

    INNER JOIN Prodkits AS T4 ON T3.Product_ID = T4.Product_ID

    WHERE T1.Serial_No = T0.Serial_No AND

    NOT T3.MaintEnd IS NULL AND

    T4.ProdType = 'Maintenance'

    )

    THEN (SELECT MAX(T3.MaintEnd)

    FROM A_AccountLicenses AS T1 INNER JOIN A_AccountLicensesHistory AS T2 ON (T1.RequestID = T2.RequestID AND T1.RecordID = T2.AccountLicenseID)

    INNER JOIN Transact AS T3 ON T2.Serial_No = T3.Serial_No

    INNER JOIN Prodkits AS T4 ON T3.Product_ID = T4.Product_ID

    WHERE T4.ProdType = 'Maintenance' AND

    T1.Serial_No = T0.Serial_No AND

    NOT T3.MaintEnd IS NULL)

    ELSE '01/01/1991'

    END

    FROM A_AccountLicenses AS T0

    This works to a point. Where it fails is that it never enters a default value. So, the 'exists' statement NEVER evaluates to FALSE.

    Since I'm trying to make this as fast a test as possible, it makes no sense to use the MAX function. But removing it and searching only for the existence of a non-null MAINTEND value for each serial number sends the query off into never never land. I don't understand that. If MAX(field) worked, seems like field should also work.

    Looking closely at the CASE expression, I finally realized that 'EXISTS' is probably illegal as it is used. 'EXISTS' is used in the WHERE clause of a query. I don't think it can be used as I was using it, even though there is no complaint from SQL about it.

    To test that, I changed the conditional query to count the number of non-null MAINTEND values for a given serial number and if greater than 0, get the max. If 0, then use the default:

    UPDATE A_AccountLicenses

    SET SupportExpireDate =

    CASE

    WHEN (SELECT COUNT(T3.MaintEnd)

    FROM A_AccountLicenses AS T1 INNER JOIN A_AccountLicensesHistory AS T2 ON (T1.RequestID = T2.RequestID AND T1.RecordID = T2.AccountLicenseID)

    INNER JOIN Transact AS T3 ON T2.Serial_No = T3.Serial_No

    INNER JOIN Prodkits AS T4 ON T3.Product_ID = T4.Product_ID

    WHERE T1.Serial_No = T0.Serial_No AND

    NOT T3.MaintEnd IS NULL AND

    T4.ProdType = 'Maintenance'

    ) > 0

    THEN (SELECT MAX(T3.MaintEnd)

    FROM A_AccountLicenses AS T1 INNER JOIN A_AccountLicensesHistory AS T2 ON (T1.RequestID = T2.RequestID AND T1.RecordID = T2.AccountLicenseID)

    INNER JOIN Transact AS T3 ON T2.Serial_No = T3.Serial_No

    INNER JOIN Prodkits AS T4 ON T3.Product_ID = T4.Product_ID

    WHERE T4.ProdType = 'Maintenance' AND

    T1.Serial_No = T0.Serial_No AND

    NOT T3.MaintEnd IS NULL)

    ELSE '01/01/1991'

    END

    FROM A_AccountLicenses AS T0

    This appears to do everything I want it to do. It updates with the default value and does so in a reasonable amount of time relative to just a straight update without a conditional. But it has to work harder than just a simple check for existence would have to work.

    So the question I have is:

    CASE

    WHEN EXISTS (sql query)

    THEN (sql query)

    ELSE (value)

    END

    Is this illegal, as I suspect, and there simply is no error check for it, therefore no error message?

    Or should it work, and this is a bug?

    Again, that you all for your help.

    JK

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply