Automatically killing long-running queries?

  • So this is my second time entering this. I was interrupted in the middle and ran into the web app timeout issue where you lose your post. I'll try and be quicker this time.

Long story short: Big database - millions of records (12 years' worth) times multiple values in subtables = potentially tens of millions of records returned if a query is malformed. Our app has a "Search" function that allows the users to essentially do ad-hoc searches. If they specify the parameters wrong they end up with a query that doesn't appear to ever end because it needs a couple of hours to run. It's a web app so they hit "Back" and try again, and again. Finally they give up and go home. Meanwhile on the SQL Server there are 2 or 3 queries that continue to hum away, blissfully unaware that the session that spawned them is long gone (IE to WebLogic/Java to SQL Server). Anyway, after running for a day or two these queries start to really suck up resources. I suspect they may finish and, not having any place to return their results, they end up spinning their wheels and really burning CPU and other resources. Everyone in the instance (and sometimes the server) is affected. Our SQL guys (I'm in development) have no tools other than EM, QA and OSQL, so they poke around, find the offending SPID(s) and kill them.

    This is no way to run a SQL shop. While we're trying to find a way to mitigate the problem from the app end, we can't stop web users from closing IE and going home without getting the results from their query. So, 2 questions:

1. Is there any way to automatically kill a query if it runs too long or uses too much CPU? I used to be able to do that in the mainframe world. I know EM, QA and ADO enforce (modifiable) limits on query duration. Can I implement one at the DB or instance level?

2. Tools. We need better tools. We need something that monitors SQL resource utilization, tells us if there's a problem coming, and helps locate and resolve the problem. I'm looking at Quest SQL Server Central and I'm pretty impressed (especially the SQL Tuner, whoo hoo!), although I haven't seen the price tag yet. Anyone use Quest? Opinions, comments? Anyone use competing tools?

I'm open to any suggestions (aside from taking away ad-hoc queries; that's a no-can-do). Any help is greatly appreciated.

    TIA


I'm not sure how to help you on the automatic kill of any long-running procs (never forget that, depending on what a procedure is doing, it can take as long as or longer to roll back than it did to run), but I can talk about Quest SQL Server Central.

It's a great tool and I highly recommend it, but the first task that you're hoping to address, an automated response to a situation on the server, isn't going to be handled by Quest Central. The tool you'd be looking for is Quest Foglight.

Quest Central will give you the query tuning tool, which is pretty cool, but not a magic bullet. It simply deals with structuring queries; it doesn't deal with architectural issues (do you have a good table design, can you denormalize for performance) or indexing suggestions (do you have the right indexes on the tables to eliminate scans, table or index).

It also gives you Spotlight, which will allow you to get real-time data on what's occurring on the server: which processes are causing you immediate pain and, even further, what it is about those processes that is causing the pain, at a server level. From there you can begin to tune procedures to clean things up.

We're still in the process of figuring out how best to implement Quest Central in our processes and keep finding different ways the tool can help, but, unfortunately, it won't give you that automated kill that you're looking for. Hopefully someone else will have the answer to that one.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
Product Evangelist for Red Gate Software

You could implement your own auto-kill: periodically check sysprocesses (or sp_who/sp_who2) to identify offending SPIDs and then kill them off. Something like the query below would find the candidates.
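A minimal sketch of the "identify" half against master..sysprocesses; the 180-second threshold, the status filter, and the spid > 50 cutoff for system processes are all assumptions to tune:

-- Find user SPIDs whose current batch has been running longer than
-- 3 minutes (the threshold is an assumption; tune to taste).
SELECT spid,
       loginame,
       hostname,
       DB_NAME(dbid) AS dbname,
       cpu,
       physical_io,
       last_batch,
       DATEDIFF(second, last_batch, GETDATE()) AS running_seconds
FROM   master..sysprocesses
WHERE  spid > 50                                      -- skip system SPIDs
  AND  status <> 'sleeping'                           -- still doing work
  AND  DATEDIFF(second, last_batch, GETDATE()) > 180
ORDER BY running_seconds DESC

Anything this returns is a candidate for a manual KILL; the automated loop is a separate step (and, as noted above, the rollback after a KILL can take a while).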

Hans's solution should work, but beware that you don't kill processes like DBCC CHECKDB and backups/restores...

Another option is to use the query governor setting (see BOL). I've never used it in production, but this may be what you're looking for. Note that it's actually a server-wide setting!
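For reference, a minimal sketch of turning it on (the limit of 300 is an arbitrary example value); bear in mind it gates on the optimizer's estimated cost, not actual elapsed time:

-- Server-wide: refuse to start any query whose ESTIMATED cost exceeds
-- the limit. It's an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'query governor cost limit', 300   -- 300 is an arbitrary example
RECONFIGURE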

    HTH

    JP

Another solution is to code your web application so that users cannot generate these huge queries. How about checking the parameters that the user has chosen to search on, and if they are going to return a huge number of rows, bounce them back and tell them to fine-tune their search? This would solve all of your problems. A rough database-side version of that check is sketched below.
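A sketch of how that pre-flight test could look on the database side; the table, columns, filter values, and the 10,000-row cap are all hypothetical stand-ins for the real search form:

-- Hypothetical pre-flight check: count matching rows (capped) before
-- running the expensive full search, and bounce the user if too broad.
DECLARE @cap int, @matches int
DECLARE @office_id int, @start_date datetime   -- stand-ins for the user's search parameters
SET @cap = 10000
SET @office_id = 42
SET @start_date = '20040101'

SELECT @matches = COUNT(*)
FROM (SELECT TOP 10001 record_id               -- stop looking past the cap
      FROM records                             -- hypothetical table
      WHERE office_id = @office_id
        AND record_date >= @start_date) AS capped

IF @matches > @cap
    RAISERROR ('Search too broad; please narrow your criteria.', 16, 1)

The TOP clause keeps the check itself cheap even when the user's filters match millions of rows.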

  • my 2 ct

You could also have your programmers use the CommandTimeout property on an ADO Command!

Don't let them use CommandTimeout = 0 (endless)! Have it limited to an affordable number.


    Johan


I'd second this one. Even with a three-minute timeout we get some pretty hefty loads running on the machine. And the fact of the matter is, no one should be running queries that take longer than that on a transactional system during working hours. Move those kinds of queries to off hours, or to a reporting server, warehouse, or data mart.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
Product Evangelist for Red Gate Software

  • Thanks all for the suggestions.

    I can't set a server-wide limit since there are multiple instances (big cluster) and it would affect other apps that might have valid long-running queries.

    Checking the parameters is easier said than done. There is a large number of fields from which they can select and they can combine them in any way. For example, it's OK to not have start and end dates if you limit by type and office, or if you have a specific account number. We tried to map out all the possible tests and the logic became unworkable.

This isn't ASP, and even in ASP, if you close your browser while the query is running it won't time out. That timeout only works in ASP because the connection is alive and ASP (actually ADO) can tell the server to kill the query. If you have a 1-minute timeout, submit a query, and then close your browser, there's nobody there to tell the query to stop if it goes longer than 1 minute. The stateless nature of the web, remember? Plus, the app is Java using JDBC. While it does still have a timeout on the connection, if the connection goes away there's no program running to monitor whether or not the query completed. The only way you can ensure a query is killed when the program dies is in a thick-client app, where you can control the connection in the event the program is closed.

It's the stateless nature of web apps that's really killing us. The queries don't know their originating program is gone. There's nobody monitoring whether or not they've completed. We may end up having to write something to monitor queries against this database in this instance and kill them if they exceed certain parameters; something along the lines of the sketch below.
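If it comes to that, a rough sketch of such a monitor, runnable from a SQL Agent job every few minutes; the database name, the 10-minute threshold, and the command exclusions are all assumptions to adjust:

-- Sketch: kill user queries in one database that have run too long.
DECLARE @spid int, @cmd varchar(20)

DECLARE runaway CURSOR FOR
    SELECT spid
    FROM   master..sysprocesses
    WHERE  dbid = DB_ID('OurBigAppDB')          -- hypothetical database name
      AND  spid > 50                            -- skip system SPIDs
      AND  spid <> @@SPID                       -- don't kill ourselves
      AND  status <> 'sleeping'
      AND  DATEDIFF(second, last_batch, GETDATE()) > 600   -- 10 minutes, adjust
      AND  cmd NOT LIKE 'BACKUP%'               -- per JP's warning:
      AND  cmd NOT LIKE 'RESTORE%'              -- leave maintenance alone
      AND  cmd NOT LIKE 'DBCC%'

OPEN runaway
FETCH NEXT FROM runaway INTO @spid
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @cmd = 'KILL ' + CAST(@spid AS varchar(10))
    EXEC (@cmd)      -- remember: rollback can take as long as the query did
    FETCH NEXT FROM runaway INTO @spid
END
CLOSE runaway
DEALLOCATE runaway

Logging what gets killed (to a table, just before the EXEC) would probably be wise, so the DBAs can see what the job has been doing.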


  • Two more cents from left field:

1) A three-tier (or "N-tier") architecture can address this. You toss in a layer in the middle; the user interface (top tier, web browsers) talks to code objects in the middle tier, the middle tier manages communications with the database, and the bottom tier (us DBAs) does all the work. The middle tier is persistent and, done properly, it can manage timeouts, dropped connections from the front end, and so forth. It is of course a significant layer of complexity, and if your application would not benefit from implementing complex business rules on a dedicated server (as opposed to having the database server do all the work), you probably won't benefit from this.

2) Like "Granted" said, a standard piece of system architecture is to offload system-busting reporting queries onto another server. Use {log shipping, replication, DTS packages, in-house code, whatever} to load data onto that second server, direct all your ad-hoc reads against that system, and that overhead will no longer impact your OLTP functionality. The caveat here is that your data is now in two places, and will rarely be exactly in sync. Can your business allow for queries against data that gets updated one or more times a day, but that is regularly "behind" the data in the OLTP system? Given the performance issues, can your business afford not to? (A few jobs ago, after a few days of end-user-written cross joins against some Big OLTP tables, management was very interested in this idea...)

       Philip


  • 1. We would have loved to have implemented this as 3-tier, but had essentially a fixed budget and too much work for the money we had to include writing an application-server layer. You're right, it would have made a lot of things easier, but I had one developer, a fixed-bid contract (not my idea) and a hard deadline. You do what you have to.

2. We're discussing a "reporting" database on a separate instance. The problem is the customer department doesn't pay much for the current DB since it resides in a corporate shared SQL Server farm. If we have to set up a standalone server with a single instance and only this database, then their cost goes up astronomically. They'd rather have a 2-hour slowdown once a month than pay for a separate server.

    This is more of an issue for the "other" apps in the farm. The thing I have to juggle is the impact this app is having on the other apps and finding a way to mitigate that without spending a lot of money. Hey! Maybe I could hit up the other customers: "If you want to stop these outages kick in $1,000 each and we'll get these guys out of your hair..." I could start my own protection racket... 


- So, suppose you can intercept these runaways: what are your criteria going to be for killing a thread?

- As always, performance is of _no_ concern until the server's lights go out!

On the other hand, according to what I was told at an MS W2K3 demo, with the full-option edition on a multi-proc big box you can assign CPU capacity (and more) to an instance of SQL Server at the server level. $$ are involved anyhow!
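For what it's worth, SQL Server itself has a coarse knob in a similar spirit: the affinity mask ties an instance to a subset of CPUs, so one instance can't starve the whole box. A sketch (the bitmask value 15, meaning CPUs 0 through 3, is just an example):

-- Pin this instance to the first four CPUs (bitmask 00001111 = 15).
-- This limits, rather than schedules, CPU: the instance simply never
-- uses the other processors.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'affinity mask', 15
RECONFIGURE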

    Johan


  • I agree with JP. See BOL

    query governor cost limit Option

    Use the query governor cost limit option to specify an upper limit for the time in which a query can run. Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration.

    Yelena

Regards, Yelena Varsha

Yelena Varshal, the query governor cost limit checks the estimated cost of a query, not the estimated duration (in seconds). So using the Query Governor cost limit will produce very sporadic results.

Just verifying now: with the Query Governor set to 180 (which should be three minutes if we trust the documentation about the Query Governor), it disallows me from running a query with a cost of 193 that (with the governor disabled) repeatedly takes under 10 seconds to run...

I would NOT rely on the Query Governor in production if the aim is to terminate long-running queries...
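For anyone who wants to repeat the test without touching the server-wide switch, the governor can also be set per session; a minimal sketch:

-- Session-level governor: affects only this connection, so it's safe
-- to experiment with.
SET QUERY_GOVERNOR_COST_LIMIT 180

-- Any statement whose ESTIMATED cost exceeds 180 is now refused up
-- front, before it starts executing; cheap queries still run fine.
SELECT COUNT(*) FROM master..sysprocesses

SET QUERY_GOVERNOR_COST_LIMIT 0   -- 0 = no limit again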

    //Hanslindgren

Hanslindgren,

This is pretty good. Do you use any performance-enhancing third-party SQL tools like QuickShift? Is it a high-end multi-processor system with optimized I/O?

I find that costs calculated by the Estimated Execution Plan should be multiplied by 3 to get the number of seconds on low-end hardware and uniprocessor or dual-processor systems. But the Query Governor limit could still be based on the estimated plan. For example: the estimated processor cost for a particular query is 2.47; a Query Governor limit of 2 does not allow it to run, a limit of 3 does allow it to run, and the real execution time is 8 seconds.

Do not forget that they are talking about processor costs; I/O costs are not included, I assume.

    Yelena

Regards, Yelena Varsha

  • Yelena Varshal,

No, I do not use any extra SQL tools. It's a generic HP NetServer LXr 8500 with 8 CPUs and 36 disks on fiber.

What I am saying is that you can't rely on the Query Governor to limit your queries in time. I tried another query with a query plan cost of 0.400 and it takes around 15 seconds to run. I believe that the query cost in SQL Server is very similar to the query cost in Oracle: a query with a high cost will probably take longer to execute than a query with a low cost, but the cost does not translate into a predictable execution time.

"Execution plans show, either graphically (the norm) or textually, the 'cost' of the query in units. These units have no meaning other than to measure against other cost unit measurements from other queries, and have no relationship to time, disk space, processor speed or any other known unit of measure." (Evan R. Pauley. I have thought of writing him to ask if he can supply references for why he states that, but so far I have not.)

Unfortunately I have found little other evidence to support my theory... I only know that it is NOT possible (even though Microsoft says so) to equate query cost with estimated seconds, unless the factor of comparison is some algorithm more complex than I can come up with.
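One way to see the mismatch for yourself: pull the optimizer's estimated cost out of SHOWPLAN_ALL, then run the same query with timing on and compare the two numbers (the cross join below is just a stand-in for any slow query):

-- 1) Estimated cost without executing: look at the TotalSubtreeCost
--    column of the first row returned.
SET SHOWPLAN_ALL ON
GO
SELECT COUNT(*) FROM master..sysobjects o1, master..sysobjects o2
GO
SET SHOWPLAN_ALL OFF
GO

-- 2) Real execution, with elapsed time reported in the Messages output.
SET STATISTICS TIME ON
GO
SELECT COUNT(*) FROM master..sysobjects o1, master..sysobjects o2
GO
SET STATISTICS TIME OFF
GO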

    I would like everyone that has any good points about what the Query Cost actually stands for to express it here (or let me know in some way).

    //Hanslindgren

Edit: P.S. Here is another newsgroup discussion (on a Microsoft.com forum) that dismisses the idea that time and query cost are related.

