Please explain lock escalation query problem

  • Hello,

    Could someone please explain lock escalation with respect to the problem I am about to describe? I know how I can fix this problem, but I don't understand what the problem really is under the hood.

    The Problem Overview:

    I have a production table that stores the transactions that members make. The table has around 40,000,000 records in it (yes, 40 million). I am having a performance problem when I run a particular SELECT statement against the table and it happens to hit a certain data distribution (which I will explain momentarily). The query looks for a particular transaction and has a large WHERE clause, with only one condition hitting one of the indexes on the table (there is an index on the member number column). The rest of the clause just eliminates all the other records for that member.
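
    For concreteness, the query is shaped roughly like the sketch below. Table and column names have been changed for this post, and only MemberNumber has an index:

        -- Rough shape of the query (names changed for the post).
        -- Only MemberNumber is indexed; every other predicate has to be
        -- checked against the table rows themselves.
        SELECT *
        FROM dbo.MemberTransactions
        WHERE MemberNumber = @MemberNumber          -- indexed
          AND TransactionType = @Type               -- not indexed
          AND TransactionDate BETWEEN @From AND @To -- not indexed
          AND Amount = @Amount;                     -- not indexed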

    Now, 99.9% of the member numbers have only between 1 and 5 transactions/records in the table, and if I select a transaction for one of these, the query takes about 2 seconds to run: a perfectly acceptable result time. However, 0.1% of the member numbers might have thousands of records in the table, even tens of thousands. If I hit one of these, the results take about 20-30 seconds. Not acceptable.

    Problem Details:

    I can reproduce the problem on demand, and there are no other processes, queries, or users running; I have exclusive use of the server while I am testing this. So this is not a user/query contention issue (resources, well, that's what I am asking about). I also need to clear the cache to reproduce it, so that the results are not coming from cache. But once the cache is cleared, if I select a member number with many entries (all of which are eventually excluded by the WHERE clause), it is extremely slow; one with few entries, very fast.
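
    (By clearing the cache I mean flushing the data cache on the test server, something along these lines:)

        -- Flush dirty pages, then empty the buffer (data) cache. Test server only!
        CHECKPOINT;
        DBCC DROPCLEANBUFFERS;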

    Why do I think this is a lock escalation issue? Because I ran a server trace, and for the slow query there is a lock escalation notification message. For the fast query (actually the same query, just a different target row), there is no lock escalation message. Furthermore, if I modify the query to use the NOLOCK or TABLOCK hint, it is fast all the time.
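
    (By those hints I mean something like the following, using the same made-up names as above:)

        -- NOLOCK: read uncommitted; takes no shared row locks at all.
        SELECT * FROM dbo.MemberTransactions WITH (NOLOCK)
        WHERE MemberNumber = @MemberNumber AND TransactionType = @Type;

        -- TABLOCK: take one shared table lock up front instead of thousands of row locks.
        SELECT * FROM dbo.MemberTransactions WITH (TABLOCK)
        WHERE MemberNumber = @MemberNumber AND TransactionType = @Type;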

    Now, from what I have read, lock escalation can be a reason for poor server performance, i.e. other processes or queries may be blocked while the query runs. This makes sense: after roughly 5,000 row locks, the server escalates the lock level to a table lock (which in my mind should be near instantaneous). But like I said, there are no other users or processes running, and it is this query itself that is performing poorly. So what is happening to make performance so poor? In my mind, after the lock escalation the query should finish as fast as any other query.

    So I know I can fix this either by a) specifying the NOLOCK or TABLOCK hint, or b) creating a new index with all the constraint columns in it (I think; I haven't tested this yet).
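
    (For option b I am picturing something along these lines, again with the made-up names from above:)

        -- Hypothetical covering index: seek on MemberNumber, with the other
        -- predicate columns in the key or INCLUDE list so no bookmark lookups are needed.
        CREATE NONCLUSTERED INDEX IX_MemberTransactions_Covering
        ON dbo.MemberTransactions (MemberNumber, TransactionType, TransactionDate)
        INCLUDE (Amount);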

    But I don't understand why the lock is causing such a performance problem.

    Any insight would be greatly appreciated. Thank you,

    Colin.

  • Are you clearing the Procedure Cache as well as the buffer cache?
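
    DBCC DROPCLEANBUFFERS only empties the data cache; to throw away cached plans as well you need something like:

        -- Empty the plan (procedure) cache too. Test server only!
        DBCC FREEPROCCACHE;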

    I don't think your issue is being caused by lock escalation so much as by poor plan re-use. Essentially, for 99% of your members an index seek is the best access path, but for the other 1% it is not, and re-using the plan compiled for a member with only a few rows is what slows the query down.

    Can you post the execution plan(s)?
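
    (You can capture the actual plan as XML with something like:)

        -- Return the actual execution plan as XML alongside the results.
        SET STATISTICS XML ON;
        -- ... run the query here ...
        SET STATISTICS XML OFF;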

    Lock escalation is not your problem. If you are the only user, then lock escalation is certainly improving your query: why pay the lock management overhead when there are no other users trying to modify the table concurrently? Far better for you to escalate to a table lock and stop bothering with lock acquire/release.

    It sounds like there are two possible access methods to satisfy your query:

    (1) full table scan

    (2) seek into the index, scan all matching rows, and do bookmark lookups against the table to check the rest of your predicates

    Whenever you have a data distribution like this (nearly all key values have only a few rows, with a few large exceptions), the histogram in the table's statistics is going to suggest that every input is highly selective. So you are almost certainly getting the second access method from above, even if your plan gets compiled with one of your high-cardinality inputs.
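
    You can see this for yourself with something like the following; the table and index names are placeholders for your own:

        -- Show the statistics histogram for the member-number index.
        DBCC SHOW_STATISTICS ('dbo.MemberTransactions', IX_MemberNumber) WITH HISTOGRAM;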

    Check your query plan, see what it's doing, and let us know. Even with 5,000 rows to consider, that is only about 0.01% of the table, so the bookmark lookups should still be your best option. I'm surprised it would take 30 seconds, so maybe you are getting a table scan for some reason. If you are, experiment with an index hint (http://msdn.microsoft.com/en-us/library/ms187373.aspx) and see if that makes a big improvement.
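
    An index hint would look something like this (placeholder names again; substitute whatever your member-number index is actually called):

        -- Force the optimizer to seek via the member-number index.
        SELECT * FROM dbo.MemberTransactions WITH (INDEX (IX_MemberNumber))
        WHERE MemberNumber = @MemberNumber AND TransactionType = @Type;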
