Measuring Snapshot Isolation update conflict resolution time

  • An application is using snapshot isolation level and the update conflict process in transactions to ensure two requests don't update the same records. The business requests consists of some reads, some application tier logic and some writes in a single transaction where the update conflict resolution occurs. There are no locking hints used as these reduce throughput and increase average latency in our tests (e.g. UPDLOCK, ROWLOCK).

    TempDB file IO latency is starting to rise and I was wondering if there is a way to find out how long the update conflict resolution process takes and what impact tempDB IO latency may have?

  • I am unsure what your application's "update conflict process" does - are you referring to SQL Server's internal process? If the application's, how does its code handle a Msg 3960, other than to resubmit? If you are wondering about msg 3960's impact upon durations, I have not tried to directly measure its impact, but monitoring transaction events or views should help. Even so, it likely would be a fatal mistake to test snapshot isolation in a vacuum :).

    Because snapshot isolation duration concerns persisting (and redirections of other spids to reading tempdb's row versions), a flush or fetch of a row version from tempdb can require a physical IO (8KB) or more. Because tempdb concurrently satisfies the temporary object needs for all active spids, there is only one meaningful method to measure snapshot isolation's impact upon durations, which is to simulate concurrency (using realistically modeled data and realistic execution rates). Tools such a VS Test, LoadRunner, or RML can simulate concurrency - I am unsure about data modeling (another forum reader may be able to assist).

    Once concurrency and data can be simulated and replayed, measuring snapshot isolation's impact can be more meaningfully measured. But because the effects of concurrency can vary, results must be considered statistically. Durations need to be aggregated by test run or by test stage, and multiple additional test runs should be run, as a confirmation of the first test run's results.

    The same approach applies to all tempdb IO, in general. It is important to know the wait_resource being waited upon (there is more than one reason for an IO;-)), which means polling (statistical sampling) is "required", while tempdb is being concurrently used.

  • Thanks for the help. It is the SQL Server internal row version update conflict resolution process I am referring to. The applications retry when a conflict occurs but I am trying to measure the latency penalty specifically due to conflict resolution.

    I am measuring tempdb IO response with dmv and performance counters sampled every 5 seconds. Are you suggesting I check for a wait_type of pageiolatch on a resource description referring to tempdb? I am recording waits using an extended event on queries over 5 ms but it is hard to make the association of a specific wait with the update conflict resolution process as opposed to any other use of tempdb which these queries will also be making. I find the resource description for tempdb pages are too transient to resolve the object from the allocation unit reliably.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply