Improve performance of a query by milliseconds. Should I loop the query to extrapolate the stats?

  • Hi,

    I have a query that takes 0.26 seconds on average to complete. The challenge is that the process executes the query 40,000 times, so the sum of all executions is more than 2 hours.

    SELECT SUM(T1.BALANCEPAYABLE),
            SUM(T1.TTV),SUM(T1.FARECASH),
            SUM(T1.FARECREDIT),
            SUM(T1.TAXCASH),
            SUM(T1.TAXCREDIT),
            SUM(T1.COMMISSIONAMOUNT),
            SUM(T1.TAXONCOMMISSIONAMOUNT),
            SUM(T1.PENALTYAMOUNT),
            SUM(T1.COMMISSIONVAT),
            SUM(T1.MARKETSUPPORT),
            SUM(T1.VENDAMOUNTCUR)
            FROM TABLE T1
            WHERE (((PARTITION=5637144576)
            AND (DATAAREAID=N'abcd'))
            AND (FILEREFRECID=@P3))

    Currently, I have an index on PARTITION, DATAAREAID and FILEREFRECID. The execution plan is simple: an Index Seek plus a Key Lookup. I was thinking of creating a covering index by adding all the fields in the SELECT statement as included columns to the index (a sketch is below). With that index there is no more Key Lookup in the execution plan, but it is very difficult to evaluate the performance gain.
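
    Roughly, the covering index I tested looks like this (the index name is made up and [TABLE] stands for the real table name):

    -- Sketch only: same key columns as the existing index, with every aggregated
    -- column added as an INCLUDE so the Key Lookup disappears.
    CREATE NONCLUSTERED INDEX IX_Table_Partition_DataArea_FileRef_Covering
        ON dbo.[TABLE] (PARTITION, DATAAREAID, FILEREFRECID)
        INCLUDE (BALANCEPAYABLE, TTV, FARECASH, FARECREDIT, TAXCASH, TAXCREDIT,
                 COMMISSIONAMOUNT, TAXONCOMMISSIONAMOUNT, PENALTYAMOUNT,
                 COMMISSIONVAT, MARKETSUPPORT, VENDAMOUNTCUR);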

    Considering that I cannot run the process in the application, what is my best bet for evaluating the performance gain at a bigger scale? I was thinking of performing a loop like this:

    SET NOCOUNT ON;
    DECLARE @FileRefRecID bigint;

    -- Loop over a sample of 1,000 distinct FILEREFRECID values and run the aggregate query for each one
    DECLARE FileRefRecID_Cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT TOP 1000 FILEREFRECID FROM TABLE WHERE FILEREFRECID <> 0;
    OPEN FileRefRecID_Cursor;
    FETCH NEXT FROM FileRefRecID_Cursor INTO @FileRefRecID ;
    WHILE @@FETCH_STATUS = 0
     BEGIN
      SELECT SUM(T1.BALANCEPAYABLE),
            SUM(T1.TTV),SUM(T1.FARECASH),
            SUM(T1.FARECREDIT),
            SUM(T1.TAXCASH),
            SUM(T1.TAXCREDIT),
            SUM(T1.COMMISSIONAMOUNT),
            SUM(T1.TAXONCOMMISSIONAMOUNT),
            SUM(T1.PENALTYAMOUNT),
            SUM(T1.COMMISSIONVAT),
            SUM(T1.MARKETSUPPORT),
            SUM(T1.VENDAMOUNTCUR)
            FROM TABLE T1 WHERE (((PARTITION=5637144576) AND (DATAAREAID=N'abcd')) AND (FILEREFRECID=@FileRefRecID))
         FETCH NEXT FROM FileRefRecID_Cursor INTO @FileRefRecID;
     END;
    CLOSE FileRefRecID_Cursor;
    DEALLOCATE FileRefRecID_Cursor;
    GO 

    Would that make sense in your opinion?

    Thank you

  • Why are you executing it 40,000 times? Why not once with a GROUP BY clause?
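
    For illustration, a single-pass version along those lines might look roughly like this (I'm keeping the [TABLE] placeholder and guessing that FILEREFRECID <> 0 is how you'd scope the 40,000 values):

    -- Rough sketch only: aggregate every file reference in one pass instead of
    -- issuing 40,000 separate calls, then let the application read the rows it needs.
    SELECT T1.FILEREFRECID,
           SUM(T1.BALANCEPAYABLE)        AS BALANCEPAYABLE,
           SUM(T1.TTV)                   AS TTV,
           SUM(T1.FARECASH)              AS FARECASH,
           SUM(T1.FARECREDIT)            AS FARECREDIT,
           SUM(T1.TAXCASH)               AS TAXCASH,
           SUM(T1.TAXCREDIT)             AS TAXCREDIT,
           SUM(T1.COMMISSIONAMOUNT)      AS COMMISSIONAMOUNT,
           SUM(T1.TAXONCOMMISSIONAMOUNT) AS TAXONCOMMISSIONAMOUNT,
           SUM(T1.PENALTYAMOUNT)         AS PENALTYAMOUNT,
           SUM(T1.COMMISSIONVAT)         AS COMMISSIONVAT,
           SUM(T1.MARKETSUPPORT)         AS MARKETSUPPORT,
           SUM(T1.VENDAMOUNTCUR)         AS VENDAMOUNTCUR
    FROM [TABLE] AS T1
    WHERE T1.PARTITION = 5637144576
      AND T1.DATAAREAID = N'abcd'
      AND T1.FILEREFRECID <> 0
    GROUP BY T1.FILEREFRECID;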

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Without the structure and the execution plan, I'm guessing here.

    Yes, if you're trying to squeeze more performance, a covering index may be the way to go.

    As far as evaluating if it works or not, Extended Events are your friend. You should be using them. You don't need a really elaborate set of tests to ensure that you're getting a performance improvement. Generally, just setting up the query and hitting GO 50 afterwards will allow you to capture the behavior with Extended Events and then aggregate the 50 executions. Compare before and after adding the covering index and you should have your answer.
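
    If it helps, a bare-bones session for that kind of before/after test might look something like this (the session name, database filter and file path are placeholders, not anything from your system):

    -- Minimal Extended Events session capturing duration, CPU and reads for each batch.
    CREATE EVENT SESSION QueryTuningTest ON SERVER
    ADD EVENT sqlserver.sql_batch_completed
        (WHERE (sqlserver.database_name = N'databasename'))
    ADD TARGET package0.event_file (SET filename = N'C:\temp\QueryTuningTest.xel');
    GO

    ALTER EVENT SESSION QueryTuningTest ON SERVER STATE = START;
    GO

    -- Run the test query here followed by GO 50, then:
    ALTER EVENT SESSION QueryTuningTest ON SERVER STATE = STOP;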

    However, if you want to get fancy, here's a PowerShell script that lets you evaluate behavior across a large data set. Just change the T-SQL scripts as needed of course.

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • How is the underlying table clustered?

    What are the most common WHERE conditions used when you read this table?

    The best, most complete solution overall is to cluster the table optimally for those reads, if possible.
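
    For example, if (PARTITION, DATAAREAID, FILEREFRECID) really is the dominant access path, re-clustering could look roughly like this (a sketch only; the index name is made up, [TABLE] is the placeholder from the thread, and the existing clustered index or primary key would have to be dealt with first):

    -- Sketch: make the most common lookup pattern the physical order of the table.
    CREATE CLUSTERED INDEX CIX_Table_Partition_DataArea_FileRef
        ON dbo.[TABLE] (PARTITION, DATAAREAID, FILEREFRECID);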

    SQL DBA, SQL Server MVP (07, 08, 09) - "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Gamleur84 - Wednesday, March 27, 2019 1:45 AM

    Hi,

    I have a query that takes 0.26 seconds on average to complete. The challenge is that the process executes the query 40,000 times, so the sum of all executions is more than 2 hours.

    <<snip>>

    Would that make sense in your opinion?

    Thank you

    Exactly the same query every time? Run the results into a permanent table and query that 40,000 times.
    A slightly different query each time, depending upon the value of @P3? Put all the values of @P3 into a table and join to it, as in the sketch below.
    Anything else requires a more complete explanation.
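
    A rough sketch of that second option, keeping the thread's placeholder names (how the @P3 values get loaded is an assumption - the application would supply its own list):

    -- Sketch only: gather every @P3 (FILEREFRECID) value the process would have used,
    -- then aggregate them all in one query instead of 40,000 calls.
    CREATE TABLE #FileRefs (FILEREFRECID bigint PRIMARY KEY);

    INSERT INTO #FileRefs (FILEREFRECID)
    SELECT DISTINCT FILEREFRECID
    FROM [TABLE]              -- placeholder table name from the thread
    WHERE FILEREFRECID <> 0;  -- or however the application builds its list

    SELECT T1.FILEREFRECID,
           SUM(T1.BALANCEPAYABLE) AS BALANCEPAYABLE,
           SUM(T1.VENDAMOUNTCUR)  AS VENDAMOUNTCUR -- ...plus the other SUM columns as before
    FROM [TABLE] AS T1
    INNER JOIN #FileRefs AS F
            ON F.FILEREFRECID = T1.FILEREFRECID
    WHERE T1.PARTITION = 5637144576
      AND T1.DATAAREAID = N'abcd'
    GROUP BY T1.FILEREFRECID;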

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY (I) and (II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • Grant Fritchey - Wednesday, March 27, 2019 7:20 AM

    Without the structure and the execution plan, I'm guessing here.

    Yes, if you're trying to squeeze more performance, a covering index may be the way to go.

    As far as evaluating if it works or not, Extended Events are your friend. You should be using them. You don't need a really elaborate set of tests to ensure that you're getting a performance improvement. Generally, just setting up the query and hitting GO 50 afterwards will allow you to capture the behavior with Extended Events and then aggregate the 50 executions. Compare before and after adding the covering index and you should have your answer.

    However, if you want to get fancy, here's a PowerShell script that lets you evaluate behavior across a large data set. Just change the T-SQL scripts as needed of course.

    Hi,

    Your script is very helpful. I have added a few things to help me compare the performance, and I would like to know if they make sense to you.

    # Get the connection
    $SqlConnection = New-Object System.Data.SqlClient.SqlConnection
    $SqlConnection.ConnectionString = 'Server=servername;Database=databasename;User ID = username; Password = password;'

    # Retrieve test data
    $BillToCustomerCmd = New-Object System.Data.SqlClient.SqlCommand
    $BillToCustomerCmd.CommandText = "SELECT DISTINCT TOP 40000 FILEREFRECID FROM TABLE WHERE (((PARTITION=5637144576) AND (DATAAREAID='abcd')) AND (FILEREFRECID <> 0))"
    $BillToCustomerCmd.Connection = $SqlConnection
    $SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
    $SqlAdapter.SelectCommand = $BillToCustomerCmd
    $BillToCustomerList = New-Object System.Data.DataSet
    $SqlAdapter.Fill($BillToCustomerList) | Out-Null  # suppress the row-count output

    # Set up test query
    $SQLCmd = New-Object System.Data.SqlClient.SqlCommand
    $SQLCmd.Connection = $SqlConnection
    $SQLCmd.CommandText = "DECLARE @sqlquery NVARCHAR(MAX);
    SET @sqlquery
     = N'SELECT SUM(T1.BALANCEPAYABLE),
       SUM(T1.TTV),SUM(T1.FARECASH),
       SUM(T1.FARECREDIT),
       SUM(T1.TAXCASH),
       SUM(T1.TAXCREDIT),
       SUM(T1.COMMISSIONAMOUNT),
       SUM(T1.TAXONCOMMISSIONAMOUNT),
       SUM(T1.PENALTYAMOUNT),
       SUM(T1.COMMISSIONVAT),
       SUM(T1.MARKETSUPPORT),
       SUM(T1.VENDAMOUNTCUR)
       FROM TABLE T1 WHERE (((PARTITION=5637144576) AND (DATAAREAID=''abcd'')) AND (FILEREFRECID=@FileRefRecID))';

    DECLARE @parms NVARCHAR(MAX);
    SET @parms = '@FileRefRecID bigint';

    EXEC sys.sp_executesql @stmt = @sqlquery,
     @params = @parms,
     @FileRefRecID = @btc;"
    $SQLCmd.Parameters.Add("@btc",[System.Data.SqlDbType]"bigint") | Out-Null

    $startTime = (Get-Date)

    # Run the tests
    foreach($row in $BillToCustomerList.Tables[0])
    {
      $SqlConnection.Open()
      $SQLCmd.Parameters["@btc"].Value = $row[0]
     
      # Option 1 - ExecuteNonQuery () does not return data at all
      # $SQLCmd.ExecuteNonQuery() | Out-Null
     
      # Option 2 - ExecuteReader() returns an object that can iterate over the entire result set
      $result = $SQLCmd.ExecuteReader()
      $table = new-object "System.Data.DataTable"
      $table.Load($result)
      $SqlConnection.Close()
     
      # Output to screen
      # $row[0]
      # $table | Select Column1,Column2,Column3,Column5 | Format-Table -AutoSize

      # Output to file
      # $row[0] | Out-File -Append C:\temp\output.txt
      # $table | Select Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9 | Format-Table -AutoSize | Out-File -Append C:\temp\output.txt
    }
    $endTime = (Get-Date)
    'Duration: {0:mm} min {0:ss} sec' -f ($endTime-$startTime)

    The script tells me how long it takes to run the query 40,000 times. Then I can create an index, or make any other change to the schema, and run the script again to compare. I also added output options to validate that I was getting the correct data back from the query.

    For deeper analysis, I can use Extended Events if I need to know CPU time, logical reads, etc.

    Would you agree? Would it be a good basic test?

    Thanks

    It'll do the job. It's not how I'd do it, though. I would capture the metrics through Extended Events. The reason is that, depending on where and how you run it, part of what you're measuring there, because you're bringing back the data, is your network. If you really want to measure that, fine. However, most of the time when tuning queries, I'm tuning queries. We can tune the network completely independently from the queries; in fact, we should. So measuring network performance over and over doesn't help me with the core of what I'm doing, tuning a query. Capturing the performance with Extended Events ensures that you're only measuring the work done by SQL Server, not any network latency.
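
    If you go that route, reading the captured events back might look something like this (assuming an event_file target such as the one in the earlier sketch; the path is a placeholder):

    -- Sketch: aggregate duration, CPU and reads from the captured .xel files so the
    -- before/after comparison only reflects work done inside SQL Server.
    WITH xe AS
    (
        SELECT CAST(event_data AS xml) AS ed
        FROM sys.fn_xe_file_target_read_file(N'C:\temp\QueryTuningTest*.xel', NULL, NULL, NULL)
    ),
    parsed AS
    (
        SELECT ed.value('(event/data[@name="duration"]/value)[1]', 'bigint')      AS duration_us,
               ed.value('(event/data[@name="cpu_time"]/value)[1]', 'bigint')      AS cpu_us,
               ed.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') AS logical_reads
        FROM xe
    )
    SELECT COUNT(*)           AS executions,
           AVG(duration_us)   AS avg_duration_us,
           AVG(cpu_us)        AS avg_cpu_us,
           AVG(logical_reads) AS avg_logical_reads
    FROM parsed;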


  • Thanks.

    Great point. I understand what you are saying about optimizing the query by measuring only the work done by SQL Server. ExecuteReader() does pull the data over the network, and that was intentional on my part.

    Anyway, the script took 49 seconds to complete, compared to more than 2 hours (0.26 sec x 40,000 executions) for the application. I know now that the query and the execution plan are not the issue here.

    Thanks for your help.

  • Gamleur84 - Thursday, March 28, 2019 6:21 PM

    Thanks.

    Great point. I understand what you are saying about optimizing the query by measuring only the work done by SQL Server. ExecuteReader() does pull the data over the network, and that was intentional on my part.

    Anyway, the script took 49 seconds to complete, compared to more than 2 hours (0.26 sec x 40,000 executions) for the application. I know now that the query and the execution plan are not the issue here.

    Thanks for your help.

    Happy to pitch in.

