Slow Sql server

  • Hi All,

    We have a QA server on which a bunch of databases are hosted.

    It hosts like 20-25 databases. All these databases are not active all the team.

    At set of databases indicates each environment like QA1 , QA2 , QA3 etc... In each set , there is 1 big db which is around 4.15TB and others are 500GB each.

    Multiple testing team uses multiple environments. Sometimes only 1-2 environments are performing testing.

    During the time, the workload queries gets executed fine. But whenever, multiple teams starts testing 4-5 environments

    at same time , the performance is very slow. a query which takes 2-3 mins to finish would take more than 30 mins.

    During these times, testing complains a lot of query performance issues. We dont find any blocking and mostly see PAGEIOLatches on all databases.

    The server has 16 CPUs and 256GB RAM. Max server memory set to 80% i.e 205GB.

    Questions

    =========

    Q) This is kind of non-technical question. More of dealing with a pressure situation when servers are running slow.

    I usually check number of connections to the users which shows the activity in each database and share it with users.

    Sometimes we try bumping the max server memory and gets relief but still we see slowness in the execution of app queries.

    I typically use, sp_whoisactive to see current acitivity. for blocking and waits...

    All db are isolated to its own separate disks, autogrowths are set to 500mb fixed growth, tempdb has it own separate disks, instant file initialization is on, presize data and log files etc...

    Really sometimes I am not finding answers on how can explain it to the end users/testers that multiple environments are running and that's reason for slowness.

    They just keep sending escalation mails that db performance is slow. How can we tell / convey them in layman terms.

    Most of the business users are non-technical folks.

    What information do we need to gather so that we can show them or tell them so they understand?

    for instance, If we show the I/O warnings from errorlog , then they will says its something wrong with sql server.

    some people just look at PLE and say, there is no memory.

    Even if we provide the list of long running queries, high CPU queries no action taken. They say, it was running fine on development server but on this server it takes long long time.

    Lots of email noise goes on.. can't say in words. This continues especially when multiple releases happen in same month.

    I feel somewhere I am not able explain. How to deal in such situations. How to set the clear expectations and at the same time I feel somewhere I am missing a systematic approach.

    Please share your thoughts. Looking for some candid and detailed explanation.

    Q) For such server, how much memory is needed is actually needed?

    Lets take an example )

    If I have a 4TB database(db1) and another 1TB whole set of databases(db2,db3,db4) which makes 1 env.

    How does the calculation go for how much memory we need for the SQL VM?

    What factors to be considered in determining that this server must need this much minimum memory and this much disk throughput.

    Thanks,

    Sam

     

  • its always the same issue - dev does not have same volume as QA/Prod and what runs well in dev most times will not run well with volumes.

    Just a small example from my current project - one small proc (doing processing of incoming data and aggregating onto another) - code looks well, performed well in dev (volumes of less than 5 million rows being processed)

    in UAT - volume is now production size - its processing 3 Billion rows (3.000.000.000) - takes 37 hours to process. only option was for me to rewrite both code and table structure - and have this now down to 1 hour processing time. (16 vcpu vm/256 GB ram)

    so for those bad queries you identified - pick up some of them and rewrite them and/or add required indexes - and then get back to the developers with results of rewrite - and make it clear that it was how they were doing it that caused the issue - not the server/environment.

    just telling them to fix will not help in most cases (as you are seeing) - but giving them some simple straight examples of a fix will eventually push them on the right direction.

    regarding things to look at db level

    • Table Compression (Page or Row - I use Page all over normally)
    • Indexes
    • Statistics
    • IO speed (if this is a cloud server depending on disk tier performance can be quite bad - Azure P30/40 always gives me warnings about IO taking more than 15 seconds - P50 and higher required for acceptable/good performance)
    • MaxDop
    • CTFP (default is very bad)
    • Resource Governor (if you are on enterprise edition)
  • It all sounds like pretty traditional query tuning stuff.

    You're gathering some metrics, but it doesn't sound like necessarily the right ones. I'd suggest using Query Store or Extended Events to get some specific query performance information. Which queries are running slow and when. Which have the most load on the system. Then, get the text of those queries. Look for common code smells. Fix those. Then, open the execution plans for the queries that are problematic. Understand what's going wrong there. Fix that.

    It could all be code. It could be structures. It could be statistics. However, without more information, you're just guessing. Gather the information.

    Then, you can communicate better what's going on.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • With large tables, it's not likely that statistics are up to date (Frederico hinted at this in his post).  What are you doing to keep statistics up to date (especially if most of your indexes are based on ever-increasing values)?

    You also posted the following:

    Even if we provide the list of long running queries, high CPU queries no action taken. They say, it was running fine on development server but on this server it takes long long time.

    That attitude on the part of others is not unexpected and not bad.  What they're saying in an rather less abrupt fashion is... "prove it".  And that's exactly what you need to do (again, Frederico said this using other words).

    You say your using sp_WhoIsActive.  Have you been able to capture any of the execution plans that frequently returns and have you actually done some form of analysis to create a writeup that proves what the issues are?

    Also, you mentioned that you sometimes play with max memory and it sometimes helps.  Memory is comparatively inexpensive compared to your time and Developer's time.  If your databases are 4TB as I think you said, the updating the server to a half-terabyte or more is going to do nothing but good.

    The real issue is probably what it always boils down to and, again, Frederico hit the nail on the head using different words... it's going to boil down to identifying unhealthy code (a politically correct term for "crap code") and possibly even proving which parts of the code have issues and why.  The proper use of sp_WhoIsActive should help you a whole lot there.

    And, yeah... there is no easy analysis to be had here.  It's going to take quite a bit of work on your part because the Developers don't know how to do it and so they fall back on "it worked on my machine", meaning Dev and Staging.  If you can't prove which code and why it's having slowness and contention issues, you're going to be in the same pissing contest forever.  People that can't or won't figure it out for themselves (Developers just don't have the same knowledge in a lot of cases... not a fault... just a fact) aren't going to be able to help you.

    Grant has summarized it all quite nicely.  This is pretty typical query tuning stuff.  The hard part is always the first step... proving what the problem is and there aren't many shortcuts there.  No matter how many times you run sp_WhoIsActive, or SQL Profiler, or Extended Events, if you can't explain which code (frequently down to the statement) is causing the issues and explain "why" and, frequently, provide a demo of a fix, you folks are just going to continue the back and forth without either side actually knowing what the hell they're talking about.

    And be nice.  You're all working for the same company.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Although mostly about memory, the following article has some pretty good stuff about examining things when you have long running queries.

    https://www.erikdarlingdata.com/sql-server/does-my-sql-server-need-more-memory/

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Typically I/O causes the most issues.

    Therefore, another first quick thing to do is to look at the queries with the TOP (10) Avg I/O.  You can find these from SSMS.

    Right-click on the Instance name; scroll to "Reports", "Standard Reports", and finally "Performance - Top Queries by Avg I/O".  Naturally you can use the "+" to expand the query info for further details.  If you want to copy text from the report, you can Export to PDF and then open the PDF to be able to copy text.

    If you have some queries that run frequently, you might also want to look at the "Performance - Top Queries by Total I/O" report.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Thank you very much everyone. A lot of wise words.

     

  • Jeff Moden wrote:

    With large tables, it's not likely that statistics are up to date (Frederico hinted at this in his post).  What are you doing to keep statistics up to date (especially if most of your indexes are based on ever-increasing values)?

    You also posted the following:

    Even if we provide the list of long running queries, high CPU queries no action taken. They say, it was running fine on development server but on this server it takes long long time.

    That attitude on the part of others is not unexpected and not bad.  What they're saying in an rather less abrupt fashion is... "prove it".  And that's exactly what you need to do (again, Frederico said this using other words).

    You say your using sp_WhoIsActive.  Have you been able to capture any of the execution plans that frequently returns and have you actually done some form of analysis to create a writeup that proves what the issues are?

    Also, you mentioned that you sometimes play with max memory and it sometimes helps.  Memory is comparatively inexpensive compared to your time and Developer's time.  If your databases are 4TB as I think you said, the updating the server to a half-terabyte or more is going to do nothing but good.

    The real issue is probably what it always boils down to and, again, Frederico hit the nail on the head using different words... it's going to boil down to identifying unhealthy code (a politically correct term for "crap code") and possibly even proving which parts of the code have issues and why.  The proper use of sp_WhoIsActive should help you a whole lot there.

    And, yeah... there is no easy analysis to be had here.  It's going to take quite a bit of work on your part because the Developers don't know how to do it and so they fall back on "it worked on my machine", meaning Dev and Staging.  If you can't prove which code and why it's having slowness and contention issues, you're going to be in the same pissing contest forever.  People that can't or won't figure it out for themselves (Developers just don't have the same knowledge in a lot of cases... not a fault... just a fact) aren't going to be able to help you.

    Grant has summarized it all quite nicely.  This is pretty typical query tuning stuff.  The hard part is always the first step... proving what the problem is and there aren't many shortcuts there.  No matter how many times you run sp_WhoIsActive, or SQL Profiler, or Extended Events, if you can't explain which code (frequently down to the statement) is causing the issues and explain "why" and, frequently, provide a demo of a fix, you folks are just going to continue the back and forth without either side actually knowing what the hell they're talking about.

    And be nice.  You're all working for the same company.

    A question came into my mind,

    How can we prove when 2-3 databases are actively being used (out of 20 db's hosted on the server ) are fighting for memory and none are getting benefit of the available memory and almost all queries running slow.

    How to prove if at timeline-A db1 was to taking up most of the memory and after sometime (Timeline-B) the other database(s) queries are using up the memory and because of which db1 queries are suffering slow performance?

    If this kind of data is available/captured, probably we can explain or prove them in a better convincing way.

     

  • https://www.google.com/search?q=what+is+using+memory+in+SQL+Server

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Straight up: Extended Events or Query Store. Which set of queries are using the most resources? Answer the question.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply