Questioning the effects of 100% CPU usage

  • Today we had an incident with a server whose 2 CPUs got pegged at 100% usage. There is a SQL Server install on that server which we use mainly for MSDB (jobs / running SSIS packages). The unique thing about this situation is that it wasn't SQL Server pegging the CPUs. It was two services created by our Dev team.

    I discovered the issue when I tried to log on via SSMS and kept getting timeout errors. Our server guy kept telling me SQL was running and there were no problems with SQL Server despite me logging directly to the server via RDP and still having issues. When I discovered the CPU problem, he said that this couldn't be my problem because the usage had nothing to do with SQL Server.

    I'm fairly certain that connecting to SQL Instances when the CPU is pegged at 100% is difficult if not impossible, but I cannot find anything via Google for my specific situation. Most of the articles are talking about runaway SPIDS and other SQL processes causing the issue. I need to find documentation relevant to this situation that either tells me that I'm wrong (cause if I am, I need to apologize to someone) or that I can take back to our developers as a sort of "fix your code before this hits production" tool.

    Can someone help me out with this?

    I've attached the picture of my error.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Not quite sure exactly where you are coming from or what answer you expect, BUT surely you are right. SQL Server requires CPU to do anything (else why bother with a CPU?), so if SQL Server cannot get to the CPU because it is 100% used elsewhere, then I would imagine it would time out.

    I have had a rogue process (not SQL, it was VB I think) grab 100% CPU on a dev SQL Server (4 CPU), and it was darn near impossible to RDP or console to the server and shut the process down. Mind you, that was before the Dedicated Administrator Connection. Oh, and we knew something was astray, as nothing other than the simplest of queries got through, and I suspect that was only because the 100% being grabbed by the rogue process was actually only 99.9%, leaving a tiny amount for SQL.

  • ShineBoy (11/10/2014)


    Not quite sure exactly where you are coming from or what answer you expect, BUT surely you are right. SQL Server requires CPU to do anything (else why bother with a CPU?), so if SQL Server cannot get to the CPU because it is 100% used elsewhere, then I would imagine it would time out.

    I have had a rogue process (not SQL, it was VB I think) grab 100% CPU on a dev SQL Server (4 CPU), and it was darn near impossible to RDP or console to the server and shut the process down. Mind you, that was before the Dedicated Administrator Connection. Oh, and we knew something was astray, as nothing other than the simplest of queries got through, and I suspect that was only because the 100% being grabbed by the rogue process was actually only 99.9%, leaving a tiny amount for SQL.

    I'd think that the OS scheduler would allow for new process creation even if the other processes used 100 percent of the (user) CPU. The OS isn't going to let a user process squeeze it out, because a user process using 100 percent of the CPU and leaving the OS starved works for nobody.

    The fact that an RDP session can be created in the face of 100 percent pegged CPUs tends to bear this out.

    The original poster implies that this is on a dev machine, so I wonder why a simple test like stopping the service process(es) long enough to try an SSMS login is out of the question. Additionally, I'm pretty curious why a service needs exactly that 100 percent of CPU resources. Another thing could be to check how much RAM is available, because a dev who needs to peg CPUs with a service might not be all that skilled with memory management either. Plenty of clueless folks out there!

  • Did you try connecting with the dedicated admin connection (DAC)? I've seen that work in similar circumstances. And yeah, I think your devs might be out to lunch on this one.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software
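
    For anyone wanting to try the DAC suggestion from a script, here is a minimal sketch. It assumes the `sqlcmd` utility is on the PATH, that the caller is a sysadmin, and that the attempt is made on the server itself (remote DAC is off by default). The server name `TESTSQL01` and both function names are made up for illustration. The DAC gets its own scheduler inside SQL Server, but it still needs the OS to hand it CPU time, so it may or may not get through when something outside SQL Server is pegging the box.

```python
import subprocess


def build_dac_command(server, query="SELECT @@SERVERNAME"):
    """Build a sqlcmd command line that asks for the Dedicated Administrator Connection."""
    return [
        "sqlcmd",
        "-S", server,  # instance to connect to (hypothetical name supplied by the caller)
        "-A",          # request the dedicated administrator connection
        "-E",          # Windows (trusted) authentication
        "-l", "30",    # login timeout in seconds
        "-Q", query,   # run a single query, then exit
    ]


def run_dac_query(server, query="SELECT @@SERVERNAME"):
    """Run the query over the DAC and return the CompletedProcess for inspection."""
    return subprocess.run(
        build_dac_command(server, query),
        capture_output=True,
        text=True,
        check=False,  # a failed login is evidence too, so do not raise on it
    )


if __name__ == "__main__":
    # Only prints the command; call run_dac_query() on the server to actually connect.
    print(" ".join(build_dac_command("TESTSQL01")))
```

    If the normal login times out but this one succeeds (or both fail until the hungry services are stopped), that result and the timing are worth recording alongside the CPU figures.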

  • To clear up a misconception... we did stop the processes on the Test machine and I was immediately able to connect, as was the DBA sitting next to me. Plus a whole lot of other things cleared up (the Event Viewer was suddenly a lot faster to navigate, etc.).

    What I'm trying to find is documentation to back up my point for the in-house people who don't want to believe me (so far there's one person who doesn't think that this should have caused any issues).

    On the other hand, I'm kicking myself for not taking screenshots of the issue before we killed those processes. 🙁

    EDIT: Fixed typo

  • Brandie Tarvin (11/11/2014)


    To clear up a misconception... we did stop the processes on the Test machine and I was immediately able to connect, as was the DBA sitting next to me. Plus a whole lot of other things cleared up (the Event Viewer was suddenly a lot faster to navigate, etc.).

    Ok, that would be enough for me. Sounds like a frustrating situation!

  • patrickmcginnis59 10839 (11/11/2014)


    Brandie Tarvin (11/11/2014)


    To clear up a misconception... we did stop the processes on the Test machine and I was immediately able to connect, as was the DBA sitting next to me. Plus a whole lot of other things cleared up (the Event Viewer was suddenly a lot faster to navigate, etc.).

    Ok, that would be enough for me. Sounds like a frustrating situation!

    Oh, it is frustrating. There are three processes in question. One alone takes up 50% of one core. The other two make up the difference. The first one was added about a month back; the other two were just recently added. So it's a case of multiple CPU hogs adding up to steal everything, as opposed to a single process doing the dirty deed.

    I'm going to watch this situation. Since we're still in test, I'm going to let them fire up the processes again so I can gather my screenshots and data. I think we also need to lower the priority on these processes to keep them from taking over CPUs that SQL needs.
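
    When gathering that data, screenshots can be backed up with numbers. A rough sketch of the arithmetic: take each process's accumulated CPU time at two points, divide the difference by the elapsed wall-clock time multiplied by the number of logical CPUs, and you get that process's share of the whole machine. The service names and sample values below are invented for illustration, not measurements from this server, and the function name is hypothetical.

```python
def cpu_share_percent(cpu_seconds_start, cpu_seconds_end, wall_seconds, logical_cpus):
    """Share of total machine CPU capacity used by one process over an interval."""
    if wall_seconds <= 0 or logical_cpus <= 0:
        raise ValueError("wall_seconds and logical_cpus must be positive")
    used = cpu_seconds_end - cpu_seconds_start
    return 100.0 * used / (wall_seconds * logical_cpus)


if __name__ == "__main__":
    LOGICAL_CPUS = 2      # the server in this thread has 2 CPUs
    INTERVAL = 10.0       # seconds between the two samples (assumed)

    # (CPU seconds at first sample, CPU seconds at second sample); made-up values
    samples = {
        "DevServiceA": (100.0, 105.0),
        "DevServiceB": (40.0, 47.5),
        "DevServiceC": (12.0, 19.5),
    }

    total = 0.0
    for name, (start, end) in samples.items():
        share = cpu_share_percent(start, end, INTERVAL, LOGICAL_CPUS)
        total += share
        print(f"{name}: {share:.1f}% of machine")

    print(f"Combined: {total:.1f}%  headroom for everything else: {100.0 - total:.1f}%")
```

    A table of these shares taken while the services run, next to the SSMS login times for the same interval, makes the "nothing left for SQL Server" argument much harder to wave away.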

  • The Devs seem to be asking you to prove that the CPU hogs are a problem to SQL Server. I would try to turn that round, and get the Devs to prove that the CPU hogs are not a problem to SQL Server.

    Neither will be easy to prove, but in general the new kid on the block should prove they are not a problem to the established players.

    Also, try to work out how important the new services and SQL Server are to the business. If SQL Server is more important than the new applications, then that definitely puts the onus on the Devs to prove the new services are not a problem.

    The fact that you cannot do standard administration tasks on SQL Server when CPU is at 100% shows that the new services do have an impact, but your manager needs actual figures and a cost amount in order to take action.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • The argument appears to be won without the expected inkshed and tantrums. The Dev in question has fixes for the issue.

    YAY!
