Diagnose SQL Server CPU Resource Starvation Issues with Email Alerts
Diagnosing CPU performance issues with SQL Server can be a bit tricky, particularly if you have a system that exhibit a CPU spikes pattern throughout the day. You could run some SQL DMVs in order to identify holistically how much CPU time each query is consuming, but that does not give you a real-time way of tracking CPU resource starvation on SQL Server.
Recently I have been asked to devise a system on a few of our servers that are experiencing CPU resource bottleneck issues, the monitoring system needs to identify when SQL Server is suffering from CPU bottlenecks, and report the queries currently running on the server via email alerts, along with performance statistics that can help you diagnose which queries are consuming most of the resources, and why are they doing so.
Identifying CPU Resource Bottleneck (Starvation) on SQL Server
The first step is to identify when SQL Server is suffering from CPU bottlenecks, there are a few ways this can be achieved, for example there are some 3rd party CLR based stored procedure that help you access Performance Monitor Counters from within SQL Server, these can be used to detect CPU over-utilization on the box in general or by the SQL process specifically, and based on pre-set threshold alerts can be sent out. Although I’ve decided to keep everything natively within SQL Server, so I opted for querying the sys.dm_os_ring_buffers DMV.
The sys.dm_os_ring_buffers Dynamic Management View regularly samples and reports on different aspects of the SQL Server system, this includes CPU utilization, memory pressure and state, security errors and many other interesting traces. This information is presented in an easy to parse XML format and also available to query without adding too much pressure on SQL Server, which means a high sampling rate can be configured without any issues.
So in order to identify CPU over-utilization issues using the sys.dm_os_ring_buffers DMV, we need to filter the ring_buffer_type column to RING_BUFFER_SCHEDULER_MONITOR, and only return rows were the XML record contains a SystemHealth element, as shown below.
@SQLCPUThrehold_Percent INT = 75,
@LogHighCPUCount INT = 3
;WITH topCPU AS
(SELECT TOP (5)
CONVERT (VARCHAR(30), getdate(), 126) AS runtime,
record.value('(Record/@id)[1]', 'int') AS record_id,
record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle_cpu,
record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_utilization
--into #tempCPU
FROM sys.dm_os_sys_info inf CROSS JOIN (
SELECT TIMESTAMP, CONVERT (xml, record) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
AND record LIKE '%<SystemHealth>%') AS t
ORDER BY record.value('(Record/@id)[1]', 'int') DESC
)
SELECT @CurrentHighCPUCount = COUNT(*) FROM topCPU WHERE sql_cpu_utilization >@SQLCPUThrehold_Percent
The script essentially allows you to configure the threshold of what percentage of CPU utilization the script would consider as “over-utilization”, and the number of samples from the sys.dm_os_ring_buffers that meets this threshold before an alert is triggered, allowing you to control the length of the CPU spike you consider to be abnormal for your system.
The script is a bastardized version of Denzil Ribeiro’s Troubleshooting High CPU, which also provides a clever system of tracking the source behind CPU bottlenecks.
Reporting Currently Running (active) Queries
Now after identifying when a CPU starvation issue is in progress, we need to grab a snapshot of what is currently executing on the system, this will help us in diagnosing exactly what queries are contributing towards the high CPU utilization the system is experiencing, and what can we do in order to improve those queries performance.
I have created a handy little script that essentially:
- Grabs the currently running queries on SQL Server: Using the sys.dm_exec_requests which returns the currently executing sessions and their sql_handle.
- Appends memory information and query execution statistics (when available): Through sys.dm_exec_query_memory_grants and sys.dm_exec_query_stats we are able to extract important information regarding the execution of each query, including historical performance figures and execution counts.
- Append the text of the executing query: This will help us identify the query. In the case of a Stored Procedure this will be the CREATE statement of that procedure.
- Appends the Query Execution Plan: This will help us to dive deep into why this query is causing high CPU utilization, and optimize accordingly.
AS
BEGIN
SELECT
GETUTCDATE() AS 'DateStamp',
-- REPLACE(REPLACE ( REPLACE ( sqltext.TEXT, '
--' , ' ' ), ' ' , ' ' ),'|',' ') as [TEXT],
sqltext.TEXT
query_plan,
req.session_id,
req.status,
req.command,
req.cpu_time,
req.total_elapsed_time, requested_memory_kb,
granted_memory_kb,used_memory_kb, wait_order,
qs.Execution_count AS Executions,
qs.total_worker_time AS TotalCPU,
qs.total_physical_reads AS PhysicalReads,
qs.total_logical_reads AS LogicalReads,
qs.total_logical_writes AS LogicalWrites,
qs.total_elapsed_time AS Duration,
qs.total_worker_time/qs.execution_count AS [Avg CPU TIME]
FROM sys.dm_exec_requests req
INNER JOIN sys.dm_exec_query_memory_grants MG ON req.session_id = MG.session_id
LEFT JOIN sys.dm_exec_query_stats qs ON req.sql_handle = qs.sql_handle
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS sqltext
CROSS APPLY sys.dm_exec_query_plan(MG.plan_handle)
END
This is a very handy query that I recommend always carrying in your SQL toolbox. I usually have a “Maintenance” database that contains a host of queries I collected over the years that helped me diagnosing one thing or another.
I have created an XML version of this query that returns an HTML table, handy for sticking in an email alerts that support HTML.
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[usf_snapshot_QueryMemoryCPU_XML]
(
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @RESULT NVARCHAR(MAX)
DECLARE @xmlHeaderdata XML
SELECT @xmlHeaderdata =
(
SELECT
(SELECT 'DateStamp' AS th FOR xml path(''), TYPE),
(SELECT 'TEXT' AS th FOR xml path(''), TYPE),
(SELECT 'session_id' AS th FOR xml path(''), TYPE),
(SELECT 'status' AS th FOR xml path(''), TYPE),
(SELECT 'command' AS th FOR xml path(''), TYPE),
(SELECT 'cpu_time' AS th FOR xml path(''), TYPE),
(SELECT 'total_elapsed_time' AS th FOR xml path(''), TYPE),
(SELECT 'requested_memory_kb' AS th FOR xml path(''), TYPE),
(SELECT 'granted_memory_kb' AS th FOR xml path(''), TYPE),
(SELECT 'used_memory_kb' AS th FOR xml path(''), TYPE),
(SELECT 'wait_order' AS th FOR xml path(''), TYPE),
(SELECT 'Execution_count' AS th FOR xml path(''), TYPE),
(SELECT 'total_worker_time' AS th FOR xml path(''), TYPE),
(SELECT 'total_physical_reads' AS th FOR xml path(''), TYPE),
(SELECT 'total_logical_reads' AS th FOR xml path(''), TYPE),
(SELECT 'total_logical_writes' AS th FOR xml path(''), TYPE),
(SELECT 'total_elapsed_time' AS th FOR xml path(''), TYPE),
(SELECT 'avg_cpu_time' AS th FOR xml path(''), TYPE)
FOR xml path('tr'), TYPE
)
DECLARE @xmldata XML
SELECT @xmldata =
(
SELECT
(SELECT GETUTCDATE() AS 'td' FOR xml path(''), TYPE),
(SELECT sqltext.TEXT AS 'td' FOR xml path(''), TYPE),
(SELECT req.session_id AS 'td' FOR xml path(''), TYPE),
(SELECT req.status AS 'td' FOR xml path(''), TYPE),
(SELECT req.command AS 'td' FOR xml path(''), TYPE),
(SELECT req.cpu_time AS 'td' FOR xml path(''), TYPE),
(SELECT req.total_elapsed_time AS 'td' FOR xml path(''), TYPE),
(SELECT requested_memory_kb AS 'td' FOR xml path(''), TYPE),
(SELECT granted_memory_kb AS 'td' FOR xml path(''), TYPE),
(SELECT used_memory_kb AS 'td' FOR xml path(''), TYPE),
(SELECT wait_order AS 'td' FOR xml path(''), TYPE),
(SELECT qs.Execution_count AS 'td' FOR xml path(''), TYPE),
(SELECT qs.total_worker_time AS 'td' FOR xml path(''), TYPE),
(SELECT qs.total_physical_reads AS 'td' FOR xml path(''), TYPE),
(SELECT qs.total_logical_reads AS 'td' FOR xml path(''), TYPE),
(SELECT qs.total_logical_writes AS 'td' FOR xml path(''), TYPE),
(SELECT qs.total_elapsed_time AS 'td' FOR xml path(''), TYPE),
(SELECT qs.total_worker_time/qs.execution_count AS 'td' FOR xml path(''), TYPE)
--(select query_plan as 'td' for xml path(''), type)
FROM sys.dm_exec_requests req
INNER JOIN sys.dm_exec_query_memory_grants MG ON req.session_id = MG.session_id
LEFT JOIN sys.dm_exec_query_stats qs ON req.sql_handle = qs.sql_handle
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS sqltext
CROSS APPLY sys.dm_exec_query_plan(MG.plan_handle)
WHERE req.session_id != @@spid
FOR xml path('tr'), TYPE
)
SELECT @RESULT = '<table>'+CONVERT(NVARCHAR(MAX),@XmlHeaderdata) + CONVERT(NVARCHAR(MAX),@xmldata) + '</table>'
RETURN @RESULT
END
Setting Up a SQL Server CPU Resource Starvation Diagnostic Email Alert
Now we just need to put everything together.
This is simple and all we have to do is make sure that the condition for the alert is met, as in the CPU is being over-utilized and is causing bottlenecks according to the parameters we have setup, and an email alert is triggered carrying the results of our query above.
@SQLCPUThrehold_Percent INT = 75
,@LogHighCPUCount INT = 3
AS
BEGIN
SET NOCOUNT ON;
DECLARE @CurrentHighCPUCount INT = 0,
@SQLCPUThrehold_Percent INT = 75,
@LogHighCPUCount INT = 3
;WITH topCPU AS
(SELECT TOP (5)
CONVERT (VARCHAR(30), getdate(), 126) AS runtime,
record.value('(Record/@id)[1]', 'int') AS record_id,
record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle_cpu,
record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_utilization
--into #tempCPU
FROM sys.dm_os_sys_info inf CROSS JOIN (
SELECT TIMESTAMP, CONVERT (xml, record) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
AND record LIKE '%<SystemHealth>%') AS t
ORDER BY record.value('(Record/@id)[1]', 'int') DESC
)
SELECT @CurrentHighCPUCount = COUNT(*) FROM topCPU WHERE sql_cpu_utilization >@SQLCPUThrehold_Percent
IF @CurrentHighCPUCount >= @LogHighCPUCount
BEGIN
DECLARE @tableVar NVARCHAR(MAX)
SELECT @tableVar = dbo.usf_snapshot_QueryMemoryCPU_XML()
SET @tableVar = '<p>Attached is the currently running queries that could be contributing towards high CPU</p><br/>' + @tableVar
EXEC msdb..sp_send_dbmail
@recipients='Diagnostics@Company.com',
@body_format = 'HTML',
@subject='CPU Starvation on box',
@body= @tableVar
END
END
The script above assumes that SQL Server Database Mail has been setup and configured with a default profile, otherwise the parameters for the need to be sp_send_dbmail needs to be configured.
And thats it! now you can schedule this using SQL Server Job Agent as often as you would like, and any time your server is suffering from CPU bottlenecks and starvation issue you should get an email alert with enough information to action the problem immediately.
I hope this helps you in troubleshooting your SQL Server instance CPU utilization issues. If you use a different technique to diagnose your CPU starvation issues please drop me a comment.
Diagnose SQL Server CPU Resource Starvation Issues with Email Alerts rated 5 out of 5 by 1 readers
Any chance you have the code for 2008R2?
Hey M,
Which bit?