SQL Server Connection Reset for Connection Pooling
Recently one of our Data Warehouse SQL server instances started throwing a bunch of alerts (severity level: 20), the alert error message looked like this:
The client was unable to reuse a session with SPID [SPID], which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing
In this post I will go through the initial steps to diagnosing this issue, as well as introduce the new (SQL Server 2008+) Connectivity Ring Buffer feature, which lives in the sys.dm_os_ring_buffers DMV, and captures any server-side initiated SQL Server connection closures.
On first inspection of the error message, there seems to be some sort of resource starvation or inability to utilize certain resources to complete a SQL operation (for example, the Network is giving out). Essentially the SQL Server instance is for some reason dropping connections.
As with most SQL Server errors, the best place to start looking is the SQL Server ERRORLOG, the default path for SQL Server’s ERRORLOG file is:
:\Program Files\Microsoft SQL Server\MSSQL.\MSSQL\Log\ERRORLOG
Although the log is not always very helpful, I did notice a large number of deadlocks encountered just before the error message appeared, which started to sound like there might be a large number of connections trying to access similar resources, which in turn is causing these deadlocks, this does give the impression that we are experiencing resource starvation on the server, since deadlocks are costly and tend to keep connections open for longer than necessary.
Another good place to start digging around is the sys.dm_os_ring_buffers DMV’s Connectivity Ring Buffer. This ring buffer type will give you information regarding any server-side initiated connection closure (including connection closed using the KILL [SPID] function). Obviously you can establish the same result using server-side traces.
A great description of how to use the new RING_BUFFER_CONNECTIVITY feature can be found on msdn, this will include description of each XML tag, as well as a few handy ways to query the records.
I have found this handy script written by Jonathan Kehayias on MSDN, which helps convert the XML into table result for quicker analysis:
(SELECT
record.value('(Record/@id)[1]', 'int') as id,
record.value('(Record/@type)[1]', 'varchar(50)') as type,
record.value('(Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(50)') as RecordType,
record.value('(Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(50)') as RecordSource,
record.value('(Record/ConnectivityTraceRecord/Spid)[1]', 'int') as Spid,
record.value('(Record/ConnectivityTraceRecord/SniConnectionId)[1]', 'uniqueidentifier') as SniConnectionId,
record.value('(Record/ConnectivityTraceRecord/SniProvider)[1]', 'int') as SniProvider,
record.value('(Record/ConnectivityTraceRecord/OSError)[1]', 'int') as OSError,
record.value('(Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') as SniConsumerError,
record.value('(Record/ConnectivityTraceRecord/State)[1]', 'int') as State,
record.value('(Record/ConnectivityTraceRecord/RemoteHost)[1]', 'varchar(50)') as RemoteHost,
record.value('(Record/ConnectivityTraceRecord/RemotePort)[1]', 'varchar(50)') as RemotePort,
record.value('(Record/ConnectivityTraceRecord/LocalHost)[1]', 'varchar(50)') as LocalHost,
record.value('(Record/ConnectivityTraceRecord/LocalPort)[1]', 'varchar(50)') as LocalPort,
record.value('(Record/ConnectivityTraceRecord/RecordTime)[1]', 'datetime') as RecordTime,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'bigint') as TotalLoginTimeInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'bigint') as LoginTaskEnqueuedInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'bigint') as NetworkWritesInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'bigint') as NetworkReadsInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'bigint') as SslProcessingInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'bigint') as SspiProcessingInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'bigint') as LoginTriggerAndResourceGovernorProcessingInMilliseconds,
record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') as TdsInputBufferError,
record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') as TdsOutputBufferError,
record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') as TdsInputBufferBytes,
record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') as PhysicalConnectionIsKilled,
record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') as DisconnectDueToReadError,
record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') as NetworkErrorFoundInInputStream,
record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') as ErrorFoundBeforeLogin,
record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') as SessionIsKilled,
record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') as NormalDisconnect
--record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalLogout)[1]', 'int') as NormalLogout
FROM
( SELECT CAST(record as xml) as record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') as tab
)
SELECT c.RecordTime,m.[text],*
FROM connectivity_ring_buffer c
LEFT JOIN sys.messages m ON c.SniConsumerError = m.message_id AND m.language_id = 1033
ORDER BY c.RecordTime DESC
The ring buffer connectivity records should give you an idea of which connections are being closed (remote or local), the reason for their closure, and allows you to identify any trends in the connections being closed, as well as many other interesting traces.
The above should give you a good place to start investigating the issue, as for solutions, it really depends on what the underlying issue is, the problem with failure ID is 29 is that its just a generic error message to an underlying system problem (or even just a SQL Server bug), either way, there is no one fix.
That being said, it might be worth trying out the following:
- This might be a SQL Server bug, which Microsoft has resolved in their SQL Server Service Packs or Cumulative Updates.
- Inefficient queries could be over-utilizing the server resources, it might be worth experimenting with optimizing queries or placing a resource governance strategies. You can use DMVs to identify high resource utilization queries.
- If you have resource-intensive queries, a high degree of parallelism could be causing too many DOPs per query, which in turn would cause resource starvation, this MSDN post goes through a few theories as to why this could be happening.
There is also a closed ticket with Microsoft Connect that discusses the SQL Server failure ID 29 issue.
Anyway I hope this helped, or at least set you along the right path, goodluck!
SQL Server Connection Reset for Connection Pooling rated 4 out of 5 by 1 readers
The code in the XML conversion script must be “record.value” and not “record.VALUE”, as the function name is case sensitive. Otherwise excellent article.
Cheers! And thanks for pointing out the “record.value” issue,
Its a little quirk in the syntax highlighter plugin am using, which is applying some capitalization on its own, I’ve turned that off, doesn’t look as pretty, but atleast it isnt wrong
Thanks for sharing query , but it doesnt give records before sql restart so have to wait for re-occurence of issue
Thats true, in some cases it can be really difficult to diagnose the issue after a SQL restart since many stats reset, unless ofcourse you were profiling the session. As you suggested I would keep an eye and next time it happens you can start digging around in the right places.
I tried that script but the time stamp that it gives are wrong in my case. for example it is yet to clock 8 am EST on 26th NOV but I already see that timing in my results. any ideas?
Thanks
Chandan
Hey Chandan,
I think this is to do with the way the OS Ring Buffer records notifications (or traces).
You can try this procedure instead (using NotificationTime rather than RecordTime):
;WITH connectivity_ring_buffer as
(SELECT
cast(record as xml).value(‘(Record/@id)[1]’, ‘int’) as id,
cast(record as xml).value(‘(Record/@type)[1]’, ‘varchar(50)’) as type,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/RecordType)[1]’, ‘varchar(50)’) as RecordType,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/RecordSource)[1]’, ‘varchar(50)’) as RecordSource,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/Spid)[1]’, ‘int’) as Spid,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/SniConnectionId)[1]’, ‘uniqueidentifier’) as SniConnectionId,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/SniProvider)[1]’, ‘int’) as SniProvider,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/OSError)[1]’, ‘int’) as OSError,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/SniConsumerError)[1]’, ‘int’) as SniConsumerError,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/State)[1]’, ‘int’) as State,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/RemoteHost)[1]’, ‘varchar(50)’) as RemoteHost,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/RemotePort)[1]’, ‘varchar(50)’) as RemotePort,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LocalHost)[1]’, ‘varchar(50)’) as LocalHost,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LocalPort)[1]’, ‘varchar(50)’) as LocalPort,
dateadd (ms, rbf.[timestamp] – tme.ms_ticks, GETDATE()) as NotificationTime,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]’, ‘bigint’) as TotalLoginTimeInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]’, ‘bigint’) as LoginTaskEnqueuedInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]’, ‘bigint’) as NetworkWritesInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]’, ‘bigint’) as NetworkReadsInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]’, ‘bigint’) as SslProcessingInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]’, ‘bigint’) as SspiProcessingInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]’, ‘bigint’) as LoginTriggerAndResourceGovernorProcessingInMilliseconds,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]’, ‘int’) as TdsInputBufferError,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]’, ‘int’) as TdsOutputBufferError,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]’, ‘int’) as TdsInputBufferBytes,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]’, ‘int’) as PhysicalConnectionIsKilled,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]’, ‘int’) as DisconnectDueToReadError,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]’, ‘int’) as NetworkErrorFoundInInputStream,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]’, ‘int’) as ErrorFoundBeforeLogin,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]’, ‘int’) as SessionIsKilled,
cast(record as xml).value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]’, ‘int’) as NormalDisconnect
–record.value(‘(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalLogout)[1]’, ‘int’) as NormalLogout
FROM
sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
WHERE ring_buffer_type = ‘RING_BUFFER_CONNECTIVITY’
)
SELECT c.NotificationTime,m.,*
FROM connectivity_ring_buffer c
LEFT JOIN sys.messages m ON c.SniConsumerError = m.message_id AND m.language_id = 1033
ORDER BY c.NotificationTime DESC
Good article!