The setup was a SQL Server 2005 cluster. The service was not failing over; it was simply restarting. The error log would read something like: "The service responded to a stop command from the cluster manager." There was no record of any problem in the SQL error log.
The first time it occurred while I was working here (it had happened before then), it looked like this:
At 00:29 the cluster manager issued a Stop command to the SQL service, which stopped it. A Start command followed and the service came back up again. The server was down for around 5 minutes.
There was no clue as to why it had happened, except in the Windows Event Viewer, which showed a lot of lost communication and 19019 errors. These event errors corresponded with SQL jobs failing with server timeout errors (lost communication with the server) on tasks such as update stats in my maintenance plans, even though these are run directly on the server.
It carried on happening while I chased what I thought were the causes, network cards and CPU affinity:
http://support.microsoft.com/kb/174812
http://www.sqlservercentral.com/Forums/Topic399630-149-1.aspx
http://support.microsoft.com/kb/892100
After a lot of tinkering and Googling, it was resolved by setting Priority Boost to 0, which Microsoft recommends for all cluster installs. After changing the setting you need to restart the SQL service. It is now several months since this all happened and I no longer get any of these errors.
Even if you are not encountering any problems, you ought to set Priority Boost to 0 (if it isn't already), as these restarts can happen at any time.
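For anyone who wants to check or change the setting from a query window rather than the server properties dialog, here is a minimal T-SQL sketch using sp_configure. It is not taken from the original fix notes; it assumes you are connected as a login allowed to change server configuration (for example a sysadmin) and that you run it against the clustered instance.

```sql
-- 'priority boost' is an advanced option, so make advanced options visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Show the current setting: config_value is what is configured,
-- run_value is what the running service is actually using.
EXEC sp_configure 'priority boost';

-- Set Priority Boost back to 0.
EXEC sp_configure 'priority boost', 0;
RECONFIGURE;

-- Optional check: value_in_use stays at the old value until the
-- SQL service has been restarted.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'priority boost';
```

As noted above, the new value only takes effect once the SQL service has been restarted, so plan the change for a quiet period.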