Front End Service Crashed on SBA (Stupid SQL Server!)

This article is really about SQL.

—————————–

We received a SCOM alert yesterday that the Lync Front End service (RTCSRV.exe) had stopped on one of our SBAs. Sadly, this is a fairly routine occurrence, as the combination of Windows + Lync + SCOM + SCCM (especially SCCM) can overwhelm the meager 4 GB of memory on our SBAs.

So I RDP’ed to the SBA and tried to start the Front End service. It failed. Unusually, the Mediation service and Centralized Logging Service were also stopped.

So I did Step 1 in my troubleshooting routine: I opened Event Viewer and looked at the Lync logs. I saw a bunch of failed logins to the XDS database. “Got it”, I thought to myself. “The SQL service must have stopped.” But when I returned to Services, the SQL service was in fact running. Just for good measure I restarted it. After the restart, the Lync services still would not start. So that didn’t fix it.

I dug deeper into the event logs and came across the following two entries in the Application log.

Operating system error 1117(The request could not be performed because of an I/O device error.) on file “c:\Program Files\Microsoft SQL Server\MSSQL11.RTCLOCAL\MSSQL\DATA\xds.ldf” during SQLServerLogMgr::CheckLogBlockReadComplete.

An error occurred during recovery, preventing the database ‘xds’ (5:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

That doesn’t sound good. It looked like some corruption had crept into the XDS database (the xds.ldf file named in the first error is its transaction log).

On an SBA, the XDS database holds the replicated copy of the CMS. If Lync can’t read this database on startup, it won’t know what its configuration is. So without this database, Lync won’t start.

The next step was to “look” at the database via SQL Server Management Studio. Except, due to firewalls, I couldn’t connect to SQL on the SBA from our main Lync pool SQL Servers. So I went about installing the SQL Server management tools onto the SBA.

Finding the download was a lot harder than it should have been. But I eventually found it on this page. I downloaded the SQL Server 2012 Management Studio x64 version and got that installed.

I could now look at the status of the database. Can’t say I like what I saw.

[Screenshot: status of the xds database in SQL Server Management Studio]
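For anyone who prefers a query window to the object tree, the same status can be read from the catalog. This is a minimal sketch, assuming you are connected to the SBA's local RTCLOCAL instance with enough rights to read sys.databases. It only reads metadata and changes nothing.

```sql
-- Read-only check of the xds database state.
-- state_desc shows values such as ONLINE, RECOVERY_PENDING, SUSPECT or EMERGENCY.
-- user_access_desc shows MULTI_USER, SINGLE_USER or RESTRICTED_USER.
SELECT name,
       state_desc,
       user_access_desc
FROM sys.databases
WHERE name = N'xds';
```

Anything other than ONLINE in state_desc would be consistent with Lync being unable to log in to the database.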

As I am not at all skilled with SQL, I brought in one of our SQL Admins to help with this.

We went through the following:

ALTER DATABASE [xds] SET EMERGENCY;
DBCC CHECKDB (xds);
GO

DBCC did not report any errors. So we tried:

ALTER DATABASE [xds] SET ONLINE;

That didn’t work.

We tried a bunch of things, such as taking the DB offline and then back online.

Because the first error pointed to an I/O device problem, I also ran a chkdsk scan to see if there were bad sectors on the drive or any other disk errors.

Chkdsk reported no issues.

We ran the following, but it didn’t help either.

ALTER DATABASE [xds] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

We ran this command too but it didn’t do anything:

EXEC sp_resetstatus 'xds'

Finally, we ran the commands below, which fixed the problem. The risk is that REPAIR_ALLOW_DATA_LOSS can discard data. As this database is just a copy of the CMS, we went ahead with it. That, and we didn’t have any other options.

ALTER DATABASE [xds] SET EMERGENCY;
DBCC CHECKDB ('xds', REPAIR_ALLOW_DATA_LOSS);
GO

That finally brought the xds database back online.

Once it was online, the Lync services on the SBA started successfully.
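Looking back, the commands that mattered can be collected into one sequence. The following is a sketch under assumptions, not a record of exactly what we typed. It assumes the database is named xds, that losing its contents is acceptable (here it is only a replica of the CMS), and that the repair needs single-user mode, which in our case had already been set by the earlier SINGLE_USER command. The final MULTI_USER step and the status query are additions for tidiness and were not part of the original session.

```sql
-- Sketch only: last-resort repair of a database that will not recover.
-- REPAIR_ALLOW_DATA_LOSS can discard data; prefer restoring a backup when one exists.

-- Make the database accessible to administrators despite the failed recovery.
ALTER DATABASE [xds] SET EMERGENCY;

-- Repair options require single-user mode; roll back any open connections.
ALTER DATABASE [xds] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Attempt the repair, accepting possible data loss.
DBCC CHECKDB ('xds', REPAIR_ALLOW_DATA_LOSS);

-- Assumption: return the database to normal access once the repair succeeds.
ALTER DATABASE [xds] SET MULTI_USER;
GO

-- Confirm the result: expect ONLINE and MULTI_USER.
SELECT name, state_desc, user_access_desc
FROM sys.databases
WHERE name = N'xds';
```

If the final query still shows SINGLE_USER, that may be worth correcting before starting the Lync services, since only one connection at a time can use the database in that mode.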
