Understanding Lync 2013 Server Failover

2012/07/26

Lync 2010 introduced the concept of a backup registrar.  Within Topology Builder, you can define a backup registrar and publish that information into the topology.  This was vital not only for the Survivable Branch Appliance; it also gave users the ability to fall into limited functionality mode in the client.

At login, the client is given both the primary and backup registrar pools along with q values: q=0.7 marks the primary server and q=0.3 marks the backup server.
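To make the q values concrete, here is a minimal Python sketch of how a client could order the registrars it was handed and fall back when the preferred one is unavailable.  This is an illustration only, not how the Lync client is actually implemented: the pool names and the is_reachable callback are hypothetical, and only the q=0.7 and q=0.3 values come from the description above.

```python
# Hypothetical registrar list as a client might receive it at sign-in.
# The host names are made up; only the q values come from the text.
registrars = [
    ("backup-pool.contoso.com", 0.3),
    ("primary-pool.contoso.com", 0.7),
]


def order_by_preference(entries):
    """Return registrar names sorted from highest q value to lowest."""
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [name for name, _q in ranked]


def pick_registrar(entries, is_reachable):
    """Return the most preferred registrar that is reachable, or None."""
    for name in order_by_preference(entries):
        if is_reachable(name):
            return name
    return None


# Primary pool up: the client stays on the q=0.7 entry.
print(pick_registrar(registrars, lambda name: True))

# Primary pool down: the client falls back to the q=0.3 entry.
print(pick_registrar(registrars, lambda name: name != "primary-pool.contoso.com"))
```

The higher q value simply wins whenever that registrar is available; the backup entry only matters once the primary cannot be reached.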

When services on the primary registrar stop, the client goes into limited functionality mode.  In this mode, Enterprise Voice functionality works, but features that rely on the back-end database, such as configuring Sim Ring, Conferencing and other services, are no longer available.

This was very powerful and allowed Lync to be survivable for Enterprise Voice features.  The problem was that users would lose access to their contacts, conferences wouldn’t work, and response groups would not work.  Bringing these services back online on the far side also required significant work to import contacts using the old dbimpexp.exe tool.

Lync 2013 changes all of these behaviors and makes survivability in Lync more competitive with other PBX vendors.

According to the “What’s New” article (http://technet.microsoft.com/en-us/library/jj204892%28v=ocs.15%29):

1. As in Lync Server 2010, the main high availability (HA) scheme for Lync Server 2013 Preview is based on server redundancy via pooling. If a server running a certain server role fails, the other servers in the pool running the same role take the load of that server. This applies to Front End Servers, Edge Servers, Mediation Servers, and Directors.

2. Lync Server 2013 Preview adds new disaster recovery measures by enabling you to pair Front End pools located in two datacenters. If one of the paired pools goes down, an administrator can fail over the users from that pool to the other pool in the pair, to provide continuation of service.

3. Lync Server 2013 Preview also adds Back End Server high availability. This is an optional topology in which you deploy two Back End Servers for a Front End pool, and set up synchronous SQL mirroring for all the Lync databases running on the Back End Servers. You may choose whether to deploy a witness for the mirror.

Understanding these three points is key to understanding how server failover works.

#1 is a recap of the feature we have today in Lync 2010.  It tells us that when a server within a pool fails, other servers will take its place.  For example, failure of front-end server #1 could push the AVMCU features that were hosted on front-end server #1 onto front-end server #2.

#2 is the first look at a new feature of Lync Server 2013.  Unfortunately, we are only weeks into the public preview of Lync 2013 and I’ve already heard people mishandle the explanation of this feature.  So what is this exactly?  In Lync Server 2013, when you specify a pool as a backup registrar to another pool, this does a lot more than it used to.

First off, configuration of this option within Topology Builder is exactly the same as before.  We can specify automatic failover and failback for voice and again specify the time limits.

This is where things start to change, though.  If we take a peek at the services installed on the server, we see something completely new: the Lync Server Backup Service.  It is important to note that you must run bootstrapper on the server after you specify a backup registrar.


So what exactly is this service doing?  According to TechNet, “Lync Server Backup Service provides real-time data replication to keep the pools synchronized”.  So what data are we talking about here?  Pretty much everything in the database (with the exception of response group information): user information, contacts, conferencing data and more.  What happens if the front-end service is turned off on each server in the pool?  Here is where the early reports are slightly confusing.  Some have assumed that if services are offline on one pool, users will automagically fail over with everything.  That isn’t quite the case.  Users will fail over but have the same limited feature set that they had in the previous version of Lync.

 

This limited functionality mode should look very familiar to anyone who has used Lync previously.  At this point our users have a similar set of features to before: I can IM and make/receive phone calls, but items like conferencing still aren’t available.  So what exactly does #2 of the new features of Lync 2013 tell us?  An administrator can declare an emergency and fail over the pool to the backup pool.  That is done with the following command:

Invoke-CsPoolFailover -PoolFQDN <pool FQDN> -DisasterMode -Verbose


There is one important note about the above command.  The -DisasterMode switch is required if the pool (or its front-end services) is down.  If you leave it off, you will receive an error in the Management Shell.  The great thing here is that you could also use this command to force a failover to a far side pool so you can perform maintenance on all of the servers.  When the command has completed, the Lync client will automatically refresh and come out of limited functionality mode.  Contacts, groups, and everything else will come back as they were before.

So what have we learned thus far?  First, even though Lync 2013 includes this exciting new feature for DR purposes, some administrator intervention is still required.  Personally, I think this is a great decision, because we don’t want users simply flipping between servers because of latency issues.  Second, this completely changes how HA/DR is done today.  How about this HA/DR scenario?  You are a small business and you want to implement some sort of HA/DR.  In the past, you needed to spin up an enterprise pool, clustered SQL and much more.  Now, you can spin up two Standard Edition servers and point them at each other for failover.

So what happens when your servers are back online and you want to fail back?  Again, we do this through PowerShell:

Invoke-CsPoolFailback -PoolFQDN <pool FQDN> -Verbose


Now, one last interesting item: the failback also has the -DisasterMode option.  I’m not 100% sure what the use case for this would be.  Maybe you failed over to the backup pool, but now that pool has failed and you are going back to your production pool, which is back with all of its data?  I guess it could happen, but if you were suffering that much failure I have a gut feeling you have other issues.
