A big part of my job is working with companies on SIP migrations. Some are looking to dip their toes in the water with a SIP application or perhaps a few trunks, but several want to do it all – applications, trunks, and users. Every engagement is unique, but they all have one thing in common. Every company has a particular tolerance for risk and unless I understand what it is, it’s impossible to help them come up with the right solution.
As much as I love the technology I work with, I realize that nothing is perfect. Power supplies and fans fail. Software crashes. Humans do stupid things like accidentally shutting off the power or unplugging the wrong cable. So, as I help design a solution, one of my first questions is, “How often can this go down?” Of course, the most common answer is “Never,” but after we talk things through, we come around to the number of seconds, minutes, or hours of downtime they are willing to tolerate. That will then lead to the design that best meets their needs.
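To make the "how much downtime can you tolerate?" conversation concrete, it can help to turn an availability percentage into a yearly downtime budget. The sketch below is only an illustration of that arithmetic. The function names are my own, and it assumes a 365-day year.

```python
# Illustrative only: converts an availability target into a yearly downtime budget.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignores leap years


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the seconds of downtime per year allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


def describe(seconds: float) -> str:
    """Render a duration in hours, minutes, or seconds, whichever fits best."""
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.1f} minutes"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_seconds_per_year(target)
        print(f"{target}% availability allows {describe(budget)} of downtime per year")
```

Seen this way, "five nines" works out to a little over five minutes of downtime per year, which is a very different design target from the roughly nine hours that 99.9% allows.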
Today, I would like to spend a little time talking about risk management and session border controllers. In the vast majority of cases, an SBC is your company’s SIP portal to the outside world. It can handle both SIP trunks and remote SIP users. For instance, at my company, the SBC is where we connect our CenturyLink SIP trunks along with our remote Avaya Flare and One-X Mobile users.
The type and configuration of the SBC play a big role in determining potential downtime. Additionally, the type of traffic you run through that SBC is an important factor in risk management. For example, the director of a company’s contact center probably considers his or her SIP trunks mission critical. At the same time, that same company might not be all that bothered if its remote users lose access to the network for several minutes or even a few hours. In that case, it might make sense to split trunks from users by using distinct SBCs. This way a problem on the SBC that serves the less important users will have no impact on the crucial trunks.
Most SBCs can be configured in one of two ways. First, you can configure an SBC as a standalone box. You point your external SIP traffic to that box and if it goes down, you either manually route traffic to a different box, or you suffer until the box is online again.
Some standalone SBCs offer a level of resiliency with redundant, hot-swappable power supplies and fans. Some support duplicated processors and network interface cards (NIC).
Note that not all SBCs offer box-level resiliency. Please check with your vendor for details on their offering.
The second configuration is to make the SBC highly available (HA). This requires two separate SBCs and sometimes a third server for management functions. The first SBC runs in active mode while the other runs in standby. The two SBCs share the same public and private IP addresses, but those addresses are only used by the active box. As it runs, the active SBC shares call state information with the standby box. If the active box fails or is taken offline, the standby box goes active and acquires the IP addresses. All SIP traffic now flows through the newly active SBC. Since the standby SBC was aware of all call states, SIP sessions will be preserved during failover with no loss of signaling or media.
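The active/standby behavior described above can be sketched as a toy model. This is not how any particular vendor implements HA. The class names, the call-state dictionary, and the example IP addresses are all hypothetical and are there only to show the sequence: replicate call state, fail over, and let the standby take ownership of the shared addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Sbc:
    """Hypothetical stand-in for one SBC in an HA pair."""
    name: str
    online: bool = True
    calls: dict = field(default_factory=dict)  # call_id -> call state


class HaPair:
    """Toy active/standby pair that shares one set of IP addresses."""

    def __init__(self, active: Sbc, standby: Sbc, shared_ips: tuple):
        self.active = active
        self.standby = standby
        self.shared_ips = shared_ips  # only ever used by the active box

    def ip_owner(self) -> str:
        """Name of the box currently answering on the shared addresses."""
        return self.active.name

    def start_call(self, call_id: str, state: str) -> None:
        """Record a call on the active box and replicate it to the standby."""
        self.active.calls[call_id] = state
        self.standby.calls[call_id] = state

    def fail_over(self) -> None:
        """Take the active box offline and promote the standby."""
        self.active.online = False
        self.active, self.standby = self.standby, self.active


if __name__ == "__main__":
    # Example addresses come from documentation/private ranges (assumption).
    pair = HaPair(Sbc("sbc-a"), Sbc("sbc-b"), ("203.0.113.10", "10.0.0.10"))
    pair.start_call("call-1", "connected")
    pair.fail_over()
    print(pair.ip_owner(), pair.active.calls)
```

The point of the sketch is that preserving calls depends on the replication step in `start_call`. An HA pair that skips it would still fail over, but the new active box would know nothing about the calls in progress.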
There are two things to make note of. First, not all HA SBCs preserve active calls on failover. As with box-level resiliency, ask your SBC vendor what type of call preservation their box supports.
Second, the two boxes in an HA pair can be separated only at Layer 2. This means that they must be on the same subnet to act as an active/standby pair. Keep this in mind since most geo-separated data centers are Layer-3 (different subnet) separated. You cannot split an HA pair between Layer-3 separated data centers.
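A quick sanity check for the same-subnet requirement can be written with Python's standard `ipaddress` module. This is a minimal sketch: the function name and the example addresses are made up, and sharing a subnet is only a necessary condition here, so your vendor's HA documentation remains the final word.

```python
import ipaddress


def same_subnet(interface_a: str, interface_b: str) -> bool:
    """Return True if two interfaces (in address/prefix form) share a subnet."""
    network_a = ipaddress.ip_interface(interface_a).network
    network_b = ipaddress.ip_interface(interface_b).network
    return network_a == network_b


if __name__ == "__main__":
    # Hypothetical addresses for two SBCs in the same data center.
    print(same_subnet("10.1.1.11/24", "10.1.1.12/24"))
    # Hypothetical addresses for two SBCs in Layer-3 separated data centers.
    print(same_subnet("10.1.1.11/24", "10.2.1.11/24"))
```

The first pair is a candidate for an active/standby pair. The second is not, because the two boxes sit on different subnets.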
There are other aspects to consider when it comes to SIP resiliency. For more thoughts on this, please see my blog post, Building a Resilient SIP Solution.
I hope this helps you in your quest to build a rock-solid SIP communications system with no single points of failure. As always, please feel free to reach out to me with any questions or comments you might have.
Hi Andrew – The other way of building resilience, which I have now started to use, is software SBCs running in VMware ESXi or Windows Hyper-V. This gives some pretty funky HA options, with a single instance that is actually highly available because the virtualisation technology can live fail over the virtual machine to another physical node. This works both within and across data centres.
It gives me a new tool to build highly available voice solutions. It’s the way forward – just about everyone in the telco world is getting excited about virtualising their network functions in this way.
Absolutely, Neill, and I should have mentioned it. In fact, you and I had a discussion about this a while ago when I got all excited about Sonus’ new virtual SBC. Virtual communications components are changing so much about how we design systems. Now, if only we could get rid of those last TDM pieces…. 🙂
The SWe seems to be getting there. The first release doesn’t include one important piece of functionality…. SIP Registration (UAC) towards the carrier SIP trunk. Just waiting for a new release of software…. Yes, maybe the TDM pieces of the puzzle will go soon. The TDM world has been with us for a while and has worked pretty well for us, but I too think it’s time for it to go. The carriers are getting there over on this side of the pond (but it is slow progress). 😦