Writing this blog serves two purposes. First, I love having the opportunity to play unified communications professor. I have been in this industry for a long time and I truly enjoy sharing what I’ve learned about telephony, SIP, WebRTC, and VoIP in general.
Second, I find that I don’t really understand something until I am forced to explain it to someone else. I cannot tell you how many times I’ve started writing an article, got stumped about how to say something, did some research, and found that what I thought I knew was either inaccurate or not complete. So, as much as I hope that I am helping my readers, I am helping myself just as much.
Today, I would like to tackle one of those “do I really understand this well enough to explain it to someone” subjects – Avaya media gateway survivability.
Are you ready? Great! Let’s go.
Avaya Media Gateways
Avaya supports two styles of media gateways. The first, and oldest, is the G650. The G650 is an 8U-high, 14-slot chassis that was developed to give new life to Avaya TN card types. These are the cards used in the much older MCC and SCC cabinets. Examples include the TN799 C-LAN, TN2224 digital line card, TN2602 IP Media Processor (DSP resources), and the TN2312 IP Server Interface.
These gateways communicate with a Communication Manager (CM) server through the TN2312 IP Server Interface. Affectionately known as an IPSI, this card provides the control link between the CM and the gateway. A single G650 can support multiple IPSIs for redundancy purposes.
Traditionally, IP stations and trunks connected to a C-LAN. Like the IPSI, there can be multiple C-LAN cards in a single G650. These cards are used for both redundancy and capacity. You can configure different sets of phones to connect to different C-LANs.
C-LANs can also be used to connect to what I really want to write about today – H.248 gateways.
The H.248 gateway family consists of the G700, G250, G350, G430, and the G450. Of those, only the G430 and G450 are available for purchase today. The rest have been end-of-sale for a number of years.
Like the G650, the H.248 gateways support a variety of line cards and DSPs. However, these are newer vintage cards such as the MM710 T1 interface and the MM711 analog card. H.248 gateways do not support the TN form factor cards.
Also, H.248 gateways do not house C-LAN or IPSI cards of their own. Instead, these gateways connect to a CM server via a C-LAN in a G650 or directly to the CM through something called Processor Ethernet (PE). Think of PE as the network interface and IP address of the CM.
A great companion piece to this one:
The Steps Involved in Booting an Avaya SIP Telephone
Media Gateway Lists
Now that I have the basics out of the way, I want to spend some time explaining how an H.248 gateway determines which CM to connect to.
I’ve mentioned “the CM server” a few times, but that’s not quite accurate. There can be several CM servers and the gateways need to know which one they should hitch their wagon to.
There will always be a prime CM server. This is the main brain that runs the entire system. Call processing, vectors, call center, routing, and device management all live there.
What happens if the main brain dies? No problem. Avaya allows gateways to fail over to another brain called an Enterprise Survivable Server (ESS). Under sunny day conditions, an ESS is running, but it will not perform call processing or other CM tasks until a gateway registers to it. At that point, it wakes up and functions as if it were the prime processor.
Although Avaya supports up to 63 ESS processors in a single system, most enterprises implement far fewer than that.
There is another form of brain called a Local Survivable Processor (LSP). An LSP was originally designated to be a server that provides survivability for a branch location, but over the years, Avaya has increased its capacity and scale to the point where it now looks like its ESS brother.
How does an H.248 gateway know who to talk to?
This is where the Media Gateway Controller (MGC) list comes in. The MGC list instructs the H.248 gateway which processors it can connect to, in which order, and under what conditions.
For example, an MGC list might consist of the IP addresses of the main CM’s PE, a C-LAN associated with that CM, an ESS, and the S8300D processor embedded within the gateway itself. These IP addresses are priority ordered and the gateway attempts to register to them in the order that they are listed... sort of.
If the main processor’s PE or C-LAN doesn’t immediately respond, you might want to hold off trying the ESS or LSP. It would be smarter to attempt registration to the main processor a few times before entering into disaster recovery mode. You don’t want a brief network hiccup to be the cause of a major reconfiguration.
This is where the transition point (TP) comes into play. The TP separates the primary server(s) from the survivable servers.
An H.248 gateway will first attempt to connect to the processor(s) above the TP. If my example has a TP of 2, the PE and C-LAN of the main CM will be tried several times (with 10 seconds between each attempt) before the gateway decides that they are not going to answer.
Honestly, I would love it if there were two transition points. One would divide main from ESS and the other would separate ESS from LSP. That would allow me to create one policy for enterprise survivability and another policy for local survivability. What say you, Avaya?
So, how long does a gateway keep trying? Every gateway will attempt to register to an IP address above the TP until the primary-search time is reached. After that, it attempts to connect to the servers below the TP.
With a TP of 2 and a primary-search of 10, a gateway will cycle through the two IP addresses of the main CM (PE and C-LAN) for 10 minutes before deciding it’s time to move on. At that point, it will try any IP addresses below the line, following the same rule of 10 seconds between each registration attempt.
There is another value we need to be concerned with. The total-search time is the maximum number of minutes a gateway will attempt to register itself before giving up and rebooting. This time includes attempts above and below the TP.
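To make the interplay of the three values concrete, here is a minimal Python sketch of the search logic as this article describes it. It is not how the gateway firmware is written. The names ResetTimes, search_phase, and max_attempts are hypothetical, and the behavior exactly at the timer boundaries is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class ResetTimes:
    """Hypothetical container for the three reset-times values."""
    transition_point: int    # number of MGC list entries above the TP
    primary_search_min: int  # minutes spent on the entries above the TP
    total_search_min: int    # minutes before the gateway gives up and reboots


def search_phase(mgc_list, times, elapsed_min):
    """Return (phase, addresses being tried) after elapsed_min minutes of searching.

    Assumption: a timer counts as expired once elapsed_min reaches its value.
    """
    primaries = mgc_list[:times.transition_point]
    survivables = mgc_list[times.transition_point:]
    if elapsed_min < times.primary_search_min:
        return "primary", primaries
    if elapsed_min < times.total_search_min:
        return "survivable", survivables
    return "reboot", []


def max_attempts(minutes, interval_s=10):
    """Upper bound on registration attempts, assuming each attempt itself takes no time."""
    return (minutes * 60) // interval_s
```

With a TP of 2, a primary-search of 10, and a total-search of 15, this model keeps the gateway on the first two addresses until minute 10, moves it to the remaining addresses until minute 15, and then reports a reboot. Under the 10-second rule, max_attempts(10) works out to at most 60 attempts shared across the addresses above the TP.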
Making it real
Unfortunately, there is no centralized way to configure this. You make the magic work by setting the configuration parameters on every gateway in your Aura system.
The commands to create an MGC list similar to my example will look something like this:
clear mgc list
set mgc list 10.100.4.63 10.100.4.12 10.100.4.103 10.100.4.203
set reset-times transition-point 2
set reset-times primary-search 10
set reset-times total-search 15
In words, we have this:
10.100.4.63: Main Server (Processor Ethernet)
10.100.4.12: Main Server (C-LAN)
10.100.4.103: ESS Location (Processor Ethernet)
10.100.4.203: LSP (Processor Ethernet)
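Since every gateway has to be configured by hand, it can help to sanity-check a command list before pasting it into a gateway. The following Python sketch reads the commands shown above and labels each address as primary or survivable based on the transition point. It is only an illustration: parse_gateway_config and label_entries are hypothetical helper names, and the parser only understands the handful of commands used in this article.

```python
def parse_gateway_config(lines):
    """Parse 'clear mgc list', 'set mgc list', and 'set reset-times' lines into a dict."""
    config = {"mgc_list": [], "reset_times": {}}
    for raw in lines:
        words = raw.split()
        if words[:3] == ["clear", "mgc", "list"]:
            config["mgc_list"] = []
        elif words[:3] == ["set", "mgc", "list"]:
            # Assumption for this sketch: 'set' replaces the list, and
            # addresses may be separated by spaces or commas.
            addresses = []
            for token in words[3:]:
                addresses.extend(a for a in token.split(",") if a)
            config["mgc_list"] = addresses
        elif words[:2] == ["set", "reset-times"] and len(words) == 4:
            config["reset_times"][words[2]] = int(words[3])
    return config


def label_entries(config):
    """Pair each address with 'primary' (above the TP) or 'survivable' (below it).

    Raises KeyError if no transition-point was set.
    """
    tp = config["reset_times"]["transition-point"]
    return [
        (addr, "primary" if index < tp else "survivable")
        for index, addr in enumerate(config["mgc_list"])
    ]


commands = [
    "clear mgc list",
    "set mgc list 10.100.4.63 10.100.4.12 10.100.4.103 10.100.4.203",
    "set reset-times transition-point 2",
    "set reset-times primary-search 10",
    "set reset-times total-search 15",
]

for address, role in label_entries(parse_gateway_config(commands)):
    print(f"{address:<14} {role}")
```

Run against the five commands above, the first two addresses come out as primary and the last two as survivable, which matches the list in words.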
That wasn’t too difficult, was it? Heck, even I learned something today. I wasn’t sure about the 10-second timer until I did a little research. So, even if you are still scratching your heads, it was a truly satisfying experience for me.
Until the next time….
Andrew, an additional survivability feature I think worth mentioning for the G450 would be its Network Port Redundancy, which would protect it against those pesky switch-port failures.
On top of the few minutes spent configuring what Andrew has already mentioned above, these additional commands entered via the CLI will do the trick:
set port redundancy
set port redundancy enable/disable
set port redundancy-intervals
show port redundancy
Additional information related to Network Port Redundancy can be found in the “Administration for the Avaya G450 Media Gateway” document published by Avaya.
Excellent point, Alex! I knew there was more to say about the G450/G430, but I had to stop somewhere. My goal is to always leave enough unsaid for yet another blog article. 🙂
What about SLS?! We are actually in a situation where that is our only option at over 40 locations until we get the funds to buy LSPs for those locations. Avaya has abandoned the Provisioning and Implementation Management (PIM) server that would handle the automated configuration and updates to the G4X0 gateways. Now we have to manually configure SLS on each gateway every time an extension or trunk is changed. Is anyone else actually using SLS?
We also just had Avaya set auto-recovery for all of our media gateways so that, hopefully, SLS in all its limited glory never has to be used. This may be a key configuration point for people to consider as they decide which failure scenarios they want to solve for in their recovery rules. We have redundant WAN links and other equipment, so any WAN outage should only be a minor blip, and we would prefer that the gateways auto-recover after everything has been stable for at least 5 minutes.
Here’s a bit of that config from the CM side:
display system-parameters mg-recovery-rule 1
SYSTEM PARAMETERS MEDIA GATEWAY AUTOMATIC RECOVERY RULE
Recovery Rule Number: 1
Migrate H.248 MG to primary: immediately
Minimum time of network stability: 5
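As a rough illustration of what a "minimum time of network stability" of 5 minutes means in practice, here is a small Python model of the idea. It is not CM's implementation. The function name first_migration_minute and the per-minute reachability samples are assumptions made purely for the sketch.

```python
def first_migration_minute(samples, min_stability_min=5):
    """Return the index of the first minute at which the link to the primary
    has been continuously reachable for min_stability_min minutes, or None.

    samples[i] is True if the primary was reachable during minute i
    (a hypothetical input format chosen for this sketch).
    """
    stable_run = 0
    for minute, reachable in enumerate(samples):
        # Any unreachable minute resets the stability counter.
        stable_run = stable_run + 1 if reachable else 0
        if stable_run >= min_stability_min:
            return minute
    return None
```

In this model, a single dropped minute in the middle of a recovery restarts the 5-minute count, which is the behavior you want if the goal is to avoid bouncing gateways back and forth during a flapping WAN link.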
As usual, very informative and helpful guide. Thanks.
We have a G430. Two weeks back there was a failure in its fan, and now the MM716 cards are not producing any dial tone, but the gateway shows it is registered.