Over the past few months I’ve written a lot about the signaling aspects of IP communications and although I’ve mentioned media as the result of a SIP session, I haven’t really gone into much detail about the different types of media. Well, today I plan on rectifying that and spending some time on audio as a media type.
Before I go into the different media codecs I need to lay down the groundwork. For instance, what is that codec thing I just mentioned? Simply put, a codec is a CODer and a DECoder used to convert analog media to packetized IP and vice versa. In other words, a codec can take human speech, convert it to a stream of IP packets, and then eventually convert those IP packets back to something the human ear can hear. As part of that process you need transducers. A microphone is a transducer that turns sound into electrical signals and a speaker is a transducer that takes electrical signals and turns them back into sound.
When it comes to IP communications there is a plethora of audio codecs to choose from. Each one has its strengths and weaknesses. Some codecs are designed to accurately reproduce voice and aren’t concerned with the number of bits it takes to do that. Others are designed to be as efficient as possible bit-wise while delivering acceptable voice. The codec that you use depends on the type of experience you want to create given the parameters of your network and the processing power of your communications devices.
It would take pages to cover every codec that’s out there, so I will stick with the ones that you will most commonly encounter.
G.711
G.711 is a very common codec and has been around since 1972. G.711 is what a traditional telephone call sounds like. In fact, it is commonly referred to as toll quality voice. You will also hear G.711 called Pulse Code Modulation (PCM). This is the technical way of saying 8-bit non-uniform quantization with 8000 samples per second, which I guess is even more technical than saying PCM. The most important things to know are that G.711 produces 64 Kbps (kilobits per second) of audio (8 bits times 8000 samples), that a single call consumes around 90 Kbps of network bandwidth once packet overhead is added, and that it sounds pretty good. Unless there are network problems, people do not complain about G.711 voice calls. It’s what they’ve been used to for the past 40+ years.
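To see where a figure like 90 Kbps can come from, here is a rough back-of-the-envelope sketch. It assumes 20 ms of audio per packet and typical header sizes (12 bytes of RTP, 8 of UDP, 20 of IPv4, and 18 of Ethernet). Those assumptions are mine for illustration, and your network may well differ.

```python
def call_bandwidth_kbps(codec_kbps, packet_ms=20, header_bytes=12 + 8 + 20 + 18):
    """Approximate one-way bandwidth for a single call, in Kbps.

    packet_ms and header_bytes are illustrative assumptions:
    RTP (12) + UDP (8) + IPv4 (20) + Ethernet header and trailer (18).
    """
    packets_per_second = 1000 / packet_ms
    # Bytes of encoded audio carried in each packet.
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


# G.711: 8 bits per sample, 8000 samples per second.
g711_codec_kbps = 8 * 8000 / 1000
g711_wire_kbps = call_bandwidth_kbps(g711_codec_kbps)

print(f"G.711 audio only: {g711_codec_kbps:.1f} Kbps")
print(f"G.711 with packet overhead: {g711_wire_kbps:.1f} Kbps")
```

Under these assumptions the total lands a little under 90 Kbps. Changing the packet size changes the overhead, which is why you will see slightly different per-call numbers quoted in different places.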
G.729
While you often see G.729 written just as I did, that’s not technically accurate because nobody implements it that way. You see, there are a lot of different flavors of G.729 and each one is slightly different from the others. The two flavors, or annexes, that you will commonly see are G.729A and G.729B. Of these I will take G.729A over G.729B any day. G.729B employs something called silence suppression, which causes problems when you are a very quiet talker. Instead of suppressing just the silence on a telephone call, G.729B can suppress the voice itself, leaving you with a very choppy, or clipped, conversation.
Every annex of G.729 will require less bandwidth than G.711. Typically, G.729 will use 32 Kbps of network bandwidth per call. This means that you can get about three times as many G.729 calls on a network connection as you can if those calls used G.711. The voice quality of those G.729 calls won’t be nearly as good as those that use G.711, but if your concern is reducing bandwidth usage then it’s a perfectly acceptable choice.
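If you want to play with the capacity math yourself, here is a small sketch that uses the per-call figures quoted in this article. The 1544 Kbps link (a T1) is just an assumption for illustration, and the function name is my own.

```python
# Per-call bandwidth figures quoted in this article, in Kbps.
PER_CALL_KBPS = {"G.711": 90, "G.729": 32}


def max_calls(link_kbps, codec):
    """Whole number of simultaneous calls that fit on a link."""
    return link_kbps // PER_CALL_KBPS[codec]


link_kbps = 1544  # assumed link size (a T1), purely for illustration

for codec in PER_CALL_KBPS:
    print(f"{codec}: {max_calls(link_kbps, codec)} calls on a {link_kbps} Kbps link")
```

In practice you would leave headroom for signaling and other traffic, so treat these counts as an upper bound.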
Lastly, G.729 codecs cannot reliably carry in-band DTMF (telephone touch tones). For that you will need to use G.711 or an out-of-band transmission mechanism like RFC 4733 (formerly RFC 2833), but that’s a blog for another day.
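For the curious, here is a minimal sketch of the four-byte telephone-event payload that RFC 4733 defines for carrying a digit out of band: an event code, an end flag plus volume, and a duration in timestamp units. The function name and default values are my own choices for illustration, and a real sender also has to handle RTP headers, timestamps, and retransmission of the final packet.

```python
import struct

# RFC 4733 event codes for DTMF: 0-9, then *, #, A-D.
EVENT_CODES = {str(d): d for d in range(10)}
EVENT_CODES.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def telephone_event_payload(digit, duration, volume=10, end=False):
    """Pack one telephone-event payload (hypothetical helper).

    duration is in RTP timestamp units (800 is 100 ms at 8000 Hz).
    volume is the tone level in -dBm0, limited to 6 bits.
    """
    event = EVENT_CODES[digit]
    # Top bit is the End flag, next bit is reserved, low 6 bits are volume.
    flags_and_volume = (0x80 if end else 0x00) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags_and_volume, duration & 0xFFFF)


payload = telephone_event_payload("5", duration=800, end=True)
print(payload.hex())
```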
G.726
Like G.729, G.726 is a compressed codec that uses significantly less bandwidth than G.711, but produces voice quality similar to that of G.711. For you technical people out there, G.726 uses something called Adaptive Differential Pulse Code Modulation (ADPCM). The most important thing to know about ADPCM is that it uses the differences between voice samples to create its media stream. In other words, instead of sending information about each voice sample, it will send one full sample followed by how the next samples differ from that one. That lowers the bandwidth required by G.726 down to 55 Kbps for each call. So, not as low as G.729, but less than G.711 with comparable voice quality.
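The "send the differences" idea can be sketched in a few lines. To be clear, this is plain delta coding and not the real G.726 algorithm, which also adapts its step size and predicts the next sample. Here each difference is taken against the previous sample, and the sample values are made up.

```python
def delta_encode(samples):
    """Keep the first sample, then store each change from the previous one."""
    if not samples:
        return []
    encoded = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the samples by accumulating the stored differences."""
    decoded = []
    running = 0
    for index, value in enumerate(encoded):
        running = value if index == 0 else running + value
        decoded.append(running)
    return decoded


samples = [1000, 1004, 1009, 1007, 1002]  # made-up sample values
encoded = delta_encode(samples)
print(encoded)                # [1000, 4, 5, -2, -5]
print(delta_decode(encoded))  # [1000, 1004, 1009, 1007, 1002]
```

Because speech changes gradually from one sample to the next, the differences are small numbers that need fewer bits than the full samples, and that is where the bandwidth savings come from.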
G.722
G.722 takes the opposite approach of G.729 and G.726. Instead of focusing on lowering bandwidth usage, G.722 is concerned with improving voice quality. G.711 may be called toll quality voice, but it was invented quite a long time ago (heck, I was still in high school) and with bandwidth becoming cheaper and more plentiful, why not make a voice call sound better than it did in 1972? That’s exactly what G.722 does. Instead of the 8000 samples per second of G.711, it doubles the rate to 16,000 samples per second. Because G.722 also uses ADPCM technology, the bandwidth usage isn’t double that of G.711 even though the sample rate is. A typical G.722 call consumes about 90 Kbps.
G.722 is still fairly new in the world of codecs, but it or another “wideband audio codec” will most likely replace G.711 in the not too distant future.
There are more audio codecs out there, but I will stop with these four since they are the ones you will most likely run into. However, stay tuned for a further look at codecs where I will tackle such beasts as iLBC and Microsoft’s RTAudio.
Love your blog Andrew! I learn something new with every post.
Thank you, Melissa! I sometimes feel as if I am writing to a vacuum so I really appreciate knowing that at least one person is learning something. 🙂
Excellent and informative post, Andrew. I really enjoyed refreshing my memory. Too often today we lose the fundamentals of our business. I think it’s great that you recognize this and post the content.
Thanks, Mark. I am glad you enjoyed it. I hear you will be here in Bloomington on Thursday. It will be great seeing you again.
I see this is a pretty old thread, but it is still very useful to techies who want to understand the pros and cons of certain options for codecs, etc. I started in IT and telecom-related fields many years ago working for a cellular provider, but I was too young at the time to relate my knowledge (mostly about DTMF) to the real world of telecom. In addition, I’ve been a musician since my middle school days, so PCM, sample rates, and such are nothing new to me, yet I learn more and more about them in my current career as a voice tech.
The things you learn on the inter-web : )
Thanks, Mike. I am an old telecom guy, too, and realize that there is still a lot to this new VoIP stuff that I don’t know. Half the reason for this blog is to force me to learn something new on a regular basis. 🙂
I hope you stick around. I have lots of articles about all sorts of things.
Andrew- This is the best SIP/VoIP telephony blog I’ve encountered yet… it distills this ordinarily tough material into entertaining and very informative stories and learning opportunities. You’ve helped me learn things that other books failed to do. Well done…
Thank you for saying that, David!
Hi Andrew,
It’s simply amazing how you explain complex concepts in simple, plain English. Thank you so much for all your time and effort. This is my one stop for all SIP/VoIP concepts.
Thanks!!!
Very nice explanation, I enjoyed reading it. Please confirm why we negotiate an 8 kHz sampling rate for G722 in SDP even though it is a 16 kHz codec.
Where are you doing that?
X-lite, Microsip and all other soft clients negotiate the G722 codec in SDP as a=rtpmap:9 G722/8000, but G722 is a 16 kHz audio codec.
The explanation is found here. https://en.wikipedia.org/wiki/G.722
Hi Andrew,
Thank you very much, I really learn from your posts.
Could you explain why G.729 sometimes comes with the specific parameter “annexb=no” or “annexb=yes”? What does this parameter stand for?
It has to do with silence suppression. Read about it here: https://en.wikipedia.org/wiki/G.729