Every few months, I teach a two-and-a-half-day class on all things SIP. I cover every request and response message and most of the headers. The students use Wireshark with a SIP softphone to do in-depth call flow analysis. These flows include basic and sophisticated telephone calls, presence, and instant messaging. I even take my students through the joys of deciphering IP, UDP, TCP, RTP, and RFC 2833.
This combination of lecture and labs reinforces what might otherwise be an impossible subject to learn in such a short amount of time. I love seeing the light bulbs go on when I show them SIP REFER on PowerPoint slides, then have them transfer a call and watch the messages come to life.
However, there is one lab that has stumped nearly everyone who has done it: Conference. While the idea of a three-party call is nothing new to my students, all but one of them in the many years I've taught this course have been baffled by what they see in Wireshark. It's especially frustrating to them since my conference lab uses nothing more than a handful of extremely basic SIP methods and responses: INVITE, 100 Trying, 180 Ringing, 200 OK, and ACK. Pretty simple stuff, right?
I need to point out that my labs use SIP soft clients operating in a point-to-point manner. There is no proxy server, and the clients send SIP messages directly to one another. In other words, this is a teaching configuration and not meant for real-life communication.
Let’s take a look at an example of one of these labs to see if you, dear reader, can figure it out. Will your light bulb turn on before I give you the answer? I hope so.
The lab consists of one student calling another student, pressing the conference button, calling a third student, and finally pressing the add button to join all three parties.
In my example, Lori calls Kevin. Lori presses conference and calls Mike. After Mike answers, Lori presses add to create the conference. All Wireshark traces come from Lori’s PC.
In Wireshark, after opening Telephony > VoIP Calls, I see two calls. This makes sense since Lori called both Kevin and Mike.
Selecting the first call and pressing Flow, I see this:
Selecting the second call and pressing Flow, I see this:
Let’s analyze what is happening. Again, I am not surprised to see two calls. Lori called Kevin and then Lori called Mike. These calls create two SIP sessions and Wireshark is correct in how it displays them.
However, it’s difficult to see what is really happening when the calls are shown separately. Fortunately, Wireshark allows you to simultaneously select both calls and create a combined flow that displays messages in the order they were sent and received.
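To make the difference between the two views concrete, here is a minimal sketch of the idea in Python. It is not Wireshark's implementation. It assumes each captured SIP message has already been reduced to a timestamp, its Call-ID header, and a one-line summary. The `SipMessage` type, both function names, and the sample Call-ID values are hypothetical. A per-call flow is every message that shares one Call-ID, and the combined flow is every message from the selected Call-IDs kept in time order.

```python
from typing import NamedTuple


class SipMessage(NamedTuple):
    time: float    # seconds since the start of the capture
    call_id: str   # value of the Call-ID header
    summary: str   # request method or status line, e.g. "INVITE" or "200 OK"


def per_call_flows(messages):
    """Group messages by Call-ID: one list per SIP session, like a single-call flow."""
    flows = {}
    for msg in messages:
        flows.setdefault(msg.call_id, []).append(msg)
    return flows


def combined_flow(messages, call_ids):
    """Keep every message that belongs to one of the selected calls, in time order."""
    selected = [msg for msg in messages if msg.call_id in call_ids]
    return sorted(selected, key=lambda msg: msg.time)


# Hypothetical capture: two Call-IDs whose messages interleave in time.
capture = [
    SipMessage(0.00, "call-kevin", "INVITE"),
    SipMessage(0.40, "call-kevin", "200 OK"),
    SipMessage(5.10, "call-mike", "INVITE"),
    SipMessage(5.20, "call-kevin", "INVITE"),
    SipMessage(6.00, "call-mike", "200 OK"),
]

flows = per_call_flows(capture)
merged = combined_flow(capture, {"call-kevin", "call-mike"})
for msg in merged:
    print(f"{msg.time:5.2f}  {msg.call_id:<10}  {msg.summary}")
```

Viewed per call, the INVITE at 5.20 seconds looks like a stray message on Kevin's call. In the combined view, you can see that it happens while Lori is setting up the call to Mike.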
The combined flow looks like this:
As I pointed out earlier in this article, my soft phone implements conference using only basic SIP methods and responses. However, you may have noticed that there are two distinct flow types. The first consists of INVITE, 100 Trying, 180 Ringing, 200 OK, and ACK. The second consists of INVITE, 100 Trying, 200 OK, and ACK. Notice the absence of 180 Ringing in the second flow type.
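The type 1 versus type 2 distinction is mechanical enough to check in code. The Python sketch below assumes you already have one call's flow as a list of message names. It assumes the INVITE exchanges in that flow don't overlap, which is true of a single call in this lab. Strictly speaking, the ACK for a 200 OK is its own transaction, so the sketch says "exchange" rather than "transaction." The function names and sample flow are hypothetical.

```python
def split_invite_exchanges(flow):
    """Split a list of message names into groups that run from INVITE through ACK."""
    exchanges, current = [], None
    for name in flow:
        if name == "INVITE":
            current = [name]           # a new INVITE starts a new exchange
        elif current is not None:
            current.append(name)
            if name == "ACK":          # the ACK closes out the INVITE exchange
                exchanges.append(current)
                current = None
    return exchanges


def flow_type(exchange):
    """Type 1 if the far end rang (180 Ringing), otherwise type 2."""
    return 1 if "180 Ringing" in exchange else 2


# Hypothetical flow in the style of the lab: a new call, then a second INVITE.
flow = [
    "INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK",
    "INVITE", "100 Trying", "200 OK", "ACK",
]
print([flow_type(e) for e in split_invite_exchanges(flow)])  # [1, 2]
```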
Perhaps I need to show you an INVITE message of each flow type to help you understand why they are different. Specifically, I need to show you the SDP of the two INVITE messages.
The SDP for flow type 1 looks like this:
The SDP for flow type 2 looks like this:
The two SDPs are identical with one exception: the second contains the media attribute a=sendonly. This is how this particular SIP soft client indicates that it is putting the call on hold. Personally, I prefer seeing a connection address of 0.0.0.0 (c=IN IP4 0.0.0.0) for hold, but I didn't write this application, and it is what it is.
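Here is a small Python sketch that reads the media direction out of an SDP body and recognizes both hold styles: the sendonly attribute this client uses and the older 0.0.0.0 connection address. It assumes a single audio stream. It relies on the SDP rules that sendrecv is the default when no direction attribute is present and that a media-level attribute overrides a session-level one. The function names and the sample SDP, including its addresses and ports, are hypothetical and are not taken from the lab's trace.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def audio_direction(sdp: str) -> str:
    """Return the media direction that applies to the first m=audio section."""
    session_dir = "sendrecv"   # default when no direction attribute is present
    audio_dir = None
    section = "session"        # "session" until the first m= line, then "audio" or "other"
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if section == "audio":
                break          # done with the first audio section
            section = "audio" if line.startswith("m=audio") else "other"
        elif line.startswith("a=") and line[2:] in DIRECTIONS:
            if section == "session":
                session_dir = line[2:]
            elif section == "audio":
                audio_dir = line[2:]
    # A media-level attribute overrides the session-level one.
    return audio_dir or session_dir


def looks_like_hold(sdp: str) -> bool:
    """Hold as this client signals it (sendonly or inactive), or the older 0.0.0.0 style."""
    return audio_direction(sdp) in {"sendonly", "inactive"} or "c=IN IP4 0.0.0.0" in sdp


# Hypothetical SDP resembling a basic audio offer (PCMU plus RFC 2833 events).
ACTIVE_SDP = "\r\n".join([
    "v=0",
    "o=lori 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:101 telephone-event/8000",
]) + "\r\n"
HOLD_SDP = ACTIVE_SDP + "a=sendonly\r\n"   # identical except for the direction attribute

print(audio_direction(ACTIVE_SDP), audio_direction(HOLD_SDP))  # sendrecv sendonly
```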
This means that INVITE type 1 creates a new call and INVITE type 2 puts a call on hold. In other words, the type 2 INVITE is what we in SIP land call a re-INVITE. The absence of 180 Ringing in type 2 should be a big clue that it is not setting up a new call.
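The missing 180 Ringing is a good clue, but SIP also offers a more direct test. A re-INVITE is sent inside an existing dialog, so its To header already carries the tag the far end assigned when it answered. An initial INVITE's To header has no tag. The Python sketch below applies that rule. It assumes an unfolded header section and accepts the compact header form "t". The function names and the trimmed sample messages, which omit Via and other headers, are hypothetical.

```python
def header_value(message: str, name: str, compact: str):
    """Return the first value of a header, accepting its compact form (e.g. 't' for To)."""
    for line in message.splitlines()[1:]:
        if not line:
            break                       # a blank line ends the header section
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in (name.lower(), compact):
            return value.strip()
    return None


def is_reinvite(message: str) -> bool:
    """An INVITE whose To header already carries a tag belongs to an existing dialog."""
    if not message.startswith("INVITE "):
        return False
    to = header_value(message, "To", "t") or ""
    return ";tag=" in to.replace(" ", "").lower()


# Hypothetical, trimmed messages from Lori to Mike.
INITIAL = "\r\n".join([
    "INVITE sip:mike@192.0.2.30 SIP/2.0",
    "From: <sip:lori@192.0.2.10>;tag=a73kszlfl",
    "To: <sip:mike@192.0.2.30>",
    "Call-ID: 1j9FpLxk3uxtm8tn@192.0.2.10",
    "CSeq: 1 INVITE",
    "", "",
])
REINVITE = "\r\n".join([
    "INVITE sip:mike@192.0.2.30 SIP/2.0",
    "From: <sip:lori@192.0.2.10>;tag=a73kszlfl",
    "To: <sip:mike@192.0.2.30>;tag=1410948204",
    "Call-ID: 1j9FpLxk3uxtm8tn@192.0.2.10",
    "CSeq: 2 INVITE",
    "", "",
])

print(is_reinvite(INITIAL), is_reinvite(REINVITE))  # False True
```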
I don't blame you if you are still confused about how this series of INVITE transactions implements a conference. That's because I need to tell you one more thing. In addition to putting a call on hold, the second INVITE type is also used to take a call off hold. It does this by sending the exact same SDP minus the sendonly attribute, which restarts the flow of audio packets.
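The hold and resume toggle can be sketched in a few lines of Python. The sketch assumes an SDP with a single audio section at the end, so an appended a=sendonly lands at the media level. It differs in one way from "the exact same SDP": RFC 3264 expects the version number in the o= line to increase whenever a new offer changes the session, so the sketch bumps it. The article doesn't say whether this soft client does so. The function name and sample SDP are hypothetical.

```python
def set_hold(sdp: str, hold: bool) -> str:
    """Return a copy of sdp that puts the call on hold (a=sendonly) or takes it off hold."""
    lines = []
    for line in sdp.splitlines():
        if line in ("a=sendonly", "a=sendrecv"):
            continue                              # drop the old direction; re-added below if needed
        if line.startswith("o="):
            fields = line[2:].split()
            fields[2] = str(int(fields[2]) + 1)   # bump sess-version for the new offer
            line = "o=" + " ".join(fields)
        lines.append(line)
    if hold:
        lines.append("a=sendonly")                # lands in the last (only) media section
    return "\r\n".join(lines) + "\r\n"


# Hypothetical starting SDP with no direction attribute (so sendrecv by default).
BASE = "\r\n".join([
    "v=0",
    "o=lori 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"

held = set_hold(BASE, hold=True)       # re-INVITE body that puts the call on hold
resumed = set_hold(held, hold=False)   # re-INVITE body that takes it off hold
print(resumed)
```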
Logically, this is the entire call flow:
1. Lori calls Kevin.
2. Kevin answers the ringing call.
3. Lori presses the conference button.
4. The call between Lori and Kevin goes on hold.
5. Lori calls Mike.
6. Mike answers the ringing call.
7. Lori presses the add button.
8. The call between Lori and Mike goes on hold.
9. The call between Lori and Kevin goes off hold.
10. The call between Lori and Mike goes off hold.
11. All three parties can now speak to each other.
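The Python sketch below replays those eleven steps and records the SIP exchange each one produces on the wire, as seen from Lori's PC. The mapping of steps to messages follows the two flow types described above. The `Call` class and helper names are hypothetical, and the sketch models signaling only, not media.

```python
from dataclasses import dataclass


@dataclass
class Call:
    peer: str
    state: str = "active"   # "active" or "held"


def conference_steps():
    """Replay the logical call flow and record the SIP exchange each step produces."""
    calls, log = {}, []

    def invite(peer):
        calls[peer] = Call(peer)
        log.append(f"INVITE {peer}: 100 Trying, 180 Ringing, 200 OK, ACK")

    def reinvite(peer, hold):
        calls[peer].state = "held" if hold else "active"
        sdp = "a=sendonly" if hold else "no direction attribute"
        log.append(f"re-INVITE {peer} ({sdp}): 100 Trying, 200 OK, ACK")

    invite("Kevin")            # steps 1-2: Lori calls Kevin, Kevin answers
    reinvite("Kevin", True)    # steps 3-4: conference button puts Kevin on hold
    invite("Mike")             # steps 5-6: Lori calls Mike, Mike answers
    reinvite("Mike", True)     # steps 7-8: add button puts Mike on hold
    reinvite("Kevin", False)   # step 9: Kevin comes off hold
    reinvite("Mike", False)    # step 10: Mike comes off hold
    return calls, log          # step 11: both calls are active again


calls, log = conference_steps()
for entry in log:
    print(entry)
```

Two new calls and four re-INVITEs: that is all the signaling there is. Nothing in it says "conference."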
I expect that steps 1 through 7 make sense, but steps 8 through 11 may still have you scratching your head. Am I right, or do you see what is happening?
Pause now if you want to keep thinking before I present the answer.
The Envelope, Please
Steps 8 through 11 are Lori telling Kevin and Mike to go off hold and send their media to her. This allows Lori’s PC to act as the RTP mixer for all three parties. Remember, this is a point-to-point configuration. There is no conference bridge that Lori can use to merge the media streams. She needs to do it all by herself without any outside assistance.
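The article doesn't say exactly how this soft client mixes. A common approach for a local mixer is "mix-minus": each party receives the sum of everyone else's audio, so no one hears their own voice echoed back. The Python sketch below shows that technique. It assumes the RTP payloads have already been decoded to 16-bit linear PCM frames of equal length. Codec decoding and re-encoding, jitter buffering, and timing are all left out. The function names and sample values are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip(sample: int) -> int:
    """Clamp a summed sample back into the 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def mix_minus(frames: dict) -> dict:
    """For each party, mix everyone else's audio (never echo a party's own voice back)."""
    length = len(next(iter(frames.values())))
    totals = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    return {
        party: [clip(totals[i] - frame[i]) for i in range(length)]
        for party, frame in frames.items()
    }


# A 20 ms frame at 8 kHz would hold 160 samples; two samples keep the example readable.
frames = {
    "Lori": [100, -200],   # Lori's own microphone
    "Kevin": [10, 20],     # decoded from Kevin's RTP stream
    "Mike": [1, 2],        # decoded from Mike's RTP stream
}
mixed = mix_minus(frames)
# mixed["Kevin"] is what Lori's PC sends to Kevin (Lori + Mike);
# mixed["Lori"] is what plays on Lori's own speaker (Kevin + Mike).
print(mixed)  # {'Lori': [11, 22], 'Kevin': [101, -198], 'Mike': [110, -180]}
```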
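While the sound quality of this conference is decent, this is not a scalable solution. Lori's PC has to receive a stream from every other participant and send each of them the audio of everyone else, so its workload grows with every party added. Here, three participants is the upper limit. Going any further would require a dedicated conference bridge.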
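A little arithmetic shows how the load on the mixing PC grows. The Python sketch below assumes G.711 with 20 ms packets (160 bytes of payload, 50 packets per second) and counts RTP, UDP, and IPv4 headers but not link-layer overhead. The article doesn't state the codec, so treat these defaults as assumptions. The function name is hypothetical. The sketch shows linear growth in streams and bandwidth. It doesn't by itself explain where a particular client draws the line.

```python
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20   # RTP + UDP + IPv4, no link-layer overhead


def mixer_load(participants: int, payload_bytes: int = 160, ptime_ms: int = 20) -> dict:
    """Streams and IP-level bandwidth at the PC that mixes a point-to-point conference."""
    remote = participants - 1                     # everyone except the mixing PC
    packets_per_second = 1000 / ptime_ms
    kbps_per_stream = (payload_bytes + RTP_UDP_IPV4_HEADER_BYTES) * 8 * packets_per_second / 1000
    return {
        "streams_in": remote,     # one RTP stream from each remote party
        "streams_out": remote,    # a separate mix for each remote party
        "kbps_in": remote * kbps_per_stream,
        "kbps_out": remote * kbps_per_stream,
    }


for n in (3, 5, 10):
    print(n, mixer_load(n))
```

Under these assumptions, each stream costs 80 kbps at the IP layer. Lori's three-party call therefore needs 160 kbps in and 160 kbps out, on top of decoding, mixing, and re-encoding every stream.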
Well, that’s it. Did you figure it out before I gave you the answer? Don’t feel bad if you didn’t. Most people don’t.
What I like best about this lab is that it shows how SIP can create something as complicated as a conference without a special conference command. By simply sending INVITE and BYE requests, SIP can add people to a three-party call and drop them from it. Call me a nerd, but I think that's pretty cool.