Dissecting a SIP Conference Call

Every few months, I teach a two-and-a-half-day class on all things SIP. I cover every request and response message, most of the headers, and the students use Wireshark with a SIP softphone to do in-depth call flow analysis. These flows include basic and sophisticated telephone calls, presence, and instant messaging. I even take my students through the joys of deciphering IP, UDP, TCP, RTP, and RFC 2833.

This combination of lecture and labs reinforces what might otherwise be an impossible subject to learn in such a short amount of time. I love seeing the light bulbs go on after I show my students SIP REFER on PowerPoint slides and then have them transfer a call and watch those messages come to life.

However, there is one lab that has stumped nearly every person who has done it: Conference. Creating a three-party call is nothing new to my students, yet in all the years I’ve taught this course, every student but one has been baffled by what they see in Wireshark. It’s especially frustrating for them because my conference lab uses nothing more than a handful of extremely basic SIP methods and responses: INVITE, 100 Trying, 180 Ringing, 200 OK, and ACK. Pretty simple stuff, right?

I need to point out that my labs use SIP soft clients operating in a point-to-point manner. There is no proxy server and the clients send SIP messages directly to other clients. In other words, this is a teaching configuration and not meant for real life communication.

Let’s take a look at an example of one of these labs to see if you, dear reader, can figure it out. Will your light bulb turn on before I give you the answer? I hope so.

The lab consists of one student calling another student, pressing the conference button, calling a third student, and finally pressing the add button to join all three parties.

In my example, Lori calls Kevin. Lori presses conference and calls Mike. After Mike answers, Lori presses add to create the conference. All Wireshark traces come from Lori’s PC.

In Wireshark, I see two calls after opening Telephony > VoIP Calls. This makes sense since Lori called both Kevin and Mike.

Selecting the first call and pressing Flow, I see this:

Selecting the second call and pressing Flow, I see this:

Let’s analyze what is happening. Again, I am not surprised to see two calls. Lori called Kevin and then Lori called Mike. These calls create two SIP sessions and Wireshark is correct in how it displays them.

However, it’s difficult to see what is really happening when the calls are shown separately. Fortunately, Wireshark allows you to simultaneously select both calls and create a combined flow that displays messages in the order they were sent and received.
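Conceptually, a combined flow is just a time-ordered merge of each call's messages. Here is a minimal Python sketch of that idea, not Wireshark's actual code; the SipEvent type is hypothetical, and any timestamps you feed it are your own.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class SipEvent:
    """One captured SIP message (hypothetical type for illustration)."""
    time: float    # capture timestamp, seconds from start of trace
    call_id: str   # Call-ID header value identifying the call
    label: str     # method or status line, e.g. "INVITE" or "200 OK"


def combined_flow(*calls: list[SipEvent]) -> list[SipEvent]:
    """Interleave per-call message lists into one time-ordered list.

    Each input list must already be sorted by capture time (as a per-call
    flow is), so a k-way merge is enough.
    """
    return list(heapq.merge(*calls, key=lambda event: event.time))
```

Merging Lori's call to Kevin with her call to Mike this way makes the interleaving of the two calls visible, which is exactly what the separate flows hide.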

The combined flow looks like this:

As I pointed out earlier in this article, my softphone implements conference using only basic SIP methods and responses. However, you may have noticed that there are two distinct flow types. The first consists of INVITE, 100 Trying, 180 Ringing, 200 OK, and ACK. The second consists of INVITE, 100 Trying, 200 OK, and ACK. Notice the absence of 180 Ringing in the second flow type.
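In this lab, the presence or absence of 180 Ringing is enough to sort the INVITE exchanges into the two flow types. Here is a minimal Python sketch of that rule; classify_invite is a hypothetical helper, and the rule is only a lab heuristic, because 180 Ringing is optional in SIP (a phone that answers immediately may send 200 OK to a brand-new INVITE without ringing first).

```python
def classify_invite(exchange: list[str]) -> str:
    """Label one INVITE exchange from this lab by the messages it contains.

    Lab heuristic only: 180 Ringing is optional in SIP, so an auto-answered
    new call could also lack it.
    """
    if not exchange or exchange[0] != "INVITE":
        raise ValueError("exchange must start with INVITE")
    if "180 Ringing" in exchange:
        return "type 1: new call"
    return "type 2: update to an existing call"


type1 = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK"]
type2 = ["INVITE", "100 Trying", "200 OK", "ACK"]
print(classify_invite(type1))  # type 1: new call
print(classify_invite(type2))  # type 2: update to an existing call
```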

Perhaps I need to show you an INVITE message of each flow type to help you understand why they are different. Specifically, I need to show you the SDP of the two INVITE messages.

The SDP for flow type 1 looks like this:

The SDP for flow type 2 looks like this:

The two SDPs are identical with one exception. The second contains a media attribute of sendonly. This is how this particular SIP soft client indicates that it is putting the call on hold. Personally, I prefer seeing an SDP Connection value of c=0.0.0.0 for hold, but I didn’t write this application and it is what it is.
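Both hold conventions are easy to test for. Here is a minimal Python sketch of a hold detector that accepts either the a=sendonly style this client uses or the connection-address style (written in full SDP syntax as c=IN IP4 0.0.0.0). The function name is hypothetical, a=inactive is included because RFC 3264 also uses it to place a stream on hold, and the sample SDP in the check uses made-up documentation addresses.

```python
def is_hold_offer(sdp: str) -> bool:
    """Return True if an SDP body looks like a request to put the call on hold.

    Accepts two conventions:
      * a direction attribute of a=sendonly or a=inactive (RFC 3264 style)
      * a connection address of 0.0.0.0 (the older RFC 2543 style)
    Every line is scanned, so session-level and media-level lines both count.
    """
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line in ("a=sendonly", "a=inactive"):
            return True
        # c=<nettype> <addrtype> <address>; the address is the last field.
        if line.startswith("c=") and line.split()[-1] == "0.0.0.0":
            return True
    return False
```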

This means that INVITE type 1 creates a new call and INVITE type 2 puts a call on hold. In other words, the type 2 INVITE is what we in SIP land call a re-INVITE: an INVITE sent within an existing dialog to modify the session it already set up. The fact that there isn’t a 180 Ringing in type 2 should be a big clue that it is not creating a call.
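Because a re-INVITE travels inside an existing dialog, its headers give it away even without looking at the responses: it reuses the Call-ID of the original call, and its To header carries the tag the other party added when it answered, whereas RFC 3261 forbids a To tag on a brand-new INVITE. Here is a minimal Python sketch of that check; is_reinvite is a hypothetical helper that assumes the headers have already been parsed into a dictionary.

```python
import re

# A "tag" header parameter, e.g. ";tag=8321234356" (case-insensitive).
_TAG_PARAM = re.compile(r";\s*tag\s*=", re.IGNORECASE)


def is_reinvite(headers: dict[str, str]) -> bool:
    """Return True if an INVITE's To header carries a tag (in-dialog request).

    headers maps header names to values; both the long form "To" and the
    compact form "t" are accepted, matched case-insensitively.
    """
    to_value = ""
    for name, value in headers.items():
        if name.lower() in ("to", "t"):
            to_value = value
            break
    # Header parameters follow the closing ">" when the URI is bracketed,
    # so ignore any URI parameters inside the brackets.
    params = to_value.rsplit(">", 1)[-1] if ">" in to_value else to_value
    return bool(_TAG_PARAM.search(params))
```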

I don’t blame you if you are still confused about how this series of INVITE transactions implements a conference. That’s because I need to tell you one more thing. In addition to putting a call on hold, the second INVITE type is also used to take a call off hold. It does this by sending the exact same SDP minus the sendonly attribute, which restarts the flow of audio packets.
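Since the hold and resume offers differ only in that one attribute, a client can derive one from the other. Here is a minimal Python sketch, assuming a single audio media section so that an a= line appended at the end belongs to it; set_hold is a hypothetical helper. It also increments the session version in the o= line, as RFC 3264 requires when a modified offer is sent in an existing session; whether this lab client does the same is not shown here.

```python
def set_hold(sdp: str, hold: bool) -> str:
    """Return a copy of an SDP body with a=sendonly added (hold) or removed (resume).

    Assumes one media section. Also bumps the o= session version, which
    RFC 3264 requires for a changed offer within the same session.
    """
    out = []
    for line in sdp.splitlines():
        if line == "a=sendonly":
            continue  # drop any existing a=sendonly; re-added below if holding
        if line.startswith("o="):
            # o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
            fields = line[2:].split()
            fields[2] = str(int(fields[2]) + 1)
            line = "o=" + " ".join(fields)
        out.append(line)
    if hold:
        out.append("a=sendonly")  # lands in the (single) media section
    return "\r\n".join(out) + "\r\n"


offer = "v=0\r\no=- 1 1 IN IP4 192.0.2.10\r\ns=-\r\nc=IN IP4 192.0.2.10\r\nt=0 0\r\nm=audio 49170 RTP/AVP 0\r\n"
held = set_hold(offer, hold=True)      # ends with a=sendonly, version 2
resumed = set_hold(held, hold=False)   # a=sendonly removed, version 3
```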

Logically, this is the entire call flow:

1. Lori calls Kevin.
2. Kevin answers the ringing call.
3. Lori presses the conference button.
4. The call between Lori and Kevin goes on hold.
5. Lori calls Mike.
6. Mike answers the ringing call.
7. Lori presses the add button.
8. The call between Lori and Mike goes on hold.
9. The call between Lori and Kevin goes off hold.
10. The call between Lori and Mike goes off hold.
11. All three parties can now speak to each other.
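Tracing the steps from Lori's side shows why the combined flow contains both flow types: she sends six INVITEs in total, two that create calls and four re-INVITEs that toggle hold. Here is a minimal Python sketch that replays the list; the helper names are hypothetical, and sendrecv is shown for active calls because that is the SDP default when no direction attribute is present.

```python
def replay_lori() -> tuple[list[tuple[str, str, str]], dict[str, bool]]:
    """Replay the conference steps and record every INVITE Lori sends.

    Returns the INVITEs as (peer, kind, direction) tuples plus the final
    hold state of each call (True means on hold).
    """
    held: dict[str, bool] = {}
    sent: list[tuple[str, str, str]] = []

    def invite(peer: str, hold: bool) -> None:
        # In this model, a second INVITE to the same peer reuses the dialog.
        kind = "re-INVITE" if peer in held else "INVITE"
        held[peer] = hold
        sent.append((peer, kind, "sendonly" if hold else "sendrecv"))

    invite("Kevin", hold=False)  # steps 1-2: Lori calls Kevin, Kevin answers
    invite("Kevin", hold=True)   # steps 3-4: Conference pressed, Kevin on hold
    invite("Mike", hold=False)   # steps 5-6: Lori calls Mike, Mike answers
    invite("Mike", hold=True)    # steps 7-8: Add pressed, Mike on hold
    invite("Kevin", hold=False)  # step 9: Kevin off hold
    invite("Mike", hold=False)   # step 10: Mike off hold
    return sent, held


sent, held = replay_lori()
for message in sent:
    print(message)
print(held)  # {'Kevin': False, 'Mike': False}: step 11, everyone talking
```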

I expect that steps 1 through 7 make sense, but 8 through 11 may still have you scratching your head. Am I right or do you see what is happening?

Pause now if you want to keep thinking before I present the answer.

The Envelope Please

In steps 8 through 11, Lori briefly puts Mike on hold and then takes both Kevin and Mike off hold, telling each of them to send their media to her. This allows Lori’s PC to act as the RTP mixer for all three parties. Remember, this is a point-to-point configuration. There is no conference bridge that Lori can use to merge the media streams. She needs to do it all by herself without any outside assistance.

The sound quality of this conference is decent, but this is not a scalable solution. This softphone can mix audio for three participants, and that’s its upper limit. Going any further would require a dedicated conference bridge.
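To make the mixing job concrete: each party must hear everyone except itself, so Lori's PC needs a separate mix per listener (Kevin plus Mike for Lori's speaker, Lori plus Mike for Kevin's RTP stream, Lori plus Kevin for Mike's). Here is a minimal Python sketch of that sum-minus-self mix on one frame of decoded 16-bit linear PCM. It is not this softphone's actual code; mix_for_each is hypothetical, all frames are assumed to be the same length, and a real mixer would also need jitter buffers, codec decode and encode (for example G.711), and gain control.

```python
def mix_for_each(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Build the audio frame each party should hear in a locally mixed call.

    frames maps each party to one frame of decoded 16-bit linear PCM samples
    (all frames the same length). Each listener gets the sample-by-sample sum
    of every other party, clipped to the 16-bit range.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    mixes: dict[str, list[int]] = {}
    for listener in frames:
        mixed = []
        for i in range(length):
            total = sum(s[i] for talker, s in frames.items() if talker != listener)
            mixed.append(max(-32768, min(32767, total)))  # clip to int16
        mixes[listener] = mixed
    return mixes
```

Each additional participant adds another mix, another encoder, and another outgoing RTP stream to Lori's PC, which is one way to see why this approach does not scale.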

Well, that’s it. Did you figure it out before I gave you the answer? Don’t feel bad if you didn’t. Most people don’t.

What I like best about this lab is that it shows how SIP can be used to create something as complicated as a conference without a special conference command. By simply sending INVITE and BYE requests, SIP is able to add people to, and drop them from, a three-party call. Call me a nerd, but I think that’s pretty cool.

Tags: Conference Call, SDP, Session Description Protocol, SIP, Unified Communications, VoIP, WebRTC

13 comments

Arie · November 18, 2014 - 1:58 pm

I see 180 Ringing in flow #1 and flow #2.
1. Andrew Prokop · November 18, 2014 - 2:06 pm
  
  Yes, you will see Ringing in both flows, but not in all INVITE transactions. Notice how the ones that create a new call have Ringing, but the ones that put calls on and off hold do not.
  1. Arie · November 19, 2014 - 12:00 pm
    
    Thanks
Robert Johnson · November 18, 2014 - 4:52 pm

I appreciate this Andrew! I loved the challenge.

I missed (or misinterpreted) the part where the phone was to do all of the work.

I’ve done enough work in the real world to catch what was going on in steps 1-8; however, based on my misinterpretation, I assumed that the phones were initiating audio streams directly to each other by using multiple SDP Connection attributes.

Is there a reason the phones could / would not do this?
1. Andrew Prokop · November 18, 2014 - 5:14 pm
  
  Glad you enjoyed it! There is no reason why the phone could not do what you suggested, but my lab phone isn’t the greatest and has its own unique way of doing things. It does do point-to-point, though, which is why I keep it. Everything else I’ve looked at wants to send out REGISTER messages, and since I don’t have a SIP server in the class, I can’t use them.
  1. Robert Johnson · November 19, 2014 - 10:47 am
    
    I got to thinking about my question a bit further last night. A good, practical reason why the phones wouldn’t do what I suggested is that Lori has no way to know if Kevin and Mike can talk directly to one another.
palo73 · November 19, 2014 - 8:19 am

Hi,
Nice topic. Which softphones are you using in your class that allow direct calls without SIP servers?
thx
palo73
1. Andrew Prokop · November 19, 2014 - 8:36 am
  
  Thank you. It’s an ancient soft phone from Avaya that they called One-X Desktop. I acquired a copy several years ago and keep it around just for my class.
Avi Perpinyal · November 20, 2014 - 5:09 pm

What confuses me about your screenshots, though, is that Kevin seems to continue to send RTP to Lori even after he has been put on hold (looking at the Lori-Kevin ladder diagram at t=43.759383). Is Kevin’s SIP stack not compliant with standards? Lori’s media attribute a=sendonly indicated to Kevin that he shouldn’t send RTP because Lori is only sending, not receiving.
1. Andrew Prokop · November 20, 2014 - 6:27 pm
  
  Send me an email at ajprokop@gmail.com and I will send you the full trace. That might help.
gautam · July 30, 2019 - 12:55 am

Will the Call-ID be different when Lori calls Kevin and when Lori calls Mike?

What is the way to correlate these two INVITEs (one for Kevin and the other for Mike)? How can I identify that these INVITEs are part of the conference call, since there are no special messages?
ashish · January 17, 2020 - 6:06 am

Sir, how many dialogs will be generated for this call?
Sarabjit · June 25, 2020 - 7:11 am

Andrew,
Thank you for this wonderful info.

I have one question regarding the second INVITE below:

The two SDPs are identical with one exception. The second contains a media attribute of sendonly. This is how this particular SIP soft client indicates that it is putting the call on hold. Personally, I prefer seeing an SDP Connection value of c=0.0.0.0 for hold, but I didn’t write this application and it is what it is.

What will the response to the second INVITE be? Will it also be 0.0.0.0?