Despite the fact that we’ve entered the holiday season, the weeks between Thanksgiving and New Year’s are proving to be some of the busiest of the year. Last week I was in Tampa and this week I travel to Salt Lake City and Phoenix. It doesn’t stop there, though. Next Monday, I fly to Detroit for a three-day SIP engagement. While I am excited that people are that interested in hearing me speak, I am not thrilled about being away from home for half of December.
Okay, now that I’ve gotten that out of my system, let’s get on to today’s subject – a Wireshark view of the Real-time Transport Protocol (RTP).
As I am sure you already know, SIP is a signaling protocol. While it is certainly responsible for establishing media connections, it is not itself a media protocol. It leaves that to Session Description Protocol (SDP) and the Real-time Transport Protocol. SDP is used to describe media and RTP is used to transmit the media.
I previously touched on both SDP and RTP in these articles:
RTP is a datagram protocol that is nearly always carried in UDP (User Datagram Protocol) packets. This means that RTP delivery is unreliable. A sender sends an RTP packet without any assurance that the packet will ever be received. Unreliable also means that even if a packet reaches the far end, the sender never learns whether it arrived intact. The sender makes a best attempt to deliver the packet and hopes that it arrives. There are no retransmissions for lost or dropped packets.
Of course, it doesn’t make sense to retransmit real-time media. Once a voice or video stream has begun, you can’t go backwards in time. The receiver decodes and plays what it receives as it receives it.
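To make the "send and hope" behavior concrete, here is a minimal Python sketch of handing a single datagram to UDP. The function name send_datagram is my own invention for illustration, and this is not a real RTP stack. It only shows that the sending side gets no confirmation of delivery.

```python
import socket


def send_datagram(payload: bytes, host: str, port: int) -> int:
    """Send one UDP datagram and return the byte count handed to the OS.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # sendto() returns as soon as the datagram is queued locally.
        # No acknowledgement ever comes back, so the caller cannot tell
        # whether the far end received it, and nothing is retransmitted.
        return sock.sendto(payload, (host, port))
```

The return value only says how many bytes were passed to the local network stack. Whether anything shows up at the other end is a separate question that UDP never answers.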
Note: RTP isn’t limited to just SIP. H.323 also uses RTP for transmitting media.
Different codecs and sampling rates play a part in the number of packets that make up a voice or video conversation, but in all cases, there will be a lot of them. It takes as few as five SIP messages to establish a voice call, but that call might generate thousands and thousands of RTP packets. The longer the conversation, the more packets are sent by all parties.
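A quick back-of-the-envelope sketch shows how fast the packet count grows. The function names and the default of two directions are my own assumptions for illustration. The sketch also assumes a constant packet interval with no silence suppression, which is a simplification.

```python
def packets_per_second(packet_interval_ms: int) -> float:
    """Packets sent per second in one direction for a given packet interval."""
    return 1000 / packet_interval_ms


def packets_in_call(duration_s: float,
                    packet_interval_ms: int = 20,
                    directions: int = 2) -> int:
    """Rough RTP packet count for a call.

    Assumes a steady stream in each direction (no silence suppression).
    """
    return int(duration_s * packets_per_second(packet_interval_ms) * directions)


# A three-minute, two-way call with one packet every 20 ms:
print(packets_in_call(180))  # 18000
```

Compare that figure with the handful of SIP messages needed to set up and tear down the same call.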
An RTP packet includes the following fields, among others:
Sequence Number: The sequence number is used to put an identifying number on each RTP packet sent. The sender will increment the number by one for each new packet.
Timestamp: The timestamp is used to allow the receiver to play back the packets at the appropriate intervals.
Payload Type: This seven-bit value identifies the format of the media carried by RTP. For instance, this is where G.711, G.729, or H.264 are indicated.
RTP Payload: This is the media itself, and the amount of data sent depends on the codec and sample interval. For example, it might be 20 bytes of G.729 when used with a 20 ms voice payload size. G.711 with that same sample size of 20 ms yields 160 bytes of data. The important thing to realize is that any codec’s data (G.729a, G.711, iLBC, etc.) will be contained here.
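The fields above can be pulled out of a raw packet with a few lines of Python. This is only a sketch under some assumptions: it follows the 12-byte fixed header layout from RFC 3550 (which also carries a version and an SSRC identifier that I did not list above), it ignores header extensions and padding, and the names RtpPacket, parse_rtp, and lost_between are hypothetical. The sample packet is fabricated for the demonstration and is not taken from a real capture.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse the fixed RTP header (RFC 3550 layout) and return the fields."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F            # each CSRC entry adds 4 bytes
    header_len = 12 + 4 * csrc_count     # extension/padding NOT handled here
    return RtpPacket(
        version=first >> 6,
        payload_type=second & 0x7F,      # the seven-bit payload type
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        payload=packet[header_len:],
    )


def lost_between(prev_seq: int, next_seq: int) -> int:
    """Packets missing between two received sequence numbers.

    Handles 16-bit wraparound; assumes packets arrive in order.
    """
    return (next_seq - prev_seq - 1) % 65536


# Fabricated sample: version 2, payload type 0 (G.711 mu-law),
# followed by 160 bytes of payload (20 ms of G.711).
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160000, 0x1234ABCD) + bytes(160)
pkt = parse_rtp(sample)
print(pkt.payload_type, pkt.sequence_number, len(pkt.payload))  # 0 1000 160
print(lost_between(1000, 1003))  # 2
```

Since the sender increments the sequence number by one per packet, a gap between two received numbers tells the receiver how many packets went missing, even though nothing will be retransmitted.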
To better understand bandwidth requirements, please refer to these articles:
The following is an example of an RTP packet carrying 20 ms of G.711 data sent during a simple point-to-point audio call.
Wireshark makes understanding the packet extremely simple. It can even play back the RTP packets, allowing you to recreate a captured conversation. Of course, this works only because we haven’t encrypted the data with Secure RTP (SRTP). Wireshark cannot decode or play back the encrypted payload of SRTP packets.
That’s really all there is to it. You need a signaling protocol like SIP to establish a media connection, but RTP does the heavy lifting of moving digitized data between all the parties in a multimedia call.