I’ve written quite a few blogs where I mention that SIP media is sent by something called RTP, but I’ve never described what that means. Well, today I’ve decided to do something about that.
RTP stands for Real-time Transport Protocol and, like the bulk of the standards used by and with SIP, it is managed by the Internet Engineering Task Force (IETF). Like all IETF protocols, RTP has its own RFC: RFC 3550. It’s actually a fairly easy RFC to read and comprehend and I invite you to do so, but I think that over the next several paragraphs I can tell you just about everything you really need to know about RTP.
RTP was developed as a way to deliver real-time media across an IP network. For the most part, that real-time media will be either voice or video. One of the cool things about RTP is that it is used by both H.323 and SIP. In other words, it’s possible for an H.323 client to communicate with a SIP client as long as you have something in the middle to translate between the two signaling protocols. As long as both clients agree on a codec, the media streams wouldn’t require transcoding because they are exactly the same.
The protocol itself is quite skinny. In fact, an RTP header can be as small as 12 bytes. The aspects that you need to understand are the following:
Sequence Number: The sequence number puts an identifying number on each RTP packet sent. The sender increments it by one for each new packet. RTP is sent over an unreliable datagram protocol (e.g., UDP), so lost packets are not retransmitted. However, the receiver can use the sequence number to detect that a packet has been dropped by the network or has arrived out of order.
Timestamp: The timestamp marks when the first sample in the packet was captured. It allows the receiver to play back the packets at the appropriate intervals.
Payload Type: This seven-bit value identifies the codec carried by RTP. For instance, this is where G.711, G.729, or H.264 would be indicated.
RTP Payload: This is the media itself. The amount of data sent depends on the codec and the sample interval. For example, G.729 with a 20 ms voice payload size yields 20 bytes of data. G.711 with that same sample size of 20 ms would yield 160 bytes of data. The important thing to realize is that any codec’s data (G.729a, G.711, iLBC, etc.) will be contained here.
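To make the header fields above a little more concrete, here is a minimal Python sketch that pulls them out of a raw packet. It assumes the plain 12-byte fixed header from RFC 3550 with no CSRC entries and no header extension. The names RtpHeader and parse_rtp_header are my own, the sample packet is made up, and the SSRC field (a stream identifier I haven’t covered here) is included only because it fills out the 12 bytes.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (illustrative helper)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # a 32-bit timestamp, and a 32-bit SSRC.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,          # top two bits; 2 for RFC 3550
        payload_type=second & 0x7F,  # low seven bits; the top bit is the marker
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Made-up packet: version 2, payload type 0 (G.711 mu-law), sequence
# number 1000, timestamp 160, arbitrary SSRC, 160 bytes of silence.
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x1234ABCD) + bytes(160)
header = parse_rtp_header(sample)
payload = sample[12:]  # valid only when there are no CSRCs or extensions
```

A receiver that keeps the previous sequence_number around can compare it with the new one to spot a gap or a reordered packet.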
For an in-depth explanation of payload size, please refer to this article.
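If you want to check the payload numbers yourself, the arithmetic is simple for a constant-bitrate codec: bitrate times packet interval, divided by eight. The sketch below assumes the standard rates of 64 kbit/s for G.711 and 8 kbit/s for G.729; the function name payload_bytes is just something I made up for illustration.

```python
def payload_bytes(bitrate_bps: int, packet_ms: int) -> int:
    """Bytes of codec data in one packet for a constant-bitrate codec."""
    # bits per second * seconds per packet / 8 bits per byte
    return bitrate_bps * packet_ms // (8 * 1000)


g711 = payload_bytes(64_000, 20)  # G.711 at 64 kbit/s, 20 ms packets
g729 = payload_bytes(8_000, 20)   # G.729 at 8 kbit/s, 20 ms packets
```

These figures cover only the codec data. The 12-byte RTP header, plus the UDP and IP headers, ride on top of every packet.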
RTP has a sister protocol. RTP Control Protocol (RTCP) packets are sent periodically alongside an RTP stream to transport control and QoS information. RTCP can tell you how many packets were sent and what the jitter and latency values are. RTCP packets might help you find voice or video quality issues in your network. For more information on QoS, please refer to my blog No Shirt, No Shoes, No Quality of Service.
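For the curious, the jitter figure in an RTCP report is a running estimate that RFC 3550 describes: take the change in transit time between consecutive packets and fold one sixteenth of the difference into the running value. Here is a rough sketch of that calculation. The function name update_jitter and the packet timings are invented for illustration, and arrival times are assumed to already be expressed in RTP clock ticks.

```python
from typing import Optional, Tuple


def update_jitter(jitter: float, prev_transit: Optional[int],
                  arrival: int, rtp_timestamp: int) -> Tuple[float, int]:
    """Fold one packet into a running interarrival jitter estimate.

    arrival and rtp_timestamp must use the same units (RTP clock ticks).
    """
    transit = arrival - rtp_timestamp
    if prev_transit is not None:
        # How much this packet's transit time differs from the last one.
        d = abs(transit - prev_transit)
        # RFC 3550 smooths the estimate with a gain of 1/16.
        jitter += (d - jitter) / 16
    return jitter, transit


# Made-up stream: 20 ms G.711 packets on an 8000 Hz clock (160 ticks
# apart); the third packet shows up 20 ticks (2.5 ms) late.
packets = [(0, 0), (160, 160), (340, 320), (480, 480)]  # (arrival, timestamp)

jitter, prev = 0.0, None
for arrival, ts in packets:
    jitter, prev = update_jitter(jitter, prev, arrival, ts)

jitter_ms = jitter / 8000 * 1000  # convert ticks to milliseconds
```

A perfectly steady stream leaves the estimate at zero. One late packet nudges it up twice: once when the packet arrives late and again when the next packet is back on schedule.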
Both RTP and RTCP can be encrypted. SRTP (Secure RTP) prevents the bad guys from sniffing your network and capturing your conversations. While not nearly as important to security, SRTCP (Secure RTCP) hides the QoS information about those calls. For more about security, please refer to my blog Practicing Safe SIP.
There are a few more odds and ends involved with RTP, but this is pretty much all you need to know to be dangerous. In the SIP class I teach, I have my students gather RTP packets with Wireshark and play back voice calls. Of course, they couldn’t do that if the calls were established with SRTP, but that’s the whole point of security, isn’t it?