SIP for Beginners

I realize that I sometimes write about SIP and SIP services as if everyone in the world has the same background and experiences as me.  That isn’t meant as a brag.  There are lots of important things that I haven’t a clue about.  I do know SIP, though, and it’s essential that I write with the understanding that my audience consists of people from all walks of the communications life.

To help things along I’ve put together this quick SIP primer.  It’s not meant to be exhaustive, but I strive to cover what I think is important.  Some of these things I have already written about in previous blogs and I intend to dig deeper into the remaining topics over the next several weeks and months.

Session Initiation Protocol

The Session Initiation Protocol (SIP) is a signaling protocol used to establish, modify, and tear down communication sessions in an IP network.  These sessions can be as simple as a two-way call or as involved as a multi-party web conference complete with audio, video, and a shared whiteboard application.

SIP was modeled after the Hypertext Transfer Protocol (HTTP) and contains many of the basic tenets of that protocol.  First, SIP is an English-like, text-based protocol that is not only easy to read, but is also easy to understand, debug, and extend.  New features can be added to SIP without the need to modify any of the SIP server entities that might exist within any particular call path.
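To make the "easy to read" point concrete, here is a minimal Python sketch that parses a sample INVITE request. The message is loosely modeled on the kind of example found in the SIP specification; every address, tag, and identifier in it is made up for illustration, and parse_sip_request is a hypothetical helper, not part of any SIP library.

```python
# A sample SIP request. All names, hosts, and identifiers are invented for illustration.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@pc33.example.org>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw):
    """Split a SIP request into (method, request_uri, version, headers).

    Simplified on purpose: ignores header folding, compact header names,
    and headers that appear more than once.
    """
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line looks like an HTTP request line: METHOD URI VERSION
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers


method, uri, version, headers = parse_sip_request(SAMPLE_INVITE)
print(method, uri, version)
print(headers["from"])
```

Anyone who has looked at an HTTP request will recognize the shape: a request line, a set of "Name: value" headers, a blank line, and an optional body.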

Second, and perhaps most importantly, SIP is media agnostic.  In other words, SIP can be used to establish sessions of nearly any media type imaginable.  As communications move well beyond the simple phone call, SIP is fully equipped to support any and all media (voice, video, instant messaging, SMS text, etc.) that might come along.

Additionally, SIP has been extended to allow for first and third-party call control.  This means that SIP can be used by one entity to control the call flow of another entity.  In its most basic sense, this means that applications can be written that direct endpoints to create (e.g. make a call), manage (e.g. answer an incoming call), and terminate sessions (e.g. release call).

To learn about the different SIP components, please refer to SIP Servers and Services.

Rich Communications

In the same way that HTTP allows a web browser to deliver a wide variety of content types to a PC or web-enabled device, SIP-enabled devices can support media from many different sources.  Built into SIP is the notion of session description which allows SIP to establish a session independent of the underlying media stream.  This allows for session escalation whereby a user might start communicating with an instant message and then later on add voice, file transfer, and multi-party video.  SIP has been designed to support any communications means that a user may require.
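In practice, the session description usually travels in the body of a SIP message as Session Description Protocol (SDP) text. The sketch below is a simplified illustration of escalation: it builds an audio-only description and then a second one that adds video, as an endpoint might send in a follow-up request on the same session. The build_sdp helper, the user name, the address, and the ports are all assumptions chosen for the example, and voice-to-video is used here only because it is the simplest escalation to show.

```python
def build_sdp(session_id, version, address, media):
    """Build a minimal SDP body.

    `media` is a list of (kind, port, payload_type, rtpmap) tuples.
    Hypothetical helper: a real endpoint would negotiate many more attributes.
    """
    lines = [
        "v=0",
        f"o=alice {session_id} {version} IN IP4 {address}",
        "s=-",
        f"c=IN IP4 {address}",
        "t=0 0",
    ]
    for kind, port, payload_type, rtpmap in media:
        lines.append(f"m={kind} {port} RTP/AVP {payload_type}")
        lines.append(f"a=rtpmap:{payload_type} {rtpmap}")
    return "\r\n".join(lines) + "\r\n"


# Illustrative media definitions (ports are arbitrary).
AUDIO = ("audio", 49170, 0, "PCMU/8000")
VIDEO = ("video", 51372, 31, "H261/90000")

# The session starts as a voice call...
initial_offer = build_sdp(2890844526, 1, "192.0.2.10", [AUDIO])

# ...and is later escalated to voice plus video. The session id stays the
# same while the version number in the o= line goes up.
escalated_offer = build_sdp(2890844526, 2, "192.0.2.10", [AUDIO, VIDEO])

print(escalated_offer)
```

The SIP signaling around these two bodies is the same in both cases. Only the description of the media changes, which is what keeps SIP independent of the media stream.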

SIP is perfect for dealing with the explosion of consumer-grade communications devices that are making their way into the enterprise.  Imagine a world where your personal iPhone or iPad can be securely integrated into your communications system.  With SIP that world exists and solutions are available today.

SIP Trunks

In traditional wireline telephony, phone calls are passed to and from an enterprise and the Public Switched Telephone Network (PSTN) over a dedicated line or a bundle of circuits.  These could be analog trunks such as loop or ground start lines or digital trunks such as T1, E1, ISDN, or PRI.  Since SIP is an IP protocol, it runs on the same network that data traffic runs on.  This convergence of voice and data means that a SIP trunk is a logical concept that has more to do with bandwidth than physical wires or circuits.
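Because a SIP trunk is sized by bandwidth rather than by circuits, a quick back-of-the-envelope calculation can be useful. The sketch below assumes G.711 audio (64 kbps) sent in 20 ms packets with 40 bytes of IPv4, UDP, and RTP headers per packet. It ignores Layer 2 framing, signaling traffic, and any headroom you would want to leave for data, so treat the result as a rough ceiling and not as a sizing recommendation. The function names are my own.

```python
def g711_call_bandwidth_bps(packet_ms=20, overhead_bytes=40):
    """IP-layer bandwidth, in bits per second, for one direction of a G.711 call.

    overhead_bytes = 20 (IPv4) + 8 (UDP) + 12 (RTP).
    """
    payload_bytes = 64_000 // 8 * packet_ms // 1000  # 160 bytes at 20 ms
    packets_per_second = 1000 // packet_ms           # 50 packets at 20 ms
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second


def max_concurrent_calls(trunk_bps, per_call_bps):
    """How many calls fit if the whole link were dedicated to voice."""
    return trunk_bps // per_call_bps


per_call = g711_call_bandwidth_bps()
print(per_call)                                    # 80000
print(max_concurrent_calls(10_000_000, per_call))  # 125
```

Under these assumptions a 10 Mbps link carries at most 125 simultaneous G.711 calls in each direction, a number that changes with the codec and packet size and not with the number of physical wires.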

The benefits of SIP trunks over traditional trunks are many:

  • Converged voice and data
  • Rich communications
  • Equipment reduction which leads to reduced power and space requirements
  • Flexible costs due to burst pricing
  • Improved reliability and failover strategies

For a deeper dive, please see my article, A Guide to Implementing SIP Trunks.

User Centricity

Computer Telephony Integration (CTI) has traditionally been endpoint-centric: applications controlled and monitored physical endpoints regardless of who might be using them.  However, with the explosion of communications interfaces, a user might employ numerous devices throughout the day.  For instance, the manager of a sales department will typically have an office phone, a cell phone, a soft phone (a computer-based phone such as Avaya’s One-X Communicator or Microsoft’s OCS or Lync clients), and an instant messaging client.  With SIP, a single application can manage all of that user’s devices, along with the presence status generated by those devices (e.g. on a call, in an instant messaging session, etc.), as a whole.  This user-centric model is a break from the device-centric model, where each device is treated as a separate entity with no particular connection to its owner.
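One way to picture the user-centric model is as a data structure in which devices hang off a user and presence is computed for the user as a whole. The sketch below is purely illustrative: the Device and User classes, the status names, and the "busiest state wins" rule are all assumptions of mine, not something defined by SIP or by any vendor’s product.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    # Hypothetical states: "available", "on-call", "in-im", "offline"
    status: str = "available"


@dataclass
class User:
    name: str
    devices: list = field(default_factory=list)

    def presence(self):
        """Aggregate device states into one user-level status.

        Assumed rule: the busiest state reported by any device wins.
        """
        priority = ["on-call", "in-im", "available", "offline"]
        states = {device.status for device in self.devices}
        for state in priority:
            if state in states:
                return state
        return "offline"  # a user with no devices is treated as offline


manager = User("sales-manager", [
    Device("office phone"),
    Device("cell phone", status="on-call"),
    Device("soft phone", status="offline"),
    Device("IM client", status="in-im"),
])
print(manager.presence())  # on-call
```

In a device-centric design, an application would see four unrelated endpoints. Here it sees one person who happens to be on a call.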

A related discussion can be found here.


Security

In the same way that you would never allow a PC to connect to the Internet without the proper security tools such as a firewall and virus checker, Voice over IP (VoIP) requires protection from malicious activity.  SIP has a number of security mechanisms that are either built into SIP or work alongside SIP to create a rock-solid means of defense.  For example, SIP itself can be encrypted and individual SIP messages can be challenged with authentication requests.  SIP media streams can also be encrypted to prevent prying eyes and ears.  Finally, SIP-based components such as a Session Border Controller (SBC) can be deployed as a perimeter defense appliance similar to how an enterprise would deploy a network firewall.
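The authentication challenge mentioned above borrows the digest scheme from HTTP: the server sends a realm and a one-time nonce, and the client answers with a hash that proves it knows the password without sending it. The sketch below shows the classic MD5 form of that calculation. The digest_response function and the sample values are illustrative assumptions, and a real deployment may require other hash algorithms or options not shown here.

```python
import hashlib


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the digest authentication response for a challenged request.

    Hypothetical helper covering only the MD5 algorithm, with or without
    the "auth" quality-of-protection option.
    """
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # who you are
    ha2 = _md5_hex(f"{method}:{uri}")                 # what you are asking for
    if qop:
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


# Example: answering a challenge to a REGISTER request (all values made up).
response = digest_response(
    username="alice",
    realm="example.com",
    password="not-a-real-password",
    method="REGISTER",
    uri="sip:example.com",
    nonce="84a4cc6f3082121f32b42a2187831a9e",
)
print(response)
```

Because the nonce changes with each challenge, a captured response cannot simply be replayed later, which is the point of challenging individual messages.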

For a deeper dive, please read my articles on Practicing Safe SIP and Choosing the Right SBC.

Why Not H.323?

Since the early 2000s, IP telephony has been built around the H.323 standard.  H.323 is a feature-rich protocol that allowed the IP PBX to deliver a user experience identical to that of older digital telephones.  However, H.323 is not media independent like SIP and cannot be extended beyond voice and video.  So, while H.323 is a powerful protocol in terms of delivering traditional voice features, it is not capable of being extended to support the multimedia requirements of the modern enterprise.  As the communications expectations of employees and customers evolve, H.323 devices will be supplanted by SIP devices, which are purpose-built for today’s world of smart phones, mobility, and context-aware devices.

I take this discussion a little further in Intelligent SIP Endpoints.

That’s probably enough for now.  If I haven’t lost you, please keep coming back for more.  I still have a lot to say.


  1. Andrew Prokop

    Thanks for the re-blog!

  2. Great intro article.

    1. Andrew Prokop

      Thanks for reading and commenting, Joe!

  3. Alex Flor

    Fantastic blog. Thanks !

    1. Thank you, Alex!

  4. I’m starting a UC project in my organization these days and have no former knowledge of the telephony and UC world. I read a lot of materials trying to learn and get more familiar with this field. This short, well-written article brings things together and clears up some of the confusion. Thanks very much for sharing this knowledge.

    1. Thanks, Li! I am happy you enjoyed it. You will find lots of articles on my blog that will take you even further.
