The Session Initiation Protocol (SIP) is a signaling protocol that specifies the syntax and transfer procedures of the messages used to establish sessions.  It is most often used to set up media sessions (audio and/or video) between endpoints on an IP network.  SIP defines many fields of information that may be needed when establishing a session, and describes how each field is encoded in a message.  It also assigns roles to the endpoints that handle these messages, and describes which messages should be sent when.  SIP is specified in RFC 3261.
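As a rough illustration of how fields are encoded, the sketch below serializes and parses a SIP request in Python.  The function names, addresses, tags and branch values are hypothetical and chosen only for this example; the parser is deliberately simplified and is not a complete RFC 3261 implementation.

```python
def build_sip_request(method, request_uri, headers, body=""):
    """Serialize a SIP request: start line, header lines, blank line, body."""
    body_bytes = body.encode("utf-8")
    lines = [f"{method} {request_uri} SIP/2.0"]
    for name, value in headers:
        lines.append(f"{name}: {value}")
    # Content-Length counts the bytes of the body, not characters.
    lines.append(f"Content-Length: {len(body_bytes)}")
    # Lines end with CRLF; an empty line separates headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf-8") + body_bytes


def parse_sip_message(raw):
    """Split a raw message into (start_line, headers, body).

    Simplification: headers go into a dict keyed by lower-cased name, so a
    repeated header (for example several Via lines) keeps only the last value.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("utf-8").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical INVITE from alice to bob (example.com and 192.0.2.10 are
# reserved documentation values, not real endpoints).
invite = build_sip_request(
    "INVITE",
    "sip:bob@example.com",
    [
        ("Via", "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds"),
        ("Max-Forwards", "70"),
        ("To", "<sip:bob@example.com>"),
        ("From", "<sip:alice@example.com>;tag=1928301774"),
        ("Call-ID", "a84b4c76e66710@192.0.2.10"),
        ("CSeq", "314159 INVITE"),
        ("Contact", "<sip:alice@192.0.2.10:5060>"),
    ],
)
start_line, headers, body = parse_sip_message(invite)
print(start_line)
print(headers["cseq"])
```

The layout (a start line, `Name: value` header lines, a blank line, then an optional body) is the part that resembles HTTP.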

Two of the roles defined by SIP are ‘user agent’ and ‘registrar’.  A user agent is any endpoint that might participate in a session.  A registrar helps user agents find each other by username or phone number, without needing to know IP addresses in advance.  Usually when a registrar is used, all SIP messages travel between a user agent and the registrar, and the registrar (strictly speaking, a proxy server that is typically co-located with it) forwards each message to the correct recipient on behalf of the sender.  The SIP message format is text-based and modeled on HTTP, and each message may carry a payload.  A common payload is a Session Description Protocol (SDP) message, which tells each endpoint which codecs to use for each media stream and where to send the media.
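To make the SDP payload more concrete, here is a minimal Python sketch that pulls the media address, port and codec names out of an SDP body.  The sample offer, the function name `parse_sdp_media` and all addresses are assumptions made for illustration; real SDP has many more line types and attributes than this handles.

```python
# Hypothetical SDP offer: one audio stream, two candidate codecs.
SDP_OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + "\r\n"


def parse_sdp_media(sdp):
    """Return one dict per media stream: kind, address, port and codecs."""
    session_address = None  # from a "c=" line that appears before any "m=" line
    streams = []
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            address = value.split()[2]
            if streams:
                streams[-1]["address"] = address  # media-level override
            else:
                session_address = address
        elif key == "m":
            # m=<media> <port> <proto> <payload types...>
            kind, port, _proto, *payload_types = value.split()
            streams.append({
                "kind": kind,
                "address": session_address,
                "port": int(port.split("/")[0]),  # ignore an optional "/count"
                "payload_types": payload_types,
                "codecs": {},
            })
        elif key == "a" and value.startswith("rtpmap:") and streams:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
            streams[-1]["codecs"][payload_type] = encoding
    return streams


streams = parse_sdp_media(SDP_OFFER)
for stream in streams:
    print(stream["kind"], stream["address"], stream["port"], stream["codecs"])
```

In this sketch the `c=` and `m=` lines say where to send the media, and the `a=rtpmap` lines name the codecs on offer.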

The following example shows a common SIP scenario where two user agents establish a direct media connection with help from a registrar:

SIP protocol diagram