Voice over IP (VoIP)

Note: Many topics at this site are reduced versions of the text in "The Encyclopedia of Networking and Telecommunications."

In just a few years, the old circuit-switched, voice-centric communications network will give way to a data-centric, packet-oriented network that seamlessly supports data, voice, and video with a high quality of service. The switching equipment, protocols, and links are already being put into place, and a transition network now joins the packet data world with the circuit-switched world. Integrated access solutions are being installed that carry combined data, voice, and other media into the Internet or the PSTN.

Despite a number of technological issues, real-time multimedia transmission (voice and video) over IP networks and the Internet has largely been worked out. Advanced compression techniques have reduced voice data transfer rates from 64 Kbits/sec to as little as 6 Kbits/sec. Voice over IP or VoIP can potentially allow users to call worldwide at no charge (except for the fee paid to service providers for Internet access). A user's IP address basically becomes a phone number. Additionally, computer-based phone systems can be linked to servers that run a variety of interesting telephony applications, including PBX services and voice messaging.
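To put those rates in network terms, the following rough sketch adds per-packet header overhead to the codec rate. It assumes 20-ms packets (50 per second) and uncompressed IPv4 (20-byte), UDP (8-byte), and RTP (12-byte) headers, and it ignores link-layer framing; the function name is illustrative, and real deployments vary.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header, no options


def ip_bandwidth_bps(codec_bps: int, frame_ms: int = 20) -> float:
    """Estimate one-way IP-layer bandwidth for a single voice stream.

    Assumes one codec frame of `frame_ms` milliseconds per packet and
    ignores link-layer overhead and header compression.
    """
    packets_per_second = 1000 / frame_ms
    payload_bytes = codec_bps * frame_ms / 1000 / 8
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second


for rate in (64_000, 6_000):
    print(f"{rate // 1000} Kbits/sec codec -> {ip_bandwidth_bps(rate) / 1000:.0f} Kbits/sec on IP")
```

Under these assumptions the 64-Kbits/sec stream needs about 80 Kbits/sec and the 6-Kbits/sec stream about 22 Kbits/sec at the IP layer, so at low codec rates the headers, not the voice samples, account for most of the traffic.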

Internet telephony is breaking down the distinctions between telecommunications services and the Internet, sparking reevaluations of the entire industry from technical, economic, and regulatory points of view. The U.S. government has taken a hands-off attitude toward the growth of the Internet telephony industry. A few years ago, a Federal Communications Commission (FCC) attorney said the FCC wanted to encourage growth of Internet telephony and had no plans to regulate it soon.

One of the best reasons to support packet telephony can be seen in the service limitations of the traditional telephone system. The switches are mostly proprietary, with embedded call control functions and service logic, which makes it difficult to add new services. In addition, the end devices (telephones) are limited in functionality to a 12-key pad! In contrast, new services are easy to add in the IP telephony world because users simply add new telephony applications on their computers and communicate with other users who are running the same applications. There is no need to put additional support for these services in the network itself; all the network has to do is transport packets with good quality of service.

The new model reverses the thinking of the old model. The old model was "dumb endpoints driven by an intelligent network." The new model is intelligent endpoints communicating over a relatively dumb packet-based best-effort network.

While phone calls on the Internet rarely achieve the quality of circuit-switched calls, the quality is improving; the connection is often as good as or better than cellular, which is prone to transmission errors and distortion due to its wireless nature. The market has found that many customers will tolerate a small decrease in call quality in exchange for the significantly reduced cost of an IP telephone call. Savings on fax calls may be even greater. In fact, transmitting fax over the Internet is very practical because real-time delivery is not required. Many service providers are building special networks designed to provide a high quality of service for customer multimedia applications.

Quality of IP telephony is affected by latency, which is basically the amount of time between when someone speaks and when the listener hears the words. People who talk across satellite links know about this gap and are often able to adjust to it, because the delay is constant; you get used to the slight pause. Jitter is the real problem. Jitter is variation in delay: if the amount of delay changes as the call proceeds, the effect is annoying. When voice is transmitted over packet networks, jitter occurs when packets get held up in queues during unpredictable and momentary bursts of traffic. RTP (Real-time Transport Protocol) helps smooth out jitter by stamping each packet with a sequence number and a timestamp, so the receiver can buffer incoming packets and play them out with their original timing. Virtually all IP telephony applications use RTP.
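Receivers can measure jitter from RTP timestamps. The following minimal sketch implements the interarrival jitter estimator defined in the RTP specification (RFC 1889, since updated by RFC 3550): for each packet it compares the change in transit time with the previous packet and folds the difference into a running average with a gain of 1/16. It assumes an 8,000-Hz timestamp clock (typical for narrowband voice) and arrival times in seconds; the function name and input format are illustrative.

```python
from typing import Iterable, Optional, Tuple


def interarrival_jitter(packets: Iterable[Tuple[int, float]], clock_rate: int = 8000) -> float:
    """Running RTP interarrival jitter, in timestamp units.

    `packets` yields (rtp_timestamp, arrival_time_seconds) in arrival order.
    Transit time is the arrival time (converted to timestamp units) minus the
    RTP timestamp; its absolute change between consecutive packets feeds an
    exponential average with gain 1/16.
    """
    jitter = 0.0
    prev_transit: Optional[float] = None
    for rtp_ts, arrival_s in packets:
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


# 20-ms voice packets (160 timestamp units at 8 kHz); the third arrives 10 ms late.
samples = [(0, 0.000), (160, 0.020), (320, 0.050), (480, 0.060)]
print(interarrival_jitter(samples))  # about 9.69 timestamp units (roughly 1.2 ms)
```

A constant network delay produces a jitter value near zero no matter how large the delay is, which matches the satellite example above. RTP receivers report this value back to senders in RTCP receiver reports.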

The remainder of this topic covers new and emerging telephony protocols and models. Specific concepts such as voice compression, codecs, silence suppression, echo cancellation, jitter, and QoS are not covered here. You can refer to "Compression Techniques," "Multimedia," "QoS (Quality of Service)," and "Voice/Data Networks" for additional information.

The PSTN Legacy Architecture

It's important to understand the architecture of the legacy public switched telephone network in order to understand the equipment, software, and protocols involved in VoIP integration. For the next few years, the existing PSTN will slowly give way to a new public packet network. In the meantime, the existing PSTN must be leveraged because millions of users and non-IP devices are connected to it. In addition, it supports a variety of voice services such as 800 and 900 numbers. An Internet telephony device that wants to connect with one of these devices must use PSTN signaling. Here are the possible Internet/PSTN interconnection scenarios:

  • Internet user/device to PSTN user/device (packet to circuit)

  • PSTN user/device to Internet user/device (circuit to packet)

  • PSTN user/device to PSTN user/device across the Internet (circuit to packet to circuit)

  • Internet user/device to Internet user/device across the PSTN (packet to circuit to packet)

In each of these cases, some translation is required to convert from one signaling method to another. In the PSTN, signals are messages sent between telephony switches to set up and terminate calls and indicate the status of terminals involved in calls. These signals are carried over a separate data network known as CCS (Common Channel Signaling). The protocol used by CCS is SS7 (Signaling System 7). The entire system is called the IN (Intelligent Network). Refer to "Telecommunications and Telephone Systems" for a description of this network.

VoIP can be seen as part of this overlay network, but built with media gateways and media gateway controllers (also called "softswitches"). A softswitch could be called a "soft" SSP (service switching point). It translates IP-based call signals into SS7 messages so that the SCP (service control point) can respond in the same way it responds to signals from SS7-compatible devices. In either case, the SCP sends appropriate signals to control the SSP.

VoIP Standards

When talking about standards, the first thing to mention is that the term "VoIP" was actually coined by the VoIP Forum to describe voice over IP using ITU H.323 standards. However, "VoIP" has become a generic term for the range of standards that support voice over IP, including ITU standards, IETF standards, and others.

ITU H.323 has dominated the market in terms of the number of devices installed. H.323 is a quite complex multimedia conferencing standard. While it works well for videoconferencing, most vendors feel that it is too complex for IP telephony, which can be handled with much simpler protocols. Still, H.323 is considered a pioneering protocol for packet voice and video. The IETF's SIP (Session Initiation Protocol) is now seen as more important for VoIP.

H.323 Multimedia Conferencing Standard

H.323 is part of a family of ITU-T Recommendations that specify multimedia communications services such as real-time audio, video, and data over a variety of networks, including multipoint links where multiple users participate in the same exchange (such as a videoconference). The ITU calls H.323 a recommendation for a "visual telephone system" that works over LANs. Because it runs over packet networks that do not guarantee service, H.323 itself does not provide quality of service controls, but QoS can be obtained by other means, as discussed under "QoS (Quality of Service)." The IETF's RTP is the transport protocol for this scheme.

An H.323 environment consists of H.323 terminals (phones and telephony-enabled PCs), gateways to the public telephone network, gatekeepers (management functions), and multipoint control units. It also includes a set of additional protocols for encoding and decoding audio and video data, protocols that define how that data is packetized, and protocols that define call signaling and control.

H.323 gateways connect different systems and devices (e.g., IP-based H.323 terminals to the PSTN). Gateways perform the appropriate mapping of call signals and control protocols between systems. H.323 gatekeepers are systems that manage a group of H.323 terminals and gateways within a "zone." A zone can be thought of as a management area consisting of a group of related terminals, gateways, and multiuser conferencing devices. Gatekeepers provide address translation functions between H.323 addresses and IP addresses. They also provide supervisory functions (admitting or rejecting users), bandwidth allocation, and call signaling functions.
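The gatekeeper's address translation and admission roles can be pictured with a toy model. This is a minimal sketch only: the class and method names are hypothetical, and it does not implement the actual H.225.0 RAS (Registration, Admission, and Status) messages that H.323 gatekeepers exchange. It maps aliases to IP addresses and admits a call only if both parties are registered and the zone's bandwidth budget allows it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ZoneGatekeeper:
    """Toy model of an H.323 gatekeeper for one zone (hypothetical API)."""
    bandwidth_budget_kbps: int
    registrations: Dict[str, str] = field(default_factory=dict)  # alias -> IP address
    in_use_kbps: int = 0

    def register(self, alias: str, ip_address: str) -> None:
        # An endpoint registers an alias (e.g., an extension) with its IP address.
        self.registrations[alias] = ip_address

    def translate(self, alias: str) -> Optional[str]:
        # Address translation: alias -> IP address, or None if unknown.
        return self.registrations.get(alias)

    def admit(self, caller: str, callee: str, kbps: int) -> Optional[str]:
        """Admit a call if both parties are registered and bandwidth allows.

        Returns the callee's IP address on success, None on rejection.
        """
        if caller not in self.registrations:
            return None  # unregistered callers are rejected
        callee_ip = self.translate(callee)
        if callee_ip is None or self.in_use_kbps + kbps > self.bandwidth_budget_kbps:
            return None
        self.in_use_kbps += kbps  # bandwidth allocation
        return callee_ip

    def release(self, kbps: int) -> None:
        # Return bandwidth to the zone when a call ends.
        self.in_use_kbps = max(0, self.in_use_kbps - kbps)


gk = ZoneGatekeeper(bandwidth_budget_kbps=160)
gk.register("2001", "192.0.2.10")
gk.register("2002", "192.0.2.11")
print(gk.admit("2001", "2002", 80))   # 192.0.2.11
print(gk.admit("2002", "2001", 100))  # None: would exceed the 160-kbps budget
```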

See "H.323 Multimedia Conferencing Standard" for more information.

ETSI TIPHON

ETSI (European Telecommunications Standards Institute) has established the TIPHON (Telecommunications and Internet Protocol Harmonization Over Networks) Working Group to ensure that users connected to IP-based networks can communicate with users in switched-circuit networks such as the PSTN, ISDN, and GSM, and vice versa. While ETSI is a European body, it cooperates with the ITU and IETF. ETSI developed the GSM standard.

TIPHON includes working groups that address various aspects of implementing VoIP, including interoperability, charging/billing, security, call control procedures, information flows, protocols, and QoS. See "OSP (Open Settlement Protocol)," for example.

TIPHON relies on H.323 gatekeeper, gateway, and terminal specifications. The gateway is subdivided into a signaling gateway (mediates signals between IP and the switched circuit networks), a media gateway (connects two different networks and performs translations), and a media gateway controller (translates and maps signaling information among the different networks).

The media gateway controller controls the media gateway by creating, modifying, and deleting connections, as well as specifying media formats and inserting information into media streams. The gatekeeper provides functions such as authentication, admission policy, call signaling, and accounting.

There are many issues related to end-user addressing, as pointed out in Bjarne Munch's "IP Telephony Signalling" paper (see the Ericsson Web site listed on the related entries page). These issues relate to "user friendliness, routing mechanisms, number mobility, and service mobility." In the switched circuit network, users are identified by ITU-T E.164 telephone numbers. In the packet network, IP addresses (and various aliases) are used.
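One IETF approach to bridging E.164 numbers and Internet names is ENUM, the work of the Telephone Number Mapping (enum) working group (RFC 2916). ENUM turns a telephone number into a DNS domain name by reversing its digits, separating them with dots, and appending e164.arpa; DNS records at that name can then point to SIP or other URLs. The sketch below shows only the name conversion, not the DNS lookup, and the function name is illustrative.

```python
import re


def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Convert a full E.164 number such as '+46-8-9761234' to its ENUM domain.

    Keeps only the digits, reverses them, joins them with dots, and appends
    the ENUM suffix. Expects an international number with a leading '+'.
    """
    if not number.strip().startswith("+"):
        raise ValueError("expected an international E.164 number starting with '+'")
    digits = re.sub(r"\D", "", number)  # drop '+', dashes, spaces
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+46-8-9761234"))  # 4.3.2.1.6.7.9.8.6.4.e164.arpa
```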

The ETSI TIPHON Web site is at

IETF IP Telephony Standards

The IETF model for VoIP centers around SIP (Session Initiation Protocol). SIP is a control protocol operating in the application layer for setting up, maintaining, and terminating voice and videoconferencing sessions. SIP uses text-based messages and operates in much the same way as the client/server protocol of the Web (i.e., HTTP). SIP provides an overall messaging system for all types of multimedia applications and works just as well for electronic commerce and collaborative computing applications.
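Because SIP messages are plain text with HTTP-like request lines and headers, a request can be assembled with ordinary string handling. The sketch below builds a minimal INVITE. The header names follow the SIP specification (RFC 2543 at the time of writing, since revised as RFC 3261), but the function is hypothetical, every address, tag, and Call-ID value is a placeholder, and it omits the Via branch parameter, the SDP body, and other details a real user agent must supply.

```python
def build_invite(caller: str, callee: str, call_id: str, local_host: str,
                 from_tag: str = "1928301774", cseq: int = 1) -> str:
    """Assemble a minimal SIP INVITE request with no message body.

    caller and callee are SIP URLs such as 'sip:alice@example.com';
    every value is a placeholder supplied by the caller of this function.
    """
    lines = [
        f"INVITE {callee} SIP/2.0",        # request line: method, Request-URI, version
        f"Via: SIP/2.0/UDP {local_host}",  # where responses should be sent
        "Max-Forwards: 70",                # hop limit
        f"From: <{caller}>;tag={from_tag}",
        f"To: <{callee}>",
        f"Call-ID: {call_id}",             # identifies this call
        f"CSeq: {cseq} INVITE",            # orders requests within the call
        f"Contact: <{caller}>",            # simplified: real agents give a direct address
        "Content-Length: 0",               # no SDP body in this sketch
    ]
    # Like HTTP, SIP uses CRLF line endings and an empty line before any body.
    return "\r\n".join(lines) + "\r\n\r\n"


msg = build_invite("sip:alice@example.com", "sip:bob@example.org",
                   "a84b4c76e66710@pc33.example.com", "pc33.example.com")
print(msg.splitlines()[0])  # INVITE sip:bob@example.org SIP/2.0
```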

In fact, in an IP-based phone system your phone number can be a URL, written in the same format as a Web site address and looking much like an e-mail address (for example, sip:user@host). Transferring phone calls to other locations is similar to clicking a hyperlink to switch to a different Web page. In addition, instant messaging and presence protocols can help users create phone call buddy lists and assist in call connections. For example, presence protocols can help locate a person to call, no matter where they are connected to the Net. Presence protocols will be critical in locating and calling mobile Internet phone users.
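A SIP URL such as sip:alice@example.com resembles an e-mail address, while RFC 2806 (listed below) defines tel: URLs for ordinary telephone numbers. The following minimal sketch splits either form into its parts; the names are hypothetical, URL parameters after ';' are simply dropped, any port stays in the host part, and the full URL grammars allow much more.

```python
from typing import NamedTuple, Optional


class PhoneAddress(NamedTuple):
    scheme: str           # 'sip' or 'tel'
    user: Optional[str]   # user part, or the telephone number for tel URLs
    host: Optional[str]   # host (and any port) for SIP URLs, None for tel URLs


def parse_phone_url(url: str) -> PhoneAddress:
    """Split a simple sip: or tel: URL; parameters after ';' are dropped."""
    scheme, sep, rest = url.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "tel"):
        raise ValueError(f"unsupported URL: {url!r}")
    rest = rest.split(";", 1)[0]  # ignore URL parameters such as ;user=phone
    if scheme == "tel":
        return PhoneAddress("tel", rest, None)
    user, at, host = rest.rpartition("@")
    return PhoneAddress("sip", user if at else None, host)


print(parse_phone_url("sip:alice@example.com"))
print(parse_phone_url("tel:+1-201-555-0123"))
```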

SIP has gained favor throughout the packet voice community. It is highly adaptable and supports multivendor interoperability among devices. Tests have shown that SIP works well and provides much faster call setup times than H.323.

SIP's distinguishing factor is that it uses the "intelligence at the edge" model. SIP relies on endpoint devices to control packet-based telephony services. In other words, two endpoint SIP devices can set up their own call across the Internet without any devices in the network getting involved, although in practice other devices will be involved if QoS is required or the call must go through the PSTN.

Since the SIP model is based on intelligent edge devices, developers are free to create telephony services and applications that are not restricted by the old service models of the traditional telephone network. In fact, designing telephony applications is very similar to creating Web client/server applications. Developers familiar with HTTP, XML, and other Web development tools will recognize SIP's approach immediately. Other protocols that are implemented in Internet multimedia devices include RTP (Real-time Transport Protocol), SAP (Session Announcement Protocol), and SDP (Session Description Protocol). See "Multimedia" for more information.
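SDP (RFC 2327) is what a SIP INVITE typically carries in its body to describe the media session: where to send RTP and which codecs to use. The following minimal sketch builds a one-stream audio description. The function name, addresses, port, and session ID are placeholders; payload type 0 is the static RTP/AVP type for G.711 mu-law (PCMU).

```python
def build_audio_sdp(host_ip: str, rtp_port: int, session_id: int, username: str = "-") -> str:
    """Build a minimal SDP description offering one G.711 mu-law audio stream."""
    lines = [
        "v=0",                                                       # protocol version
        f"o={username} {session_id} {session_id} IN IP4 {host_ip}",  # origin
        "s=Call",                                                    # session name (placeholder)
        f"c=IN IP4 {host_ip}",                                       # where to send media
        "t=0 0",                                                     # unbounded session time
        f"m=audio {rtp_port} RTP/AVP 0",                             # audio over RTP, payload type 0
        "a=rtpmap:0 PCMU/8000",                                      # PT 0 = PCMU at 8000 Hz
    ]
    return "\r\n".join(lines) + "\r\n"


print(build_audio_sdp("192.0.2.10", 49170, 2890844526))
```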

Media Gateways and Controllers

A telephony gateway is a network element that provides conversion between the audio signals carried on telephone circuits and data packets carried over the Internet or over other packet networks. A media gateway essentially replicates the behavior of the PSTN on the Internet. The purpose of these gateways and their associated protocols is to interconnect the Internet packet world with the switched circuit telephony world of the PSTN.

Gateways have three functional elements, as pictured in Figure V-5 and described next. These elements may be combined into a single box or physically separated into multiple boxes. By separating the functions, a distributed model can be built in which a single media control device can control many distributed media gateways. This separation is referred to as the softswitch architecture. See "Softswitch."

  • SS7 signaling gateway    Provides an SS7 STP (signal transfer point) function toward the switched circuit network, along with protocol conversion between the SS7 network and IP transport protocols. A signaling gateway basically repackages SS7 signals for IP or IP signals for SS7.

  • MGC (media gateway controller)    The MGC handles registration, management, and control functionality of resources in the media gateway. It performs protocol conversion between PSTN signaling protocols and IP telephony. It gathers information about IP and circuit flows and provides that information to billing and management systems.

  • MG (media gateway)    A basic device that terminates PSTN switched circuits (trunks and local loops) and converts from pulse code modulated information to packetized information, and vice versa. Also handles RTP media streams across the IP network.
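The media gateway's packetizing step can be sketched with the fixed 12-byte RTP header. This minimal example assumes G.711 mu-law (RTP payload type 0) at 8,000 samples per second, so each 20-ms frame carries 160 one-byte samples and the timestamp advances by 160 per packet. The function name and SSRC value are placeholders, and a real gateway also handles RTCP, jitter buffering, and much more.

```python
import struct
from typing import Iterator

RTP_VERSION = 2
PCMU_PAYLOAD_TYPE = 0     # static RTP/AVP payload type for G.711 mu-law
SAMPLES_PER_FRAME = 160   # 20 ms at 8,000 samples/s, one byte per sample


def rtp_packets(pcm_bytes: bytes, ssrc: int, first_seq: int = 0, first_ts: int = 0) -> Iterator[bytes]:
    """Split a G.711 byte stream into RTP packets with a fixed 12-byte header.

    No padding, extension, or CSRC entries; the marker bit is left at 0.
    A trailing partial frame is dropped for simplicity.
    """
    seq, ts = first_seq, first_ts
    for offset in range(0, len(pcm_bytes) - SAMPLES_PER_FRAME + 1, SAMPLES_PER_FRAME):
        header = struct.pack(
            "!BBHII",
            RTP_VERSION << 6,           # V=2, P=0, X=0, CC=0
            PCMU_PAYLOAD_TYPE & 0x7F,   # M=0, PT=0
            seq & 0xFFFF,               # 16-bit sequence number wraps around
            ts & 0xFFFFFFFF,            # 32-bit sampling timestamp
            ssrc & 0xFFFFFFFF,          # synchronization source identifier
        )
        yield header + pcm_bytes[offset:offset + SAMPLES_PER_FRAME]
        seq += 1
        ts += SAMPLES_PER_FRAME


packets = list(rtp_packets(bytes(480), ssrc=0x1234ABCD))
print(len(packets), len(packets[0]))  # 3 172
```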

As mentioned, a distributed model for media gateways separates the call-control and signaling planes from the transport layer media gateways, as pictured in Figure V-6. Separating these functions improves scalability. New features can be added to a few controller devices rather than to many media gateways. This model also promotes commodity devices that will bring down the cost of VoIP.

RFC 2705 (Media Gateway Control Protocol, October 1999) provides a general description of the media gateway/media gateway controller model. It describes an architecture in which the call control "intelligence" is outside the media gateways and handled by media gateway controllers (also called "call agents"). These elements synchronize with one another to send coherent commands to the media gateways under their control. A control protocol is used to control VoIP gateways from the external call agents. The MGCP model consists of endpoints and connections:

  • Endpoints are sources or "sinks" of data and could be physical (hardware) or virtual (created in software). Examples are interfaces on a gateway that terminate a trunk connected to a PSTN switch (e.g., class 5, class 4, and so on), or that terminate an analog POTS connection to a phone, key system, or PBX. A gateway that terminates residential POTS lines (to phones) is called a residential gateway.

  • A connection is an association between endpoints over which data is transmitted. Point-to-point and multipoint connections are possible. Connections may exist over IP networks, ATM networks, or internal connections such as TDM backplanes or gateway backplanes. For point-to-point connections, the endpoints of a connection could be in separate gateways or in the same gateway.
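The endpoint/connection model can be captured in a small data structure. This sketch is hypothetical bookkeeping, not MGCP itself: the class names are invented, and the endpoint names merely imitate the local-name@gateway style used in RFC 2705 examples. It shows that a connection links endpoints, that a point-to-point connection may join endpoints in the same or different gateways, and that deleting a connection frees its endpoints.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Endpoint:
    name: str              # e.g. "aaln/1@rgw.example.net" (illustrative naming)
    virtual: bool = False  # physical interface vs. software-created endpoint

    @property
    def gateway(self) -> str:
        # Everything after '@' identifies the gateway hosting the endpoint.
        return self.name.partition("@")[2]


@dataclass
class Connection:
    conn_id: str
    endpoints: List[Endpoint] = field(default_factory=list)

    @property
    def point_to_point(self) -> bool:
        return len(self.endpoints) == 2

    @property
    def same_gateway(self) -> bool:
        return len({ep.gateway for ep in self.endpoints}) == 1


class ConnectionTable:
    """Tracks which connections each endpoint participates in."""

    def __init__(self) -> None:
        self.connections: Dict[str, Connection] = {}

    def create(self, conn_id: str, *endpoints: Endpoint) -> Connection:
        conn = Connection(conn_id, list(endpoints))
        self.connections[conn_id] = conn
        return conn

    def delete(self, conn_id: str) -> None:
        self.connections.pop(conn_id, None)

    def connections_for(self, endpoint: Endpoint) -> List[str]:
        return [cid for cid, c in self.connections.items() if endpoint in c.endpoints]
```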

Media Control Protocols

Starting in 1998, engineers at Telcordia (formerly Bellcore) developed SGCP (Simple Gateway Control Protocol), a master-slave protocol that replicated the functions of a class 5 PSTN switch. Then, Level 3 Communications developed IPDC (Internet Protocol Device Control) as an enhancement to SGCP. The IETF took the best features of SGCP and IPDC and created MGCP. MGCP supports both video and telephony and uses control messages based on simple ASCII encoding. It provides call control features for low-cost, nonintelligent endpoint telephony devices (e.g., dumb phones and gateways). Later, Lucent stepped in with MDCP (Media Device Control Protocol), and the ITU merged the best features of MGCP and MDCP into H.GCP.

The reason for all this development is that network architects are trying to iron out protocols for a future all-IP, next-generation, packet-switched network while maintaining compatibility with the older PSTN. Naturally, there are many opinions on how this should be done.

Media control protocols support a model in which call control components are separate from the media gateways. Note that SIP is still used in this model as a communication protocol between media controllers. In fact, SIP could be used in place of media control protocols. SIP assumes intelligent end devices, and it may turn out to be the best protocol for maintaining the intelligent edge device model throughout the Internet.

A media control protocol is implemented as a set of transactions composed of commands. The call agent may send commands to gateways that create, modify, and delete connections, or that create notification requests and auditing commands. Gateways may send notification and restart commands to the call agent. All commands include text-based header information, optionally followed by text-based session description information.
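To make the transaction format concrete, here is a sketch that formats and parses an MGCP-style command: a command line (verb, transaction ID, endpoint, protocol version), parameter lines, and an optional session description after an empty line. The CRCX (CreateConnection) and MDCX (ModifyConnection) verbs and the C: (call ID), L: (local connection options), and M: (connection mode) parameters follow RFC 2705, but the function names, endpoint name, and values are placeholders, and only the simplest framing is handled.

```python
from typing import Dict, Optional, Tuple


def format_mgcp_command(verb: str, transaction_id: int, endpoint: str,
                        params: Dict[str, str], sdp: Optional[str] = None) -> str:
    """Build an MGCP-style command: command line, parameter lines, optional SDP."""
    lines = [f"{verb} {transaction_id} {endpoint} MGCP 1.0"]
    lines += [f"{code}: {value}" for code, value in params.items()]
    text = "\n".join(lines) + "\n"
    if sdp:
        text += "\n" + sdp  # an empty line separates the header from the session description
    return text


def parse_mgcp_command(text: str) -> Tuple[str, int, str, Dict[str, str], str]:
    """Split an MGCP-style command into verb, transaction ID, endpoint, params, SDP."""
    header, _, sdp = text.replace("\r\n", "\n").partition("\n\n")
    first, *param_lines = header.splitlines()
    verb, tid, endpoint, _protocol, _version = first.split()
    params = {}
    for line in param_lines:
        code, _, value = line.partition(":")
        params[code.strip()] = value.strip()
    return verb, int(tid), endpoint, params, sdp


cmd = format_mgcp_command("CRCX", 1204, "aaln/1@rgw.example.net",
                          {"C": "A3C47F21456789F0", "L": "p:10, a:PCMU", "M": "recvonly"})
print(cmd)
```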

Refer to RFC 2705 for additional information. RFC 2805 (Media Gateway Control Protocol Architecture and Requirements, April 2000) provides additional information about media gateways, media controllers, and control protocols.

In late 1998, the IETF developed Megaco based on the previous work and the intention to overcome some of the problems in MGCP. Megaco drafts and documents are available at the IETF Media Gateway Control (megaco) Web site at Also refer to H.248 documents at the ITU (

Megaco assumes that end devices are not intelligent devices (in contrast, SIP assumes that the network consists of distributed intelligent devices). It replaces the gatekeeper functions of the H.323 protocol; supports TDM, ATM, and connectionless IP; and works in a range of gateways, from residential gateways to carrier-size gateways.

In 1999, the IETF and the ITU formally agreed to work on a single protocol. This led to Megaco/H.248. The ITU has largely taken over this development as H.248. However, don't count on H.248 as the final development in this saga. Some feel that the ITU is making it more complex than it needs to be for IP telephony. See RFC 3015 (Megaco Protocol Version 1.0, November 2000) for more information.

Other IP Telephony Protocols

This section describes some of the related protocols and initiatives underway to support IP telephony. Most of the work is being done by the IETF.

IP-PBX (IP-Private Branch Exchange)

A traditional PBX is basically a telephone switch designed for small to large companies. Telephones are wired into the box, which then connects inside calls (internal phone to internal phone) or connects calls to/from the PSTN. Phones connect to the traditional PBX over dedicated phone lines rather than the data network. IP-PBXs use the data network for voice: the IP-PBX connects to the data network, and network-connected IP telephones access the IP-PBX over it.

IP telephony devices are not restricted by the limitations of the old phone system, so end devices and IP-PBXs can provide enhanced functionality and services. These new services are immediately available inside the enterprise that upgrades to IP-PBXs. As packet telephony grows throughout the Internet, users will be able to connect with IP telephony users outside the corporate network and take advantage of new services. See "PBX (Private Branch Exchange)."

CPL (Call Processing Language)

CPL is a processing language that provides a simple and standardized way for users of Internet telephony devices to specify how they want incoming calls handled. CPL is described in RFC 2824 (Call Processing Language Framework and Requirements, May 2000). The IP Telephony (iptel) Working Group ( is developing CPL and related protocols.

The traditional phone system gives users little control over how calls are handled. A simple example of what CPL enables is a telephone that the user programs to forward calls to another location during a specific time period, or else just record a message. CPL is a simple language that provides a standardized way to describe these behaviors in telephony devices.
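CPL scripts themselves are written in XML and executed by servers or devices, so the following is not CPL syntax. It is a hypothetical Python sketch of the decision such a script might encode for the example above: forward calls to another location during a daily time window, and otherwise send them to voice mail. The class name, addresses, and times are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class ForwardingRule:
    """Hypothetical per-user rule: forward during a daily window, else voice mail."""
    start: time
    end: time
    forward_to: str
    voicemail: str

    def route(self, when: datetime) -> str:
        now = when.time()
        if self.start <= self.end:
            in_window = self.start <= now < self.end
        else:  # window wraps past midnight, e.g. 22:00 to 06:00
            in_window = now >= self.start or now < self.end
        return self.forward_to if in_window else self.voicemail


rule = ForwardingRule(time(9), time(17), "sip:alice@mobile.example.com", "sip:vm-alice@example.com")
print(rule.route(datetime(2001, 5, 1, 10, 30)))  # sip:alice@mobile.example.com
print(rule.route(datetime(2001, 5, 1, 18, 0)))   # sip:vm-alice@example.com
```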

Signaling Transport and SCTP (Stream Control Transmission Protocol)

RFC 2719 (Framework Architecture for Signaling Transport, October 1999) defines an architectural framework for transport of message-based signaling protocols over IP networks. It describes methods, end-to-end protocol mechanisms, and the use of existing IP capabilities to support the functional and performance requirements for signaling transport.

SCTP is defined in RFC 2960 (Stream Control Transmission Protocol, October 2000). It is meant to be the "signal carrier" in place of TCP. SCTP is a reliable transport protocol operating on top of a connectionless packet network such as IP. It offers many of the same services as TCP, but messages can be carried on multiple independent streams (and optionally delivered unordered), so a lost packet on one stream does not delay messages on the others. This avoids the delays that TCP's strict sequencing of a single byte stream can cause. SCTP was developed by the IETF Signaling Transport (sigtran) Working Group, which is working on related technologies. See "SCTP (Stream Control Transmission Protocol)."
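SCTP can carry messages on multiple independent streams, each sequenced separately. The benefit can be illustrated with a small simulation. This is not an SCTP implementation (the function and data layout are hypothetical); it only models in-order delivery within each stream, so a missing message holds back later messages on its own stream but not on others, whereas a single TCP-style ordered byte stream would hold back everything behind the gap. The ISUP message names are just labels.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def deliverable(received: Iterable[Tuple[int, int, str]]) -> Dict[int, List[str]]:
    """Return the messages deliverable in order on each stream.

    `received` holds (stream_id, stream_seq, payload) tuples in arrival order;
    per-stream sequence numbers start at 0. Delivery on a stream stops at the
    first gap, but other streams are unaffected.
    """
    by_stream: Dict[int, Dict[int, str]] = defaultdict(dict)
    for stream_id, seq, payload in received:
        by_stream[stream_id][seq] = payload
    result: Dict[int, List[str]] = {}
    for stream_id, msgs in by_stream.items():
        delivered, next_seq = [], 0
        while next_seq in msgs:
            delivered.append(msgs[next_seq])
            next_seq += 1
        result[stream_id] = delivered
    return result


# Call A's second message (stream 0, seq 1) was lost; call B on stream 1 is unaffected.
arrived = [(0, 0, "IAM call A"), (1, 0, "IAM call B"), (0, 2, "REL call A"), (1, 1, "ACM call B")]
print(deliverable(arrived))  # {0: ['IAM call A'], 1: ['IAM call B', 'ACM call B']}
```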

TRIP (Telephony Routing over IP)

TRIP defines a policy-driven interadministrative domain framework for advertising the reachability of telephony destinations between location servers, and for advertising attributes of the routes to those destinations. TRIP can serve as the telephony routing framework for any signaling protocol. TRIP is similar to BGP-4 in the way that it distributes routing information (telephony information in TRIP's case) between administrative domains. RFC 2871 (A Framework for Telephony Routing over IP, June 2000) defines the framework for TRIP. The work on TRIP is being done by the IP Telephony (iptel) Working Group.
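TRIP advertises reachability in terms of telephone number prefixes, so a location server that has learned several routes must pick one for a dialed number. The following minimal sketch assumes a longest-matching-prefix rule, the same idea IP routing uses, and an invented route table; TRIP's actual route attributes and policy-driven selection are richer than this, and the function name is hypothetical.

```python
from typing import Dict, Optional


def select_route(dialed: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the next-hop server for the longest prefix that matches `dialed`.

    `routes` maps digit prefixes (e.g. '1201') to a signaling server address.
    Non-digit characters in the dialed number are ignored.
    """
    digits = "".join(ch for ch in dialed if ch.isdigit())
    for length in range(len(digits), 0, -1):
        hop = routes.get(digits[:length])
        if hop is not None:
            return hop
    return None


routes = {  # hypothetical table learned from peer location servers
    "1": "ls.carrier-a.example",
    "1201": "gw-nj.carrier-b.example",
    "44": "ls.carrier-c.example",
}
print(select_route("+1-201-555-0123", routes))  # gw-nj.carrier-b.example
print(select_route("+1-415-555-0100", routes))  # ls.carrier-a.example
```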

OSP (Open Settlement Protocol)

The purpose of the Open Settlement Protocol is to handle settlements for IP telephony calls among the different carriers that route calls, giving IP telephony the push it needs to expand on a global scale. The protocol supports call authentication, authorization, and accounting techniques, and provides per-minute settlements for VoIP calling. OSP is defined in the ETSI TIPHON project. See "OSP (Open Settlement Protocol)."
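As a worked example of per-minute settlement, the sketch below rounds each call up to whole minutes and multiplies by an agreed rate, summing with exact decimal arithmetic. The rounding rule, the rates, and the function name are hypothetical assumptions for illustration; actual billing increments and rates are set by the carriers' agreements, not by OSP or this code.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def settlement(calls: Iterable[Tuple[int, Decimal]]) -> Decimal:
    """Total owed for (duration_seconds, rate_per_minute) call records.

    Hypothetical rule: each call is billed in whole minutes, rounded up.
    """
    total = Decimal("0")
    for seconds, rate in calls:
        minutes = (seconds + 59) // 60  # integer ceiling of seconds / 60
        total += rate * minutes
    return total


calls = [(125, Decimal("0.05")), (30, Decimal("0.12")), (600, Decimal("0.05"))]
print(settlement(calls))  # 0.77
```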

JAIN (Java Advanced Intelligent Network)

JAIN is a set of Java technology-based APIs that enable the rapid development of new telecom products and services on the Java platform. JAIN is a Sun Microsystems initiative that can be used to define and build advanced telecom services that blend IN (Intelligent Network) and Internet technologies. The Java language serves as the basis for creating middleware components and tools for SS7 applications. JAIN allows different IN services to run across a variety of switching platforms from many different vendors. Information is available at The Parlay Group is also involved with JAIN. Its Web site is

IETF Working Groups and RFCs

The following IETF working groups are developing IP telephony protocols or are working on related technologies. Additional working groups that are developing QoS are listed under the topic "QoS (Quality of Service)."

Audio/Video Transport (avt)

IP Telephony (iptel)

Media Gateway Control (megaco)

Multiparty Multimedia Session Control (mmusic)

PSTN and Internet Internetworking (pint)

Service in the PSTN/IN Requesting InTernet Service (spirits)

Signaling Transport (sigtran)

Telephone Number Mapping (enum)

Internet Fax (fax)

Instant Messaging and Presence Protocol (impp)

The following Internet RFCs provide more information about voice over IP or related multimedia protocols. Also refer to the "Multimedia" topic for other RFCs.

  • RFC 2458 (Toward the PSTN/Internet Inter-Networking-Pre-PINT Implementations, November 1998)

  • RFC 2705 (Media Gateway Control Protocol, October 1999)

  • RFC 2719 (Framework Architecture for Signaling Transport, October 1999)

  • RFC 2805 (Media Gateway Control Protocol Architecture and Requirements, April 2000)

  • RFC 2806 (URLs for Telephone Calls, April 2000)

  • RFC 2824 (Call Processing Language Framework and Requirements, May 2000)

  • RFC 2833 (RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals, May 2000)

  • RFC 2848 (The PINT Service Protocol: Extensions to SIP and SDP for IP Access to Telephone Call Services, June 2000)

  • RFC 2871 (A Framework for Telephony Routing over IP, June 2000)

  • RFC 2885 (Megaco Protocol, August 2000)

  • RFC 2886 (Megaco Errata, August 2000)

  • RFC 2897 (Proposal for an MGCP Advanced Audio Package, August 2000)

  • RFC 2960 (Stream Control Transmission Protocol, October 2000)

  • RFC 2974 (Session Announcement Protocol, October 2000)

Copyright (c) 2001 Tom Sheldon and Big Sur Multimedia.
All rights reserved under Pan American and International copyright conventions.