Note: Many topics at this site are reduced versions of the text in "The Encyclopedia of Networking and Telecommunications." Search results will not be as extensive as a search of the book's CD-ROM.
Data compression squeezes data so it requires less disk space for storage and less bandwidth on a data transmission channel. Communications equipment like modems, bridges, and routers use compression schemes to improve throughput over standard phone lines or leased lines. Compression is also used to compress voice telephone calls transmitted over leased lines so that more calls can be placed on those lines. In addition, compression is essential for videoconferencing applications that run over data networks.
Most compression schemes take advantage of the fact that data contains a lot of repetition. For example, alphanumeric characters are normally represented by a 7-bit ASCII code, but a compression scheme can use a 3-bit code to represent the eight most common letters.
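The shorter-codes-for-common-letters idea can be sketched with a simple prefix-free scheme. Everything here is illustrative: the letter set and the flag-bit layout are assumptions, not a real standard. A flag bit of 0 introduces a 3-bit index into the eight most common letters; a flag bit of 1 introduces an ordinary 7-bit ASCII code.

```python
# Illustrative sketch (not a real standard): common letters cost 4 bits
# (flag + 3-bit index), everything else costs 8 bits (flag + 7-bit ASCII).
COMMON = "etaoinsh"  # eight frequent English letters (an assumption)

def encode(text):
    bits = []
    for ch in text:
        if ch in COMMON:
            bits.append("0" + format(COMMON.index(ch), "03b"))
        else:
            bits.append("1" + format(ord(ch), "07b"))
    return "".join(bits)

def decode(bits):
    out, i = [], 0
    while i < len(bits):
        if bits[i] == "0":
            out.append(COMMON[int(bits[i + 1:i + 4], 2)])
            i += 4
        else:
            out.append(chr(int(bits[i + 1:i + 8], 2)))
            i += 8
    return "".join(out)
```

For text dominated by the common letters, the encoded length comes in well under the 7 bits per character of plain ASCII.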
In addition, long stretches of "nothing" can be replaced by a value that indicates how much "nothing" there is. For example, silence in a compressed audio recording can be replaced by a value that indicates how long that silence is. White space in a compressed graphic image can be replaced by a value that indicates the amount of white space.
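The replace-a-run-with-a-count idea is run-length encoding. A minimal sketch, assuming samples are simple integer values:

```python
# Minimal run-length encoder: each run of repeated values becomes a
# [value, count] pair, so long stretches of "nothing" (silence, white
# space) collapse to a single pair.
def rle_encode(data):
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def rle_decode(runs):
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out
```

An all-white scan line of 1,000 pixels, for example, collapses from 1,000 values to the single pair [255, 1000].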
Compression has become critical in the move to combine voice and data networks. Compression techniques have been developed that reduce the data requirements for a voice channel down to 8 Kbits/sec. This is a significant improvement over noncompressed voice (64 Kbits/sec) and older compression techniques yielding 32 Kbits/sec.
Two important compression concepts are lossy and lossless compression. Lossless compression reconstructs the original data exactly, while lossy compression permanently discards some information to achieve smaller sizes.
The removal of information in the lossy technique is acceptable for images because the loss is usually imperceptible to the human eye. While this trick works on human viewers, lossy images may be unusable in some situations, such as when scanners are used to locate details in images.
Lossy compression can provide compression ratios of 100:1 to 200:1, depending on the type of information being compressed. Lossless compression usually achieves only about a 2:1 compression ratio. Lossy compression techniques are often "tunable," in that you can turn the compression up to improve throughput, but at a loss in quality. Compression can also be turned down to the point at which there is little loss of image quality, but throughput will be affected.
Basic Compression Techniques
The most basic compression techniques are described here:
In the Lempel-Ziv data-compression algorithm, the code table initially contains all single-character strings. As new strings appear, a tree structure is created, similar to Figure 1, which shows the "T" branch of the tree. Note that a three-character word can be deduced by following any branch of the tree. Each branch of the tree is identified by a codeword, and it is the codeword that is sent in any transmission. If a new string appears, nodes are added to the appropriate branch of the tree and a new codeword is generated to represent it.
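The dictionary-growing idea above can be sketched with LZW, a well-known Lempel-Ziv variant. This is a simplified illustration of the technique, not the LZS implementation itself:

```python
def lzw_compress(text):
    # The dictionary starts with all single-character strings (codes 0-255).
    table = {chr(i): i for i in range(256)}
    next_code = 256
    w, codewords = "", []
    for c in text:
        wc = w + c
        if wc in table:
            w = wc                      # extend the current match
        else:
            codewords.append(table[w])  # emit codeword for the longest match
            table[wc] = next_code       # grow the dictionary (new tree node)
            next_code += 1
            w = c
    if w:
        codewords.append(table[w])
    return codewords

def lzw_decompress(codewords):
    # The receiver rebuilds the identical dictionary from the codeword stream.
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    w = table[codewords[0]]
    out = [w]
    for code in codewords[1:]:
        entry = table[code] if code in table else w + w[0]
        out.append(entry)
        table[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)
```

Because both ends grow the same dictionary in lockstep, only the codewords ever cross the link; the dictionary itself is never transmitted.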
Figure 1: Lempel-Ziv data compression
A popular Lempel-Ziv product is Hi/fn Inc.'s LZS. Hi/fn is a spin-off of Stac Inc., an early developer of compression software that once won a compression patent battle with Microsoft. LZS has become an important industry standard that is widely used in many compression applications, including network connection gear. A public-domain compression algorithm similar to Lempel-Ziv is Predictor, which tries to predict sequences of characters in advance by looking up sequences in a dictionary. Some vendors, such as Cisco, use both LZS and Predictor in their products.
When compression is implemented in software, the processing overhead can cause problems in real-time environments. Compression is processor intensive, so for real-time data transmissions such as network links, you will need a system on both ends of the link that can compress and decompress data without causing appreciable delays. Some applications may not tolerate any delays, in which case you may need to tune the compression levels and/or boost the processing power to remove those delays.
Another consideration is that compression affects the portability of files. You will need to make sure that all recipients have the software needed to decompress files.
Note that some files are already compressed to begin with and don't benefit from any further external compression techniques. Some graphics file formats, such as TIFF (Tagged Image File Format), are automatically compressed when they are stored.
Tip: One of the best sources for information about compression and codecs (coder/decoders) is at Codec Central (http://www.terran.com/CodecCentral/index.html).
Storage System Compression
Before discussing compression algorithms for file storage, know that hard drives store information using data-encoding techniques that optimize the way bits are stored in the magnetic flux transition states. This encoding is built into the drive and used to store all data on that drive. This encoding provides some compression of data, but it must be discussed separately from the application-level compression that you can choose to apply to individual data files before they are sent to the drive for storage.
A magnetic recording system such as a hard drive records information by changing a magnetic field over the disk surface. A change in the field between two possible states is called a flux transition. In a very basic system using MFM (modified frequency modulation), a digital 1 might be represented by the presence of a flux transition while a digital 0 might be represented by the absence of a transition. The following encoding techniques improve the efficiency of this scheme:
Because disk encoding is automatically handled by the disk drive at the hardware level, it is of no further importance to this discussion. When you purchase a disk drive, the encoding scheme is embedded and the capacity of the drive reflects that encoding scheme.
File compression is handled in several ways. Various utilities are available that let you compress files one at a time or as a group. A group of files can be compressed into a single file that is much easier to send to another user, and a decompression utility unpacks the files. The nearly universal file compression utilities are PKZip from PKWARE and WinZip from Nico Mak Computing. A file or a group of files can be compressed into a zip file (called an archive if multiple files are compressed into it) that can be decompressed by anyone who has the utility, or you can create self-extracting files that can be opened even by recipients who do not have the utility. Another similar utility is ZipMagic from Mijenix.
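The archive round trip these utilities perform can be sketched with Python's standard zipfile module. The file names and contents here are made up for illustration:

```python
import io
import zipfile

# Compress two (hypothetical) files into a single in-memory archive,
# then unpack them again -- the same round trip a zip utility performs
# on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("report.txt", "quarterly numbers " * 100)
    archive.writestr("notes.txt", "meeting notes " * 100)

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    report = archive.read("report.txt").decode()
```

Because the sample text is highly repetitive, the whole archive ends up far smaller than the raw contents.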
Most operating systems, including DOS, NetWare, Windows NT, and others, now include disk compression software. In the case of NetWare 4.x, you can enable automatic compression for specific files, or all files that reside on a volume or in a specific directory. Special file attributes can be set to flag the files you want the system to automatically compress when not in use. Be careful when enabling automatic compression systems. Some applications may not work properly with files in a compressed state.
Some file encryption utilities also include compression routines, so you can make files private and compress them in the same step.
There are a number of compression techniques designed to store graphic images in much less space. Image compression also improves throughput when image files are transmitted from one place to another. Several compression techniques are discussed below.
JPEG (Joint Photographic Experts Group) compression is an ITU- and ISO-standardized method for compressing still images using the DCT (discrete cosine transform), which converts three-dimensional color and coordinate image information into a format that is more amenable to compression. Color information is also encoded, and some is discarded, depending on the desired resolution of the end result. Compression can be lossless or lossy. The resulting information is then compacted using RLE (run-length encoding), which replaces runs of similar values with a count.
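The transform-and-quantize idea behind JPEG can be sketched in one dimension. This is only an illustration of the principle: real JPEG operates on 8x8 two-dimensional blocks, uses quantization tables rather than a simple threshold, and entropy-codes the result.

```python
import math

def dct(x):
    # Orthonormal DCT-II: concentrates a smooth signal's energy
    # into a few low-frequency coefficients.
    N = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / N)
        * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]

def idct(X):
    # Inverse transform (same orthonormal basis).
    N = len(X)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / N)
            * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N)
        )
        for n in range(N)
    ]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of pixel values
coeffs = dct(samples)
# The "lossy" step: discard small high-frequency coefficients.
quantized = [c if abs(c) > 10 else 0.0 for c in coeffs]
approx = idct(quantized)
```

Most of the discarded coefficients are near zero, so the reconstructed row stays visually close to the original even though the exact values are gone.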
An excellent discussion of fractal and wavelet technology may be found on the technical page at Iterated Systems' Web site (http://www.iterated.com/tech/). The company's latest product, STiNG, uses JPEG, fractal, and wavelet compression to provide a variety of resolutions.
A company called Compression Engines claims that its wavelet algorithms can create a file that is 300 times smaller than the original image but still has the same high resolution as the original.
A company called LuraTech uses wavelet technology in a unique way to compress documents that contain both text and images. Its LuraDocument product separates a document into a binary image containing the text, a foreground image preserving the text color, and a background image with the text removed. These separate images are then individually compressed using the most appropriate and efficient techniques. The company is also developing a special ASIC chip that uses wavelet-based image compression to compress still images losslessly at a rate of ten million pixels per second, suitable for real-time video compression.
In video compression, each frame is an array of pixels that must be reduced by removing redundant information. Video compression is usually done with special integrated circuits, rather than with software, to gain performance. Standard video is normally about 30 frames/sec, but studies have found that 16 frames/sec is acceptable to many viewers, so frame dropping provides another form of compression.
When information is removed from a single frame, it is called intraframe or spatial compression. But video also contains a lot of redundant interframe information, such as the background around a talking head in a news clip. Interframe compression works by first establishing a key frame that represents all the frames with similar information, and then recording only the changes that occur in each subsequent frame. The key frame is called the "I" frame, and the subsequent frames that contain only "difference" information are referred to as "P" (predictive) frames. A "B" (bidirectional) frame is used when new information begins to appear; it contains information from both previous and forward frames.
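The key-frame-plus-differences idea can be sketched with a toy scheme. This is an illustration of the principle only; real codecs work on motion-compensated blocks, not individual pixels:

```python
# Toy interframe scheme: the key ("I") frame is stored whole; each
# following "P" frame stores only (pixel_index, new_value) changes
# relative to the previous frame.
def encode_sequence(frames):
    key = list(frames[0])
    deltas = []
    prev = key
    for frame in frames[1:]:
        changes = [(i, v) for i, (p, v) in enumerate(zip(prev, frame)) if p != v]
        deltas.append(changes)
        prev = frame
    return key, deltas

def decode_sequence(key, deltas):
    frames = [list(key)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, v in changes:
            frame[i] = v
        frames.append(frame)
    return frames
```

For a mostly static scene, each "P" frame shrinks to a handful of changed pixels rather than a full frame.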
One thing to keep in mind is that interframe compression provides high levels of compression but is difficult to edit because frame information is dispersed. Intraframe compression contains more information per frame and is easier to edit. Freeze frames during playback also have higher resolution.
A number of video codecs are available that work in either hardware or software. You can find additional codec information at the Codec Central site.
Other compression methods are on the way, and many existing methods are being revised. ITU committees are working on standards for video phones and videoconferencing over ISDN (Integrated Services Digital Network) and other services. Refer to "Videoconferencing" for more information.
The original telephone system was based on an analog infrastructure. But analog transmissions are subject to transmission problems like attenuation over distance and line noise. The more attenuation, the more the line noise masks the real signal. Periodic amplification solves the problem to some extent, but any line noise introduced is also amplified with the voice signal. To solve this problem, the trunk lines of the telephone system were converted to digital and PCM (pulse code modulation) was used to digitize the signals. Later, ADPCM (Adaptive Differential Pulse Code Modulation) was used to compress the signal.
The basic PCM technique is to use a codec (coder/decoder) to sample the amplitude of a voice signal 8,000 times per second, and then store the amplitude value as 8 bits of data. The formula is
8000 samples/second × 8 bits/sample = 64,000 bits/sec
Note: Refer to "ADC (Analog-to-Digital Conversion)" for more information and an illustration of this process.
That result may look familiar: 64 Kbits/sec is the bandwidth of a single channel on a T1 line, which provides 24 such channels for a total bandwidth of 1.544 Mbits/sec. This 64-Kbit/sec rate is the basis for the entire phone system's digital hierarchy. A T3 trunk consists of 28 T1 lines, or 672 voice channels.
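The arithmetic behind these rates can be worked through directly (the extra 8,000 bits/sec in the T1 total come from the one framing bit carried in each of the 8,000 frames per second):

```python
# PCM voice channel: 8,000 samples/sec x 8 bits/sample.
voice_channel = 8000 * 8            # 64,000 bits/sec (one channel)

# A T1 frame carries one 8-bit sample for each of 24 channels plus
# one framing bit, and 8,000 frames are sent per second.
t1_frame_bits = 24 * 8 + 1          # 193 bits per frame
t1_rate = t1_frame_bits * 8000      # 1,544,000 bits/sec

# A T3 trunk multiplexes 28 T1 lines.
t3_channels = 28 * 24               # 672 voice channels
```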
ADPCM uses a special encoding technique that reduces the data required to store each sample, transmitting only the difference between one sample and the next. An adaptive predictor algorithm is used to predict in advance how the signal will change. The prediction is usually very accurate. As samples vary, the predictor rapidly adapts to the changes. What ADPCM provides is 48 voice channels on a T1 line, which benefits customers that use such lines to interconnect their remote offices or connect their internal phone systems to the phone company switches.
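The difference-encoding idea can be sketched with a stripped-down DPCM example. This is a simplification for illustration: the "predictor" here is just the previous sample, and the differences are kept unquantized, whereas real ADPCM uses an adaptive predictor and quantizes each difference down to a few bits.

```python
# Simplified DPCM: transmit only the difference between each sample
# and the prediction (here, simply the previous sample).
def dpcm_encode(samples):
    diffs, prediction = [], 0
    for s in samples:
        diffs.append(s - prediction)
        prediction = s
    return diffs

def dpcm_decode(diffs):
    samples, prediction = [], 0
    for d in diffs:
        prediction += d
        samples.append(prediction)
    return samples
```

Because voice amplitudes change slowly between adjacent samples, the differences are small numbers that need far fewer bits than the full 8-bit sample values.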
Other compression techniques have been devised. One method called CVSD (continuously variable slope delta) increases the sampling rate up to 32,000 times per second. At this rate, the difference between each sample is small enough that a single bit can represent a change in the direction of the amplitude. The sampling rate is variable. At a sampling rate of 32,000 samples/sec, 48 channels are possible, but by reducing the sampling rate, more channels can be obtained (24,000 samples/sec yields 64 channels, 16,000 samples/sec yields 96 channels), but the quality is reduced.
Internet telephony and the packaging of digitized voice into IP packets is the current trend. Lack of bandwidth will affect quality until more bandwidth becomes available or QoS techniques are put into place.
There are currently many different coding techniques, some of which improve bandwidth, voice quality, and other factors. Table 1 lists the most popular coding techniques that have been standardized for telephony and voice packets by the ITU-T in its G-series recommendations.
Table 1: Popular ITU-T Series G Recommendations (Source: Cisco Systems)
Additional information is available in a paper called "Internet Telephony and Voice Compression" by Kelly Ann Smith and Daniel Brushteyn at University of Pennsylvania. The link is listed on the related entries page.
Compression for Data Communication
Compression provides a way to improve throughput on wide area links such as LAN interconnections with modems, bridges, and routers. However, compression must be done in real time, which imposes restrictions on the compression algorithm. One of those restrictions is a requirement for lossless compression. No bits may be lost.
Picture two remote LANs connected by a pair of modems, bridges, or routers. Compression is done by the link devices or by computers at each end of the link. As data streams into the compressor at one end of the link, that data must be compressed on the fly and transmitted with little or no delay (depending on what is acceptable). The receiver must decompress the data at the same rate to prevent delays. The data that is flowing into the link devices is in frame or packet form.
In contrast, compressing a file and saving it to disk on a desktop computer is a comparatively simple task, since the compression can take as much time as it needs.
Modem manufacturers have used many of the data-compression techniques mentioned earlier, but a variation of the Lempel-Ziv technique has become popular with the acceptance of the ITU V.42bis data-compression standard. V.42bis is able to monitor data and determine whether compression improves data throughput. For example, files that are already compressed or encrypted do not benefit from further compression.
Note: Encrypted data, by definition, should not have repeating patterns, as patterns may expose the data to attack. Compressing encrypted data therefore typically causes it to expand slightly, since there are no patterns for the compressor to exploit. Likewise, there is usually little reason to compress data that has already been compressed, since expansion may occur.
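This effect is easy to demonstrate with Python's standard zlib module, using seeded random bytes as a stand-in for encrypted data:

```python
import random
import zlib

# Highly repetitive data compresses dramatically...
text = b"the quick brown fox " * 200
compressed_text = zlib.compress(text)

# ...but pattern-free data (random bytes standing in for encrypted
# data) does not: the compressor finds nothing to exploit, and its
# container overhead makes the output slightly larger than the input.
rng = random.Random(42)
noise = bytes(rng.getrandbits(8) for _ in range(4096))
compressed_noise = zlib.compress(noise)
```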
In some cases, software-based compression is preferred. For example, Microsoft recommends turning off hardware compression on the modem and using its own software compression for RAS (Remote Access Service) dial-up connections. It claims its RAS data compression can be two to four times as efficient as modem compression.
There are other modem compression protocols, such as the MNP (Microcom Networking Protocol) class 5 and class 7 series, but the V.42 standard has taken off as the compression method included with most modems. A protocol called CCP (Compression Control Protocol) provides a way for two linked systems to negotiate the type of data compression over a PPP (Point-to-Point Protocol) connection.
Point-to-point leased lines and ISDN connections provide wide area connectivity for many organizations. Compression is the best way to optimize the use of these lines. Two methods may be used to compress data across links:
Note that stream mode is designed for point-to-point links because the whole data stream is compressed, including packet headers. Since the headers are compressed, the packets cannot pass through routers unless each router decompresses them, which adds delay and other problems. Packet mode is best when transporting packets across meshed, hop-by-hop networks.
An option available in packet mode is whether to compress data before or after putting it in the packet. If you compress the data before putting it in the packet, there are fewer packets to send. If you compress after putting it in the packet, you retain the same number of packets. The former is the preferred method on networks that charge by the packet. If encryption is required, data should be compressed first and then encrypted before being placed in the packet, because of the problem with compressing encrypted data mentioned in the previous note.
The packet mode method can be used over public networks such as the Internet. If the sending and receiving router also perform encryption on the packet data, the packet transmission forms a VPN (virtual private network) across the Internet.
RFC 1144 (Compressing TCP/IP Headers for Low-Speed Serial Links. February 1990) describes how to compress the headers of TCP/IP packets before they cross the link. These headers are relatively large, so compression can improve throughput. Header compression is common for PPP connections. For example, when configuring a Windows dial-up configuration to use PPP, you can choose IP header compression. In some configurations, it is called Van Jacobson or VJ header compression after the author of the aforementioned RFC. Note that some ISPs may not support IP header compression (if you are configuring a PPP Internet connection).
Other significant compression protocols or standards work is outlined below.
The following Internet RFCs provide information about compression and its use in TCP/IP networks:
Copyright (c) 2001 Tom Sheldon and Big Sur Multimedia.