Congestion Control Mechanisms (Linktionary term)

Site home page
(news and notices)

Get alerts when Linktionary is updated

Book updates and addendums

Get info about the Encyclopedia of Networking and Telecommunicatons, 3rd edition (2001)

Download the electronic version of the Encyclopedia of Networking, 2nd edition (1996). It's free!

Contribute to this site

Electronic licensing info

Congestion Control Mechanisms

Note: Many topics at this site are reduced versions of the text in "The Encyclopedia of Networking and Telecommunications." Search results will not be as extensive as a search of the book's CD-ROM.

Note: This topic is presented in its entirety as a sample topic.

Congestion is a problem that occurs on shared networks when multiple users contend for access to the same resources (bandwidth, buffers, and queues). Think about freeway congestion. Many vehicles enter the freeway without regard for impending or existing congestion. As more vehicles enter the freeway, congestion gets worse. Eventually, the on-ramps may back up, preventing vehicles from getting on at all.

In packet-switched networks, packets move in and out of the buffers and queues of switching devices as they traverse the network. In fact, a packet-switched network is often referred to as a "network of queues." A characteristic of packet-switched networks is that packets may arrive in bursts from one or more sources. Buffers help routers absorb bursts until they can catch up. If traffic is excessive, buffers fill up and new incoming packets are dropped. Increasing the size of the buffers is not a solution, because excessive buffer size can lead to excessive delay.

Congestion typically occurs where multiple links feed into a single link, such as where internal LANs are connected to WAN links. Congestion also occurs at routers in core networks where nodes are subjected to more traffic than they are designed to handle. TCP/IP networks such as the Internet are especially susceptible to congestion because of their basic connection- less nature. There are no virtual circuits with guaranteed bandwidth. Packets are injected by any host at any time, and those packets are variable in size, which make predicting traffic patterns and providing guaranteed service impossible. While connectionless networks have advantages, quality of service is not one of them.

Shared LANs such as Ethernet have their own congestion control mechanisms in the form of access controls that prevent multiple nodes from transmitting at the same time. See "Access Methods" and "MAC (Media Access Control)."

The following basic techniques may be used to manage congestion.

End-system flow control This is not a congestion control scheme, but a way to prevent the sender from overrunning the buffers of the receiver. See "Flow-Control Mechanisms."

Network congestion control In this scheme, end systems throttle back in order to avoid congesting the network. The mechanism is similar to end-to-end flow controls, but the intention is to reduce congestion in the network, not the receiver.

Network-based congestion avoidance In this scheme, a router detects that congestion may occur and attempts to slow down senders before queues become full.

Resource allocation This technique involves scheduling the use of physical circuits or other resources, perhaps for a specific time period. A virtual circuit, built across a series a switches with a guaranteed bandwidth is a form of resource allocation. This technique is difficult, but can eliminate network congestion by blocking traffic that is in excess of the network capacity. A list of related topics may be found on the related entries page.

Caching is probably the ultimate congestion control scheme. By moving content closer to users, a majority of traffic is obtained locally rather than being obtained from distant servers along routed paths that may experience congestion. Caching has become a serious business on the Internet, as discussed under "Content Distribution."

Queuing and Congestion

Any discussion of congestion naturally involves queuing. Buffers on network devices are managed with various queuing techniques. Properly managed queues can minimize dropped packets and network congestion, as well as improve network performance.

The most basic technique is FIFO (first-in, first-out), where packets are processed in the order in which they arrive in the queue. Going beyond this, a priority queuing scheme uses multiple queues with different priority levels so that the most important packets are sent first.

An important queuing technique is to assign flows to their own queues. This differentiates flows so that priorities can be assigned. Just as important, each flow is responsible for making sure that it does not overflow its own queue. Separating queues in this way ensures that each queue only contains packets from a single source.

See "Queuing" for more information.

Congestion Control in Frame Relay

While this topic is primarily about congestion problems in connectionless packet-switched networks, it is useful to examine the way congestion is handled in a connection-oriented network. Frame relay provides a good example.

Frame relay subscribers negotiate a CIR (committed information rate) with the service provider. The CIR is the guaranteed level of service, but providers usually allow subscribers to burst over this level if network capacity is available. However, frames in excess of the CIR are marked as discard eligible. If a switch on the network becomes congested, it will drop discard eligible frames. This ensures that the service providers can meet their negotiated CIR levels for subscribers.

Dropping frames is never a good idea, so two congestion avoidance mechanisms are available:

BECN (backward explicit congestion notification) When a switch starts to experience congestion (i.e., the buffers/queues are getting full), it can send a frame in the backward direction to senders with the BECN bit set to inform senders to slow down.

FECN (forward explicit congestion notification) When a switch starts congesting, it can send a frame in the forward direction to receiving nodes with the FECN bit set. This informs the forward nodes that they should inform the sender to slow down.

Note that sender or receiver do not need to respond to BECN or FECN, but eventually, network switches will drop frames as congestion continues to increase.

Congestion Control and Avoidance in TCP

Until the mid 1980s, the Internet was prone to a phenomenon called "congestion collapse." This would occur because there was little control over managing heavy network loads. Individual connections used flow controls between sender and receiver to prevent the sender from overwhelming the receiver. These are described in RFC 793 (Transmission Control Protocol, September 1981).

But these early flow controls were designed to prevent overflowing the receiver's buffers, not the buffers of network nodes. However, the early Internet consisted of a large number of relatively slow links, so congestion was not the problem it is today.

In the late 1980s, Van Jacobson developed the congestion control mechanisms that make TCP respond to congestion in the network. The basic "signal" is a dropped packet, which causes the host to stop or slow down.

Normally, when a host receives a packet (or set of packets), it sends an ACK (acknowledgement) to the sender. A window mechanism allows the host to send multiple packets with a single ACK as discussed under "Flow-Control Mechanisms." Failure to receive an ACK indicates that the receiving host may be overflowing or that the network is congested. In either case, the sender slows down or stops.

A strategy called additive increase/multiplicative decrease regulates the number of packets that are sent at one time. If you graphed the flow, you would see a sawtooth pattern where the number of packets increases (additive increase) until congestion occurs and then drops off when packets start to drop (multiplicative decrease). The window size is typically halved when a congestion signal occurs.

What the host is doing is finding the optimal transmission rate by constantly testing the network with a higher rate. Sometimes, the higher rate is allowed, but if the network is busy, packets start to drop and the host scales back. This scheme sees the network as a "black box" that drops packets when it is congested. Therefore, congestion controls are run by the end systems that see dropped packets as the only indication of network congestion.

A sender that is transferring a large file will push for a higher rate until eventually it grabs all the bandwidth. Other hosts may have trouble getting packets through. Often, the host that has grabbed the bandwidth is transmitting the least important traffic. The effect is especially disruptive to real-time traffic such as voice. Even if a link has sufficient bandwidth to handle its load, ill-behaved hosts can saturate the link (although briefly) enough to disrupt voice traffic in a way that is perceptible to users.

Of course, the network can take an active role in managing congestion. That is where "active queue management" and congestion avoidance come into play, as discussed later. RFC 1254 (Gateway Congestion Control Survey, August 1991) describes congestion control and avoidance mechanisms that were reviewed by the IETF Performance and Congestion Control Working Group. The group divided congestion handling into the following:

Congestion recovery Restore the operating state of the network when demand exceeds capacity.

Congestion avoidance Anticipate congestion and avoid it so that congestion never occurs.

RFC 1254 states that the Internet would cease to operate without congestion recovery, but has operated a long time without congestion avoidance. Today, congestion avoidance is an important tool for improving the performance and QoS of the Internet.

RFC 2309 (Recommendations on Queue Management and Congestion Avoidance in the Internet, April 1998) states that router-based mechanisms for controlling congestion can be divided into "queue management" algorithms and "scheduling" algorithms. Refer to this RFC for useful information about congestion controls, congestion avoidance schemes, and queue scheduling techniques.

An important goal is to minimize the number of dropped packets. If a host is transmitting at a high rate and the network go into congestion, a large number of packets will be lost. Congestion avoidance attempts to prevent this without putting limits on network throughput. RFC 2309 points out that it is better to accept the fact that there will be bursts that overflow queues rather than try to maintain queues in a non-full state. That would essentially translate to favoring low end-to-end delay over high throughput. The RFC also notes the following:

The point of buffering in the network is to absorb data bursts and to transmit them during the (hopefully) ensuing bursts of silence. This is essential to permit the transmission of bursty data. It should be clear why we would like to have normally-small queues in routers: we want to have queue capacity to absorb the bursts. The counter-intuitive result is that maintaining normally-small queues can result in higher throughput as well as lower end-to-end delay. In short, queue limits should not reflect the steady state queues we want maintained in the network; instead, they should reflect the size of bursts we need to absorb.

Keep in mind that bursts can disrupts multiple hosts. If a single hosts fills a queue being used by multiple hosts, all of the hosts will need to back off. This results in a period in which the network is underutilized because hosts are sending packets at a lower rate. But eventually, they start building back up with a need to retransmit dropped packets. What happens then is that all the hosts that previously backed off try to resend at about the same time, causing another congestion state. This is called the "global synchronization" problem.

Keep in mind is that TCP handles congestion control. UDP is typically used for real-time audio and video streams because there is no need to recover lost packets. UDP is an unreliable transport protocol that does not send ACK signals back to the source. Since there is no ACK, UDP streams cannot be controlled with traditional TCP congestion controls.

Lawrence G. Roberts, one of the early architects of the Internet, made some interesting comments about TCP and its congestion control schemes in a 1997 paper called "Explicit Rate Flow Control, A 100 Fold Improvement over TCP." His comments are paraphrased below.

So long as TCP operates only in the end-stations its operation cannot be substantially improved . . . If TCP is not replaced, TCP will cause major overloads and outages on long haul networks like the Internet. Users are severely impacted by the slow start-up rate and high delay variance inherent with TCP. The IETF has not even considered revising TCP. In fact no study has been done by the IETF on flow control because everyone seems to believe, "if it worked in the past, it will continue to work". TCP must be replaced with a new flow control as good as explicit rate flow control as soon as possible.

RFC 2581 (TCP Congestion Control, April 1999) defines TCP's four intertwined congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. Each of these is discussed in the following sections.

Slow Start Congestion Control

Slow start reduces the burst affect when a host first transmits. It requires a host to start its transmissions slowly and then build up to the point where congestion starts to occur. The host does not initially know how many packets it can send, so it uses slow start as a way to gauge the network's capacity. A host starts a transmission by sending two packets to the receiver. When the receiver receives the segments, it returns ACKs (acknowledgements) as confirmation. The sender increments its window by two and sends four packets. This buildup continues with the sender doubling the number of packets it sends until an ACK is not received, indicating that the flow has reached the network's ability to handle traffic or the receivers ability to handle incoming traffic.

Slow start does not prevent congestion, it simply prevents a host from causing an immediate congestion state. If the host is sending a large file, it will eventually reach a state where it overloads the network and packets begin to drop. Slow start is critical in avoiding the congestion collapse problem. But new applications such as voice over IP cannot tolerate the delay caused by slow start and in some cases, slow start is disabled so the user can "grab" bandwidth. That trend will only lead to problems.

Fast Retransmit and Fast Recovery (Reno)

Fast retransmit and fast recovery are algorithms that are designed to minimize the effect that dropping packets has on network throughput. The fast retransmit mechanism infers information from another TCP mechanism that a receiver uses to signal to the sender that it has received packets out of sequence. The technique is to send several duplicate ACKs to the sender.

Fast retransmit takes advantage of this feature by assuming that duplicate ACKs indicates dropped packets. Instead of waiting for an ACK until the timer expires, the source resends packets if three such duplicate ACKs are received. This occurs before the timeout period and thus improves network throughput. For example, if a host receives packet 5 and 7, but not 6, it will send a duplicate ACK for packet 5 when it receives packet 7 (but not packet 6).

Fast recovery is a mechanism that replaces slow start when fast retransmit is used. Note that while duplicate ACKs indicate that a segment has been lost, it also indicates that packets are still flowing since the source received a packet with a sequence number higher than the missing packet. In this case, the assumption is that a single packet has been dropped and that the network is not fully congested. Therefore, the sender does not need to drop fully back to slow start mode but to half the previous rate.

Note that the preceding mechanisms are called Reno. RFC 2582 (The NewReno Modification to TCP's Fast Recovery Algorithm, April 1999) describes a modification to Reno to cover situations in which ACKs do not cover all of the outstanding data when loss was detected.

Active Queue Management and Congestion Avoidance

Dropping packets is inefficient. If a host is bursting and congestion occurs, a lot of packets will be lost. Therefore, it is useful to detect impending congestion conditions and actively manage congestion before it gets out of hand.

Active queue management is a technique in which routers actively drop packets from queues as a signal to senders that they should slow down. RFC 2309 lists the following advantages of active queue management:

Burst are inevitable. Keeping queue size small and actively managing queues improves a router's ability to absorb bursts without dropping excessive packets.

If a source overflows a shared queue, all the devices sharing that queue will slow down (the "global synchronization" problem).

Recovering from many dropped packets is more difficult than recovering from a single dropped packet.

Large queue can translate into delay. Active queue management allows queues to be smaller, which improves throughput.

Lock-out occurs when a host fills a queue and prevents other hosts from using the queue. Active queue management can prevent this condition.

Several congestion avoidance schemes are described next. Keep in mind that the next step beyond these techniques involves traffic shaping, resource reservations, virtual circuits, and QoS techniques. More on this later.

RED (Random Early Discard)

RED is an active queue management scheme that provides a mechanism for congestion avoidance. RFC 2309 , section 3 provides a description of RED. Sally Floyd and Van Jacobson wrote the basic paper describing RED gateways. That paper along with other papers and related resources may be found at Sally Floyd's Web site. The address is listed on the related entries page.

Unlike traditional congestion control schemes that drop packets at the end of full queues, RED uses statistical methods to drop packets in a "probabilistic" way before queues overflow. Dropping packets in this way slows a source down enough to keep the queue steady and reduces the number of packets that would be lost when a queue overflows and a host is transmitting at a high rate.

RED makes two important decisions. It decides when to drop packets and what packets to drop. RED keeps track of an average queue size and drops packets when the average queue size grows beyond a defined threshold. The average size is recalculated every time a new packet arrives at the queue. RED makes packet-drop decisions based on two parameters:

Minimum threshold (minth) Specifies the average queue size below which no packets will be dropped.

Maximum threshold (maxth) Specifies the average queue size above which all packets will be dropped

RED uses time-averaging, meaning that if the queue has recently been mostly empty, RED will not react to a sudden burst as if it were a major congestion event. However, if the queues remain near full, RED will assume congestion and start dropping packets at a higher rate.

RFC 2309 mentions that active queue management mechanisms in the Internet can have substantial performance benefits and that there appears to be no disadvantages to using the RED algorithm.

WRED (weighted RED) is a technique of dropping packets based on the type of traffic, where it is going, or other factors. WRED may also drop packets based on marking made to packets outside the network.

ECN (Explicit Congestion Notification)

The problem with RED is that it drops packets. A more efficient technique would be for a router to set a congestion notification bit in a packet, and then send the packet to the receiver. The receiver could then inform the sender to slow down via a message in the ACK. All the while, the receiver gets its packet and we avoid using packet drops to signal congestion.

ECN is an end-to-end congestion avoidance mechanism that adopts this technique. As the name implies, ECN provides direct notification of congestion rather than indirectly signaling congestion via dropped packets. ECN works when congestion is moderate. When congestion gets excessive, packet-drop techniques are used.

ECN-enabled routers set a CD (congestion experienced) bit in the packet header of packets from ECN-capable hosts when the length of a queue exceeds a certain threshold value. The packets are forwarded to the receiver, which then sends an ACK to the sender that contains the congestion indicator. This ACK is called an ECN-Echo. When the sender receives this explicit signal, it halves the rate at which it sends packets.

Note that ECN requires modifications to TCP. ECN is described in RFC 2481 (A Proposal to Add Explicit Congestion Notification to IP, January 1999). Refer to RFC 2884 (Performance Evaluation of Explicit Congestion Notification in IP Networks, July 2000) for further information. The IETF Endpoint Congestion Management (ecm) Working Group has additional information (Web site listed on the related entries page).

TCP Rate Control

TCP rate control is a technique in which endpoints can adjust their transmissions based on feedback from network devices that perform rate control. Packeteer is an advocate of rate control and this section describes how the company implements it in its PacketShaper products. Packeteer's Web site has numerous papers on rate control and other congestion control topics.

TCP Rate Control is also known as ERC (explicit rate control). A form of ERC is implemented in ATM networks. The Lawrence G. Roberts paper mentioned earlier in this section describes ERC in both ATM and TCP networks.

Packeteer's PacketShaper maintains state information about individual TCP connections. This allows it to send feedback to the source that controls its behavior. The primary goal is to control bursts by smoothing out a source's rate of transmission. With bursts reduced, traffic management becomes easier.

The rate control process is performed within the network between the end systems. PacketShaper intercepts ACKs from receivers and holds them for an amount of time that is precisely calculated to make the sender transmit its next packet in a way that smoothes out the burst.

For example, a source sends a packet to the receiver. The receiver returns an ACK to the sender. PacketShaper intercepts the ACK and changes some internal settings such as the TCP window size. At the precise moment, PacketShaper sends the packet and the contents of the packet (ACK sequence number plus the window size) and tells the sender that it is time to transmit another packet.

This results in a steady flow of packets from sources and improved resource management. The downside is that routers must actively track each flow. In addition, changing the contents of in-transit packets may not be a good idea, depending on the network. Traffic management and QoS devices implement TCP rate control.

Other Schemes

Most of the schemes described in the preceding section are queue management schemes. If you wish to manage network traffic beyond what these schemes provide, you need to look at prioritization schemes, packet tagging schemes, virtual circuit schemes, and QoS schemes. Refer to the following topics for more information:

Differentiated Services (Diff-Serv)

Integrated Services (Int-Serv)

Load Balancing

MPLS (Multiprotocol Label Switching)

Policy-Based Management

QoS (Quality of Service)

Traffic Management, Shaping, and Engineering

Congestion Management Resources

One of the best sources of information about congestion controls and especially congestion avoidance is Sally Floyd's Web site. The address is given on the related entries page. The IETF Endpoint Congestion Management (ecm) Working Group has information on new congestion control schemes and documents that evaluates current congestion control schemes. Sally Floyd wrote RFC 2914 (Congestion Control Principles, September 2000). The IETF Transport Area Working Group (tsvwg) is working on new transport specifications.

RFC 2581 and RFC 2914 are two of the best sources for information on TCP's congestion control. Some earlier or related RFCs of interest are listed here:

RFC 793 (Transmission Control Protocol, September 1981)

RFC 813 (Window and Acknowledgement Strategy in TCP, July 1982)

RFC 1122 (Requirements for Internet Hosts - Communication Layers, October 1989)

RFC 2018 (TCP Selective Acknowledgment Options, October 1996)

RFC 2414 (Increasing TCP's Initial Window, September 1998)

RFC 2416 (When TCP Starts Up With Four Packets Into Only Three Buffers. September 1998)

RFC 2488 (Enhancing TCP Over Satellite Channels Using Standard Mechanisms, January 1999)

RFC 2525 (Known TCP Implementation Problems, March 1999)

RFC 2757 (Long Thin Networks, January 2000)

RFC 2861 (TCP Congestion Window Validation, June 2000)