donderdag 21 augustus 2008

Poisson distribution

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. A classic example is the nuclear decay of atoms.

Occurrence

The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete nature (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled as a Poisson distribution include:

* The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry. This example was made famous by a book by Ladislaus Josephovich Bortkiewicz (1868–1931).
* The number of phone calls at a call center per minute.
* The number of times a web server is accessed per minute.
* The number of mutations in a given stretch of DNA after a certain amount of radiation.

The "law of small numbers"

The word law is sometimes used as a synonym of probability distribution, and convergence in law means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the law of small numbers because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen.


The work focused on random variables N that count, among other things, the number of discrete occurrences (sometimes called "arrivals") that take place during a time interval of given length. If the expected number of occurrences in this interval is λ, then the probability that there are exactly k occurrences (k being a non-negative integer, k = 0, 1, 2, ...) is equal to


f(k; λ) = λ^k e^(−λ) / k!



where

* e is the base of the natural logarithm
* k is the number of occurrences of an event - the probability of which is given by the function
* k! is the factorial of k
* λ is a positive real number, equal to the expected number of occurrences during the given interval. For instance, if the events occur on average 4 times per minute, and you are interested in the number of events occurring in a 10-minute interval, you would use a Poisson distribution with λ = 10 × 4 = 40 as your model.
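
As a quick check of the formula, here is a minimal Python sketch (the helper name poisson_pmf is my own) that evaluates the pmf for the λ = 40 example above:

    import math

    def poisson_pmf(k, lam):
        # Probability of exactly k occurrences when the expected count is lam.
        return lam ** k * math.exp(-lam) / math.factorial(k)

    # Events occur on average 4 times per minute; probability of seeing
    # exactly 40 events in a 10-minute interval (lambda = 40):
    print(poisson_pmf(40, 40.0))   # about 0.063, the value at the mode

Summing poisson_pmf(k, 40.0) over all k from 0 upward would, of course, give 1.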

M/M/1 model

The M/M/1 is a single-server queueing model that can be used to approximate many simple systems.
[Figure: M/M/1 schema]

Following Kendall's notation, M/M/1 indicates a system where:

* Arrivals are a Poisson process;
* Service time is exponentially distributed;
* There is one server.

Analysis

If the arrival rate is λ and the service rate is μ, we can represent the system by a birth-death process, where each state is the number of users in the system. There are infinitely many states: state 0 (no users in the system), state 1 (one user), state 2 (two users), and so on. In every state the birth rate is λ and the death rate is μ; regardless of the state, only two events can occur:

* A new user arrives: if the system is in state k, it goes to state k + 1 with rate λ.
* A user leaves the system: if the system is in state k > 0, it goes to state k − 1 with rate μ.

It is now easy to see that the system is stable only if λ < μ: if the death rate were lower than the arrival rate, the number of users in the queue would grow without bound.

If the system is stable, we can obtain many performance measures of interest, such as the following (see the sketch after this list):

* The expected number of users in the system
* The expected number of users in the queue
* The throughput
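
All three have simple closed forms for a stable M/M/1 queue. A minimal Python sketch (standard textbook formulas; the function name is mine):

    def mm1_metrics(lam, mu):
        # Standard M/M/1 results; the queue is stable only when lam < mu.
        if lam >= mu:
            raise ValueError("unstable: arrival rate must be below service rate")
        rho = lam / mu                    # utilization
        n_system = rho / (1 - rho)        # expected number of users in the system
        n_queue = rho ** 2 / (1 - rho)    # expected number of users in the queue
        throughput = lam                  # every arriving user is eventually served
        return rho, n_system, n_queue, throughput

    # Arrivals at 10 per hour, service at 12 per hour:
    print(mm1_metrics(10.0, 12.0))   # rho = 0.833..., 5 in system, ~4.17 queued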

Example

There are many situations where an M/M/1 model could be used. For instance, take a post office with a single employee, and therefore a single queue: customers arrive, join the queue, are served, and leave the system. If the arrival process is Poisson and the service time is exponential, we can use an M/M/1 model, and we can easily calculate the expected number of people in the queue, the probability that a customer will have to wait longer than a given time, and so on.

woensdag 20 augustus 2008

Jackson network

The concept of Jackson networks, named after James R. Jackson, is the first significant development in the theory of networks of queues, in which each node of the queueing network can be analyzed separately.

Definition

A network of m interconnected queues is known as a Jackson network if it meets the following conditions:

1. Customers arriving from outside the system arrive as a Poisson process.
2. The servers each act as a Poisson process (exponentially distributed service times).
3. A customer leaving queue i will either move to some new queue j with probability P_ij or leave the system with probability 1 − Σ_{j=1..m} P_ij. These routing events are independent and identically distributed.
4. The utilization of all of the queues is less than one.

In such a network, Jackson's theorem applies and the distribution of customers in each queue when the system is in equilibrium is exactly the distribution of an M/M/1 queueing model with the same utilization.
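
As an illustration of how the theorem is used, here is a minimal Python sketch (names and the two-queue example are mine) that solves the traffic equations λ = γ + Pᵀλ and then treats each node as an independent M/M/1 queue:

    import numpy as np

    def jackson_equilibrium(gamma, P, mu):
        # gamma: external arrival rates, P: routing matrix, mu: service rates.
        gamma, P, mu = map(np.asarray, (gamma, P, mu))
        m = len(gamma)
        lam = np.linalg.solve(np.eye(m) - P.T, gamma)  # effective arrival rates
        rho = lam / mu                                 # per-queue utilization
        assert (rho < 1).all(), "condition 4: every utilization must be below 1"
        return lam, rho, rho / (1 - rho)               # M/M/1 mean queue sizes

    # Two queues: outside arrivals enter queue 0; 30% of queue 0's output
    # moves on to queue 1, the rest leaves; all of queue 1's output leaves.
    P = [[0.0, 0.3],
         [0.0, 0.0]]
    print(jackson_equilibrium(gamma=[1.0, 0.0], P=P, mu=[2.0, 1.0]))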

dinsdag 19 augustus 2008

Scheduling algorithm

In computer science, a scheduling algorithm is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load balance a system effectively or to achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously).

In computing and multitasking

The algorithm used may be as simple as round-robin, in which each process is given an equal time slice (for instance 1 ms; usually between 1 ms and 100 ms) in a cycling list: process A executes for 1 ms, then process B, then process C, then back to process A.
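
A toy sketch of this cycling list in Python (the task lengths in ms are made up):

    from collections import deque

    def round_robin(tasks, quantum_ms=1):
        # tasks: list of (name, remaining_ms) pairs.
        queue = deque(tasks)
        while queue:
            name, remaining = queue.popleft()
            print("run", name, "for", min(quantum_ms, remaining), "ms")
            remaining -= quantum_ms
            if remaining > 0:
                queue.append((name, remaining))  # unfinished: back of the line

    round_robin([("A", 3), ("B", 1), ("C", 2)])
    # run order: A, B, C, A, C, A - each process gets an equal turn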

More advanced algorithms take into account process priority, or the importance of the process, allowing some processes to use more time than others. The kernel always uses whatever resources it needs to ensure proper functioning of the system, and so can be said to have infinite priority. In SMP systems, processor affinity is considered to increase overall system performance, even if it may cause an individual process to run more slowly, because it reduces cache thrashing.

In computer networks and multiplexing

In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets. In advanced packet radio wireless networks such as HSDPA 3.5G cellular system, channel-dependent scheduling may be used to take advantage of favourable channel conditions to increase the throughput and system spectral efficiency. The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportionally fair scheduling and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.

In computer I/O

An I/O scheduling algorithm determines the order in which disk I/O requests are submitted to the disk device.

maandag 18 augustus 2008

TCP Vegas revisited

In 1994, Brakmo, O'Malley & Peterson presented a new TCP implementation called Vegas that achieves between 40% and 70% better throughput, and one-fifth to one-half the losses, compared with TCP Reno. Vegas is an alternative implementation that interoperates with any other valid implementation of TCP, and all changes are confined to the sending side. This is a summary of their work.

Vegas - New retransmission mechanism
  1. In Reno, RTT and mean variance estimates are computed using a coarse-grained timer (around 500 ms), meaning that the RTT estimate is not very accurate. This granularity also influences how often TCP checks whether it should time out on a segment. Tests found that for losses that resulted in a timeout, it took Reno an average of 1100 ms from the time it sent a segment that was lost until it timed out and resent the segment, whereas less than 300 ms would have been the correct timeout interval had a more accurate clock been used. Vegas fixes this problem by using a finer-grained timer.
     
  2. Vegas also extends the retransmission mechanism as follows: the system clock is read and recorded each time a segment is sent; when an ACK arrives, the clock is read again and the RTT calculation is done using this time and the timestamp recorded for the relevant segment. Using this more accurate RTT, retransmission is decided as follows:
    a.- When a dupack is received, Vegas checks to see if the new RTT (current time − timestamp recorded) is greater than the RTO. If it is, Vegas retransmits the segment without having to wait for the third dupack. In many cases the third dupack is never received, so Reno would have to rely on the coarse-grained timeout to catch the loss.

    b.- When a non-dupack is received, if it is the first or second one after a retransmission, Vegas checks again to see if RTT > RTO; if so, the segment is retransmitted. This will catch any other segment that may have been lost prior to the retransmission, without having to wait for a dupack. In other words, Vegas treats the receipt of certain ACKs as a trigger to check whether a timeout should happen, but it still contains Reno's coarse-grained timeout code in case these mechanisms fail to recognize a lost segment.
     

  3. Finally, in Reno it is possible to decrease the congestion window more than once for losses occurring during one RTT interval. In contrast, Vegas only decreases the congestion window if a retransmitted segment was sent after the last decrease. Any losses that happened before the last decrease are ignored and do not imply that the window should be decreased again. This change is needed because Vegas detects losses much sooner than Reno.
Vegas - New congestion avoidance mechanism
Reno's congestion detection uses the loss of segments as a signal of congestion. It has no mechanism to detect the incipient stages of congestion - before losses occur - so that they can be prevented. Reno is reactive, rather than proactive, in this respect: it needs to create losses to find the available bandwidth of the connection.
There are several proposed approaches for proactive congestion detection, for example:
  1. Wang & Crowcroft's DUAL algorithm is based on the increase of the RTT: every two round-trip delays the algorithm checks whether the current RTT is greater than the average of the minimum and maximum RTTs seen so far; if so, the congestion window is decreased by one-eighth.
     
  2. Jain's CARD adjusts the congestion window every two round-trip delays based on the product (current cwnd − old cwnd) × (current RTT − old RTT): if the result is positive, decrease the window by one-eighth; if negative or zero, increase the window by one MSS. This way the window oscillates around its optimal point.
     
  3. Wang & Crowcroft's Tri-S scheme increases the window size by one segment every RTT and then compares the throughput achieved to the throughput when the window was one segment smaller; if the difference is less than one-half the throughput achieved when only one segment was in transit - as was the case at the beginning of the connection - it decreases the window by one segment. Tri-S calculates the throughput by dividing the number of bytes outstanding in the network by the RTT.
Vegas uses the idea of measuring and controlling the amount of extra data that a connection has in transit, where extra data means data that would not have been sent if the bandwidth used by the connection exactly matched the available bandwidth of the link. Vegas's goal is to maintain the "right" amount of extra data in the network. Obviously, if a connection is sending too much extra data, it will cause congestion; if it is sending too little extra data, it cannot respond rapidly enough to transient increases in the available bandwidth.
This algorithm is not in effect during slow-start and its implementation is as follows:
  1. Define a given connection's BaseRTT to be the RTT of a segment when the connection is not congested; in practice Vegas sets BaseRTT to the minimum of all measured RTTs. It is commonly the RTT of the first segment sent by the connection, before the router queues increase due to traffic generated by this connection. If we assume we are not overflowing the connection, the expected throughput is given by Expected = WindowSize / BaseRTT, where WindowSize is the size of the current congestion window, which we assume to be equal to the number of bytes outstanding.
     
  2. Calculate the current Actual sending rate by recording how many bytes are transmitted between the time a segment is sent and the time its ACK is received, measuring the sample RTT, and dividing the number of bytes transmitted by that sample RTT. This calculation is done once per round-trip time.
     
  3. Then compare Actual to Expected and adjust the window accordingly. Let Diff = Expected − Actual. Note that Diff is positive or zero by definition, since Actual > Expected implies that we have to change BaseRTT to the latest sample RTT. Also define two thresholds α and β, with α < β, roughly corresponding to having too little and too much extra data in the network, respectively. When Diff < α, Vegas increases the congestion window linearly during the next RTT, and when Diff > β, Vegas decreases the congestion window linearly during the next RTT. The congestion window is left unchanged when α < Diff < β.
Intuitively, the farther away the actual throughput gets from the expected throughput, the more congestion there is in the network, which implies that the sending rate should be reduced; the β threshold triggers this decrease. On the other hand, when the actual throughput gets too close to the expected throughput, the connection is in danger of not utilizing the available bandwidth; the α threshold triggers this increase. The overall goal is to keep between α and β extra bytes in the network.
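A minimal sketch of this increase/decrease rule (parameter names are mine; following the paper's convention, Diff is converted from a rate difference into an amount of extra data by multiplying by BaseRTT):

    def vegas_window_update(cwnd, base_rtt, sample_rtt, alpha=1, beta=3):
        # cwnd in segments, RTTs in seconds; alpha/beta are the thresholds above.
        if sample_rtt < base_rtt:
            base_rtt = sample_rtt              # Actual > Expected: refresh BaseRTT
        expected = cwnd / base_rtt             # rate if the path were uncongested
        actual = cwnd / sample_rtt             # rate actually being achieved
        diff = (expected - actual) * base_rtt  # extra segments queued in the network
        if diff < alpha:
            cwnd += 1                          # too little extra data: grow linearly
        elif diff > beta:
            cwnd -= 1                          # too much extra data: shrink linearly
        return cwnd, base_rtt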
Vegas - Modified slow-start mechanism
Reno's slow-start mechanism is very expensive in terms of losses: since it doubles the size of the congestion window every RTT while there are no losses - which is equivalent to doubling the attempted throughput every RTT - when it finally overruns the connection bandwidth, we can expect losses on the order of half the current congestion window, more if we encounter a burst from another connection. Vegas tries to find a connection's available bandwidth without incurring this kind of loss. Toward this end, the congestion detection mechanism is incorporated into slow-start with only minor modifications.
To be able to detect and avoid congestion during slow-start, Vegas allows exponential growth only every other RTT. In between, the congestion window stays fixed so that a valid comparison of the expected and actual rates can be made. When the actual rate falls below the expected rate by a certain amount - the γ threshold - Vegas changes from slow-start mode to linear increase/decrease mode.
Discussion about Vegas TCP
Simulations show that if there are enough buffers in the routers - meaning that Vegas' congestion avoidance mechanisms can function effectively - a higher throughput and a faster response time result. As the load increases and/or the number of router buffers decreases, Vegas' congestion avoidance mechanisms are not as effective, and Vegas starts to behave more like Reno. Under heavy congestion, Vegas behaves very similarly to Reno, since Vegas falls back to Reno's coarse-grained timeout mechanism.
The important point to keep in mind is that, up to the point where congestion is bad enough for Vegas' behavior to degenerate into Reno's, Vegas is less aggressive in its use of router buffers than Reno. This is because Vegas limits its use of router buffers as specified by the β threshold, whereas Reno increases its window size until there are losses - which means all the router buffers are being used.
Finally, Vegas' congestion detection algorithm depends on an accurate value for BaseRTT. If the estimate of BaseRTT is too small, the protocol's throughput will stay below the available bandwidth; if it is too large, it will overrun the connection. The authors report that in their experience the protocol does well with its current choice of BaseRTT, but they plan to study this more carefully in the future.

woensdag 6 augustus 2008

Dynamic Host Configuration Protocol

From Wikipedia, the free encyclopedia

Dynamic Host Configuration Protocol (DHCP) is a protocol used by networked devices (clients) to obtain the parameters necessary for operation in an Internet Protocol network. This protocol reduces system administration workload, allowing devices to be added to the network with little or no manual configuration.

Applicability


Dynamic Host Configuration Protocol is a way to manage network parameter assignment from a single DHCP server, or a group of DHCP servers arranged in a fault-tolerant manner. Even in small networks, Dynamic Host Configuration Protocol is useful because it can make it easy to add new machines to the local network.

DHCP is also recommended even for servers whose addresses rarely change, so that if a server needs to be readdressed (RFC 2071), changes can be made in as few places as possible. For devices such as routers and firewalls that should not use DHCP, it can be useful to put Trivial File Transfer Protocol (TFTP) or SSH servers on the same machine that runs DHCP, which also serves to centralize administration.

DHCP can be used to assign addresses directly to servers and desktop machines, and, through a Point-to-Point Protocol (PPP) proxy, to dialup and broadband on-demand hosts, as well as for residential Network address translation (NAT) gateways and routers. DHCP is generally not appropriate for infrastructure such as non-edge routers and DNS servers.


History
DHCP emerged as a standard protocol in October 1993, as defined in RFC 1531, succeeding the BOOTP protocol. The next update, RFC 2131, released in 1997, is the current DHCP definition. The latest proposed standard for DHCP over IPv6 (DHCPv6) can be found in RFC 3315.


Basic protocol operation
The Dynamic Host Configuration Protocol (DHCP) automates the assignment of IP addresses, subnet masks, default gateway, and other IP parameters. [1]

When a DHCP-configured client (be it a computer or any other network-aware device) connects to a network, the DHCP client sends a broadcast query requesting necessary information from a DHCP server. The DHCP server manages a pool of IP addresses and information about client configuration parameters such as the default gateway, the domain name, the DNS servers, other servers such as time servers, and so forth. Upon receipt of a valid request the server will assign the computer an IP address, a lease (the length of time for which the allocation is valid), and other IP configuration parameters, such as the subnet mask and the default gateway. The query is typically initiated immediately after booting and must be completed before the client can initiate IP-based communication with other hosts.

DHCP provides three modes for allocating IP addresses. The best-known mode is dynamic, in which the client is provided a "lease" on an IP address for a period of time. Depending on the stability of the network, this could range from hours (a wireless network at an airport) to months (for desktops in a wired lab). At any time before the lease expires, the DHCP client can request renewal of the lease on the current IP address. A properly-functioning client will use the renewal mechanism to maintain the same IP address throughout its connection to a single network, otherwise it may risk losing its lease while still connected, thus disrupting network connectivity while it renegotiates with the server for its original or a new IP address.

The two other modes for allocation of IP addresses are automatic (also known as DHCP Reservation), in which the address is permanently assigned to a client, and manual, in which the address is selected by the client (manually by the user or any other means) and the DHCP protocol messages are used to inform the server that the address has been allocated.

The automatic and manual methods are generally used when finer-grained control over IP address is required (typical of tight firewall setups), although typically a firewall will allow access to the range of IP addresses that can be dynamically allocated by the DHCP server.

The process of address allocation is often summarized by the mnemonic DORA: Discover, Offer, Request, Acknowledge.


Security
Having been standardized before network security became a significant issue, the basic DHCP protocol includes no security features, and is potentially vulnerable to two types of attacks:[2]

Unauthorized DHCP servers: because the client has no way to validate the identity of a DHCP server, an unauthorized server can respond to client requests, sending client network configuration values that are beneficial to the attacker. As an example, a hacker can hijack the DHCP process to configure clients to use a malicious DNS server or router (see also DNS cache poisoning).
Unauthorized DHCP Clients: By masquerading as a legitimate client, an unauthorized client can gain access to network configuration and an IP address on a network it should otherwise not be allowed to use. Also, by flooding the DHCP server with requests for IP addresses, it is possible for an attacker to exhaust the pool of available IP addresses, disrupting normal network activity (a denial of service attack).
To combat these threats RFC 3118 ("Authentication for DHCP Messages") introduced authentication information into DHCP messages allowing clients and servers to reject information from invalid sources. Although support for this protocol is widespread, a large number of clients and servers still do not fully support authentication, thus forcing servers to support clients that do not support this feature. As a result, other security measures are usually implemented around the DHCP server (such as IPsec) to ensure that only authenticated clients and servers are granted access to the network.

Wherever possible, DHCP-assigned addresses should be dynamically linked to a secure DNS server, to allow troubleshooting by name rather than by a potentially unknown address. Effective DHCP-DNS linkage requires a file of MAC addresses or local names, sent to DNS, that uniquely identifies physical hosts. The DHCP server ensures that all IP addresses are unique, i.e., no IP address is assigned to a second client while the first client's assignment is valid (its lease has not expired). Thus IP address pool management is done by the server and not by a network administrator.


IP address allocation
Depending on implementation, the DHCP server may have three methods of allocating IP addresses, plus a fourth mode of operation ("manual") in which the client (rather than the DHCP server) assigns an IP address. (Warning: the terminology below differs from the terminology above in Basic protocol operation.)

dynamic allocation
A network administrator assigns a range of IP addresses to DHCP, and each client computer on the LAN has its IP software configured to request an IP address from the DHCP server during network initialization. The request-and-grant process uses a lease concept with a controllable time period, allowing the DHCP server to reclaim (and then reallocate) IP addresses that are not renewed (dynamic re-use of IP addresses).
automatic allocation
The DHCP server permanently assigns a free IP address to a requesting client from the range defined by the administrator. This is like dynamic allocation, but the DHCP server keeps a table of past IP address assignments, so that it can preferentially assign to a client the same IP address that the client previously had.
static allocation
The DHCP server allocates an IP address based on a table with MAC address/IP address pairs, which are manually filled in (perhaps by a network administrator). Only requesting clients with a MAC address listed in this table will be allocated an IP address. This feature (which is not supported by all routers) is variously called "Static DHCP Assignment" (by DD-WRT), "fixed-address" (by the dhcpd documentation), "DHCP reservation" or "Static DHCP" (by Cisco/Linksys), and "IP reservation" or "MAC/IP binding" (by various other router manufacturers).
manual allocation
The DHCP server does not assign the IP address; instead, the client is configured with a user-specified static IP address.
Many DHCP servers can manage hosts by more than one of the above methods. For example, the known hosts on the network can be assigned an IP address based on their MAC address (static allocation) whereas "guest" computers (such as laptops via WiFi) are allocated a temporary IP address out of a pool compatible with the network to which they're attached (dynamic allocation).


DHCP and firewalls

Firewalls usually have to permit DHCP traffic explicitly. The DHCP client-server specification describes several cases in which packets must have a source address of 0x00000000 or a destination address of 0xffffffff. Anti-spoofing policy rules and tight inclusive firewalls often stop such packets. Multi-homed DHCP servers require special consideration and a more complicated configuration.

To allow DHCP, network administrators need to allow several types of packets through the server-side firewall. All DHCP packets travel as UDP datagrams; all client-sent packets have source port 68 and destination port 67; all server-sent packets have source port 67 and destination port 68. For example, a server-side firewall should allow the following types of packets:

Incoming packets from 0.0.0.0 or dhcp-pool to dhcp-ip
Incoming packets from any address to 255.255.255.255
Outgoing packets from dhcp-ip to dhcp-pool or 255.255.255.255
where dhcp-ip represents any address configured on the DHCP server host, and dhcp-pool stands for the pool from which the DHCP server assigns addresses to clients.
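
As a sketch, the three rules can be expressed as a packet predicate (Python, with hypothetical field names; a real firewall would state this in its own rule language):

    def dhcp_allowed(direction, src_ip, dst_ip, sport, dport, dhcp_ip, dhcp_pool):
        # direction: "in" (client to server) or "out" (server to client).
        if direction == "in" and (sport, dport) == (68, 67):
            from_client = (src_ip == "0.0.0.0" or src_ip in dhcp_pool) and dst_ip == dhcp_ip
            to_broadcast = dst_ip == "255.255.255.255"
            return from_client or to_broadcast
        if direction == "out" and (sport, dport) == (67, 68):
            return src_ip == dhcp_ip and (dst_ip in dhcp_pool or dst_ip == "255.255.255.255")
        return False

    pool = {"192.168.1." + str(i) for i in range(100, 200)}   # example dhcp-pool
    print(dhcp_allowed("in", "0.0.0.0", "255.255.255.255", 68, 67,
                       "192.168.1.1", pool))                  # True: a DHCPDISCOVER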


Technical details


[Figure: schema of a typical DHCP session]

DHCP uses the same two IANA-assigned ports as BOOTP: 67/udp for the server side, and 68/udp for the client side.

DHCP operations fall into four basic phases. These phases are IP discovery, IP lease offer, IP request, and IP lease acknowledgement.

After the client has obtained an IP address, it may send an Address Resolution Protocol (ARP) query to prevent IP conflicts caused by overlapping address pools of multiple DHCP servers.


DHCP discovery
The client broadcasts on the physical subnet to find available servers. Network administrators can configure a local router to forward DHCP packets to a DHCP server on a different subnet. This client implementation creates a UDP packet with the broadcast destination 255.255.255.255 or the specific subnet broadcast address.

A client can also request its last-known IP address (in the example below, 192.168.1.100). If the client is still in a network where this IP is valid, the server might grant the request. Otherwise, it depends whether the server is set up as authoritative or not. An authoritative server will deny the request, making the client ask for a new IP immediately. A non-authoritative server simply ignores the request, leading to an implementation-dependent timeout for the client to give up on the request and ask for a new IP address.
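
To make the packet layout concrete, here is a minimal sketch that builds the BOOTP/DHCP payload of such a DISCOVER (field offsets follow RFC 2131; the MAC address is a placeholder):

    import socket, struct, random

    def build_dhcp_discover(mac, requested_ip=None):
        # Fixed BOOTP header: op=1 (request), htype=1 (Ethernet), hlen=6,
        # hops=0, random xid, secs=0, flags=0x8000 (ask for broadcast reply).
        pkt = struct.pack("!BBBBIHH", 1, 1, 6, 0, random.getrandbits(32), 0, 0x8000)
        pkt += b"\x00" * 16              # ciaddr, yiaddr, siaddr, giaddr
        pkt += mac + b"\x00" * 10        # chaddr is a 16-byte field
        pkt += b"\x00" * 192             # sname (64) + file (128)
        pkt += b"\x63\x82\x53\x63"       # DHCP magic cookie
        pkt += b"\x35\x01\x01"           # option 53: message type DISCOVER
        if requested_ip:                 # option 50: requested IP address
            pkt += b"\x32\x04" + socket.inet_aton(requested_ip)
        return pkt + b"\xff"             # end option

    # Re-request the last-known address from the example above:
    payload = build_dhcp_discover(b"\x00\x11\x22\x33\x44\x55", "192.168.1.100")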


DHCP offers
When a DHCP server receives an IP lease request from a client, it extends an IP lease offer. This is done by reserving an IP address for the client and sending a DHCPOFFER message across the network to the client. This message contains the client's MAC address, followed by the IP address that the server is offering, the subnet mask, the lease duration, and the IP address of the DHCP server making the offer.

The server determines the configuration, based on the client's hardware address as specified in the CHADDR field. Here the server, 192.168.1.1, specifies the IP address in the YIADDR field.





DHCP acknowledgement
When the DHCP server receives the DHCPREQUEST message from the client, it initiates the final phase of the configuration process. This acknowledgement phase involves sending a DHCPACK packet to the client. This packet includes the lease duration and any other configuration information that the client might have requested. At this point, the IP configuration process is complete.

The server acknowledges the request and sends the acknowledgement to the client. The system as a whole expects the client to configure its network interface with the supplied options.

DHCP information
The client sends a DHCPINFORM message to the DHCP server, either to request more information than the server sent with the original DHCPACK, or to repeat data for a particular application - for example, browsers use DHCP Inform to obtain web proxy settings via WPAD. Such queries do not cause the DHCP server to refresh the IP expiry time in its database.

DHCP releasing
The client sends a request to the DHCP server to release the lease, and the client unconfigures its IP address. As clients usually do not know when users may unplug them from the network, the protocol does not mandate the sending of DHCP Release.

Client configuration parameters
A DHCP server can provide optional configuration parameters to the client. RFC 2132 describes the available DHCP options defined by Internet Assigned Numbers Authority (IANA) - DHCP and BOOTP PARAMETERS.


Options
Options can identify the vendor and functionality of a DHCP client. The information is a variable-length string of characters or octets whose meaning is specified by the vendor of the DHCP client. One method that a DHCP client can use to communicate to the server that it is using a certain type of hardware or firmware is to set a value in its DHCP requests called the Vendor Class Identifier (VCI) (Option 60). This method allows a DHCP server to differentiate between kinds of client machines and process their requests appropriately. Some types of set-top boxes also set the VCI (Option 60) to inform the DHCP server about the hardware type and functionality of the device. The value of this option gives the DHCP server a hint about any extra information that this client needs in a DHCP response.

dinsdag 5 augustus 2008

Network congestion avoidance

Network congestion avoidance is a process used in computer networks to avoid congestion.

The fundamental problem is that all network resources are limited, including router processing time and link throughput. For example:

* Today's (2006) wireless LAN effective bandwidth (15-100 Mbit/s) is easily filled by a single personal computer.
* Even on fast computer networks (e.g. 1 Gbit/s), the backbone can easily be congested by a few servers and client PCs.
* Because P2P scales very well, P2P file transmissions will readily fill an uplink or some other network bottleneck, particularly when nearby peers are preferred over distant peers.
* Denial-of-service attacks by botnets are capable of filling even the largest Internet backbone links (40 Gbit/s as of 2007), generating large-scale network congestion.
Implementations of connection-oriented protocols, such as the widely-used TCP protocol, generally watch for packet errors, losses, or delays (see Quality of Service) in order to adjust the transmit speed. There are many different network congestion avoidance processes, since there are a number of different trade-offs available. [1]

TCP/IP congestion avoidance
Main article: TCP congestion avoidance algorithm
The TCP congestion avoidance algorithm is the primary basis for congestion control in the Internet.

Problems occur when many concurrent TCP flows experience port queue buffer tail-drops; TCP's automatic congestion avoidance is then not enough. All flows that experience tail-drop begin their recovery at the same moment - this is called TCP global synchronization.


Purpose
"Recommendations on Queue Management and Congestion Avoidance in the Internet" (RFC 2309[7]) states that:

Fewer packets will be dropped with Active Queue Management (AQM).
The link utilization will increase because less TCP global synchronization will occur.
By keeping the average queue size small, queue management will reduce the delays and jitter seen by flows.
The connection bandwidth will be more equally shared among connection oriented flows, even without flow-based RED or WRED.

Random early detection
Main article: Random early detection
Main article: Weighted random early detection
One solution is to use random early detection (RED) on the network equipment's port queue buffer. [8] [9] On network equipment ports with more than one queue buffer, weighted random early detection (WRED) can be used if available.

RED signals the sender and receiver indirectly, by dropping some packets when the average queue length exceeds a lower threshold (e.g. 50% filled), and dropping linearly more (or, better according to the paper, cubically more) packets [10] as the average approaches a higher threshold (e.g. 100%). The average queue buffer length is computed over one second at a time.
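
The linear version of that ramp fits in a few lines. A sketch (the threshold values are illustrative):

    def red_drop_probability(avg_qlen, min_th, max_th, max_p=0.1):
        # Classic RED: never drop below min_th, always drop above max_th,
        # and ramp the drop probability linearly in between.
        if avg_qlen < min_th:
            return 0.0
        if avg_qlen >= max_th:
            return 1.0
        return max_p * (avg_qlen - min_th) / (max_th - min_th)

    # 100-packet buffer with thresholds at 50% and 100% filled:
    print(red_drop_probability(75, 50, 100))   # 0.05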


Flow-based RED/WRED
Some network equipment is equipped with ports that can follow and measure each flow (flow-based RED/WRED), and is thereby able to signal a flow that uses too much bandwidth according to some QoS policy. A policy could divide the bandwidth among all flows by some criteria.


IP ECN
Main article: Explicit Congestion Notification
Another approach is to use IP ECN. [11] ECN is used only when the two hosts signal that they want to use it. With this method, an ECN bit is used to signal explicit congestion. This is better than the indirect packet-delete congestion notification performed by the RED/WRED algorithms, but it requires explicit support by both hosts to be effective. [12] Some outdated or buggy network equipment drops packets with the ECN bit set, rather than ignoring the bit. More information on the status of ECN, including the version required for Cisco IOS, is maintained by Sally Floyd[8], one of the authors of ECN.

When a router receives a packet marked as ECN-capable and anticipates congestion (using RED), it sets an ECN flag, notifying the sender of congestion. The sender should then decrease its transmission bandwidth, e.g. by decreasing the TCP window size (sending rate) or by other means.


Cisco AQM: Dynamic buffer limiting (DBL)

Cisco has taken this a step further in their Catalyst 4000 series with engines IV and V, which can classify all flows as "aggressive" (bad) or "adaptive" (good). This ensures that no flows fill the port queues for a long time. DBL can use IP ECN instead of packet-delete signalling. [13] [14]


TCP Window Shaping
Congestion avoidance can also be achieved efficiently by reducing the amount of traffic flowing into your network. When an application requests a large file, graphic or web page, it usually advertises a "window" of between 32K and 64K. This results in the server sending a full window of data (assuming the file is larger than the window). When many applications simultaneously request downloads, this data creates a congestion point at your upstream provider by flooding the queue much faster than it can be emptied. By using a device to reduce the window advertisement, the remote servers will send less data, thus reducing the congestion and allowing traffic to flow more freely. This technique can reduce congestion in a network by a factor of 40.

Additive increase/multiplicative decrease

The additive increase/multiplicative decrease (AIMD) algorithm is a feedback control algorithm used in TCP congestion avoidance. Basically, AIMD combines linear growth of the congestion window with a multiplicative reduction when congestion takes place.

The approach taken is to increase the transmission rate (window size), probing for usable bandwidth, until loss occurs. The policy of additive increase basically says to increase the congestion window by 1 MSS (Maximum segment size) every RTT until a loss is detected.

When loss is detected, the policy is changed to be one of multiplicative decrease which is to cut the congestion window in half after loss.

The result is a sawtooth behavior that represents the probing for bandwidth.

A loss event is generally described to be either a timeout or the event of receiving 3 duplicate ACKs. Also related to TCP congestion control is the slow start mechanism.

Other policies or algorithms for fairness in congestion control are AIAD, MIAD and MIMD.

Mathematical Formula

w ← w - aw when loss is detected

w ← w + b/w when an ACK arrives
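
A direct transcription of the two rules (the step function and the toy values are mine):

    def aimd_step(w, loss_detected, a=0.5, b=1.0):
        if loss_detected:
            return w - a * w    # multiplicative decrease: w <- w - aw
        return w + b / w        # additive increase on an ACK: w <- w + b/w

    # With a = 0.5 and b = 1.0 this mirrors classic TCP: the window grows by
    # roughly one segment per RTT and is halved when a loss is detected.
    w = 10.0
    w = aimd_step(w, loss_detected=False)   # 10.1
    w = aimd_step(w, loss_detected=True)    # 5.05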

Idea behind the formula

In a series of schemes, different proposals have been made to prevent congestion based on different values for a and b. For instance, for the SCTP protocol, researchers suggested a = 0.125 and b = 0.01. Other proposals make a and b functions of w, giving a(w) and b(w).

Starting with Vegas and FAST, an increasing Round Trip Time (RTT) is used as a measure of congestion, rather than looking at packet losses. This defines the congestion window size as a function of the measured RTT. Typically, these modifications only lead to improvements in special cases, such as networks with high bandwidth and low loss rates.

TCP Westwood

TCP Westwood (TCPW) is a sender-side-only modification to TCP NewReno that is intended to better handle large bandwidth-delay product paths (large pipes), potential packet loss due to transmission or other errors (leaky pipes), and dynamic load (dynamic pipes).

TCP Westwood relies on mining the ACK stream for information to help it better set the congestion control parameters: Slow Start Threshold (ssthresh), and Congestion Window (cwin). In TCP Westwood, an "Eligible Rate" is estimated and used by the sender to update ssthresh and cwin upon loss indication, or during its "Agile Probing" phase, a proposed modification to the well-known Slow Start phase. In addition, a scheme called Persistent Non Congestion Detection (PNCD) has been devised to detect persistent lack of congestion and induce an Agile Probing phase to expeditiously utilize large dynamic bandwidth.

The resulting performance gains in efficiency, without undue sacrifice of fairness, friendliness, and stability, have been reported in numerous papers. Significant efficiency gains can be obtained for large leaky dynamic pipes, while maintaining fairness. Under a more appropriate criterion for friendliness, i.e. "opportunistic friendliness", TCP Westwood is shown to have good, and controllable, friendliness.

TCP Westwood+ is an evolution of TCP Westwood: it was soon discovered that the Westwood bandwidth estimation algorithm did not work well in the presence of reverse traffic, due to ACK compression. TCP Westwood+ is implemented in the Linux kernel.

TCP New Reno

TCP New Reno improves retransmission during the fast recovery phase of TCP Reno. During fast recovery, for every duplicate ACK that is returned to TCP New Reno, a new unsent packet from the end of the congestion window is sent, to keep the transmit window full. For every ACK that makes partial progress in the sequence space, the sender assumes that the ACK points to a new hole, and the next packet beyond the ACKed sequence number is sent.

Because the timeout timer is reset whenever there is progress in the transmit buffer, this allows New Reno to fill large holes, or multiple holes, in the sequence space - much like TCP SACK. Because New Reno can send new packets at the end of the congestion window during fast recovery, high throughput is maintained during the hole-filling process, even when there are multiple holes, of multiple packets each. When TCP enters fast recovery it records the highest outstanding unacknowledged packet sequence number. When this sequence number is acknowledged, TCP returns to the congestion avoidance state.

A problem occurs with New Reno when there are no packet losses but instead, packets are reordered by more than 3 packet sequence numbers. When this happens, New Reno mistakenly enters fast recovery, but when the reordered packet is delivered, ACK sequence-number progress occurs and from there until the end of fast recovery, every bit of sequence-number progress produces a duplicate and needless retransmission that is immediately ACKed.

New Reno performs as well as SACK at low packet error rates, and substantially outperforms Reno at high error rates.

TCP Tahoe and Reno

Two such variations are those offered by TCP Tahoe and Reno. TCP specifies a maximum segment size (MSS). The sender maintains a congestion window, limiting the total number of unacknowledged packets that may be in transit end-to-end.

To avoid congestion collapse, TCP performs a slow start when the connection is initialised and after a timeout. It starts with a window of 2 MSS. Although the initial rate is low, the rate of increase is very rapid: for every packet ACKed, the congestion window increases by 1 MSS, so the congestion window doubles every round-trip time (RTT). When the congestion window exceeds a threshold, ssthresh, the algorithm enters a new state, called congestion avoidance. In some implementations (e.g. Linux), the initial ssthresh is large, and so the first slow start usually ends after a loss. However, ssthresh is updated at the end of each slow start, and will often affect subsequent slow starts triggered by timeouts.

Congestion avoidance: As long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round-trip time. When a packet is lost, duplicate ACKs will be received. The behaviour of Tahoe and Reno differs in how they detect and react to packet loss:

Tahoe: Loss is detected when a timeout expires before an ACK is received. Tahoe will then reduce congestion window to 1 MSS, and reset to slow-start state.
Reno: If three duplicate ACKs are received (i.e., three ACKs acknowledging the same packet, which are not piggybacked on data, and do not change the receiver's advertised window), Reno will halve the congestion window, perform a "fast retransmit", and enter a phase called Fast Recovery. If an ACK times out, slow start is used as it is with Tahoe.
Fast recovery (Reno only): In this state, TCP retransmits the missing packet signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.

Both algorithms reduce congestion window to 1 MSS on a timeout event.
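
The two reactions can be summarized in a few lines (a sketch; the state names are mine, window sizes in MSS):

    def on_loss(cwnd, variant, trigger):
        # trigger is "timeout" or "3dupacks"; returns (cwnd, ssthresh, state).
        ssthresh = max(cwnd // 2, 2)                # both variants halve the threshold
        if trigger == "timeout" or variant == "tahoe":
            return 1, ssthresh, "slow-start"        # back to 1 MSS and slow start
        return ssthresh, ssthresh, "fast-recovery"  # Reno: halve and fast-retransmit

    print(on_loss(16, "reno", "3dupacks"))   # (8, 8, 'fast-recovery')
    print(on_loss(16, "tahoe", "3dupacks"))  # (1, 8, 'slow-start')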

TCP Vegas

TCP Vegas is a TCP congestion control, or network congestion avoidance, algorithm that emphasizes packet delay, rather than packet loss, as a signal to help determine the rate at which to send packets. It was developed at the University of Arizona by Lawrence Brakmo and Larry L. Peterson.

TCP Vegas detects congestion at an incipient stage, based on increasing Round Trip Time (RTT) values of the packets in the connection, unlike flavors such as Reno and NewReno, which detect congestion only after it has actually happened, via packet drops. The algorithm depends heavily on an accurate calculation of the BaseRTT value: if it is too small, the throughput of the connection will be less than the available bandwidth, while if it is too large, it will overrun the connection. A lot of research is going on regarding the fairness provided by the linear increase/decrease mechanism for congestion control in Vegas. One interesting caveat arises when Vegas inter-operates with other versions like Reno: because Vegas detects congestion early and reduces its sending rate before Reno does, its performance degrades, and co-existing TCP Reno flows receive greater bandwidth.

TCP Vegas is one of several varieties of TCP congestion avoidance algorithm. It is one of a series of efforts at TCP tuning that adapt congestion control and system behaviors to new challenges faced by increases in available bandwidth in Internet components on networks like Internet2.

TCP Vegas has been implemented in the Linux kernel, and possibly in other operating systems as well.