Advanced Computer Networks
Lecture # 28
by
Prateek Gupta (Y0240)
We shall look at two distinct but related topics during
the course of this discussion. The outline of the lecture is as follows:
- Modeling wide area traffic: Network traffic is often modeled as a Poisson
process for analytic simplicity, but we shall see that packet inter-arrival
times are not exponentially distributed. We will study the issues that arise
in modeling wide area traffic.
- TCP evaluation: Various methods exist to evaluate the performance of the
Internet's Transmission Control Protocol (TCP), but all of them have pitfalls
that need to be understood before obtaining results. Testing TCP is difficult
for a variety of reasons, which we shall study during this lecture, along
with approaches to modeling TCP performance.
Modeling Wide Area Traffic
When modeling Internet traffic, it is common practice to assume that packet
and connection arrivals follow a Poisson process, since the Poisson
distribution has several convenient theoretical properties. Let us first look
at the Poisson distribution before venturing any further:

P(x) = e^(-lambda) * lambda^x / x!        (1)

where the mean and the variance both equal lambda, and x is a non-negative
integer (x = 0, 1, 2, ...).
An important characteristic of the Poisson process is that the probability of
an arrival in a given interval is independent of the past. A Poisson process
is often used to model the arrival of packets during an interval: the packet
inter-arrival times are then exponentially distributed and constitute an iid
(independent, identically distributed) process. In practice, however, it has
been shown that packet inter-arrival times do not have an exponential
distribution, so the error introduced by a Poisson model is significantly
large. Studies have shown that user-initiated TCP session arrivals, such as
remote-login and file-transfer sessions, are well modeled as Poisson processes
with fixed hourly rates, but that other connection arrivals deviate
considerably from Poisson; modeling TELNET packet interarrivals as exponential
grievously underestimates the burstiness of TELNET traffic.
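The link between the Poisson process and exponential inter-arrivals can be made concrete by simulation. The sketch below (illustrative parameters, standard-library Python only) generates a homogeneous Poisson process by drawing iid exponential gaps and confirms that the mean gap is 1/rate:

```python
import random

def poisson_arrivals(rate: float, horizon: float, rng: random.Random):
    """Generate arrival times of a homogeneous Poisson process by
    drawing iid exponential inter-arrival gaps with mean 1/rate."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(42)
arrivals = poisson_arrivals(rate=10.0, horizon=1000.0, rng=rng)
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)
print(len(arrivals), round(mean_gap, 3))  # about 10000 arrivals, mean gap near 0.1
```

The converse is the point of this lecture: if measured inter-arrival gaps do not look exponential, the Poisson model does not fit.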
Modeling wide area traffic involves modeling a number of parameters, such as
connection arrivals, packet arrivals, and the number of bytes transferred per
connection. Traces have been collected and traffic modeled for a number of
applications, such as TELNET, FTP, SMTP, HTTP, and NNTP. For the purpose of
this lecture, however, we will mainly focus our attention on TCP packet
inter-arrival times and show that connection arrivals cannot be modeled as a
Poisson process with a constant rate over the entire day.
TCP Packet Interarrival Time
In this section we will look at the connection start times for several
application protocols running over TCP. The pattern of connection arrivals is
dominated by a 24-hour cycle, as has been widely observed before. For TELNET
connection arrivals and for FTP session arrivals, within one-hour intervals
the arrival process can be well modeled by a homogeneous Poisson process; each
of these arrivals reflects an individual user starting a new session. Over
one-hour intervals, no other protocol's connection arrivals are well modeled
by a Poisson process. Even if we restrict ourselves to ten-minute intervals,
only FTP session and TELNET connection arrivals are statistically consistent
with Poisson arrivals, though the arrivals of SMTP connections and of FTPDATA
"bursts" during ten-minute intervals are not terribly far from what a Poisson
process would generate. The arrivals of NNTP, FTPDATA, and WWW (World Wide
Web) connections, on the other hand, are decidedly not Poisson processes.
The following figure shows the mean hourly connection arrival rate for datasets
LBL-1 through LBL-4. For the different protocols, we plot for each hour the
fraction of an entire day’s connections of that protocol occurring during
that hour.
From the figure, it can be seen that TELNET connection arrivals and FTP
session arrivals are very well modeled as Poisson, both for 1-hour and
10-minute fixed rates. No other protocol's arrivals are well modeled as
Poisson with fixed hourly rates. If we require fixed rates only over 10-minute
intervals, then SMTP and FTPDATA burst arrivals are not terribly far from
Poisson, though neither is statistically consistent with Poisson arrivals, and
consecutive SMTP interarrival times show consistent positive correlation.
NNTP, FTPDATA, and WWW arrivals, on the other hand, are clearly not Poisson.
NNTP arrivals are not Poisson because of the flooding mechanism used to
propagate network news: an NNTP connection can immediately spawn secondary
connections as new news is received from one remote peer and in turn offered
to another. NNTP connections are also often timer-driven. SMTP arrivals are
not Poisson because SMTP connections are affected by mailing-list explosions,
in which one connection immediately follows another.
The packet inter-arrival times within a TELNET connection are consistent with
the empirical Tcplib distribution, not with the exponential distribution. The
distribution of TELNET inter-arrivals is "heavy tailed", i.e., larger values
occur with small but non-zero probability. Modeling TELNET packet arrivals by
a Poisson process, as is generally done, can result in simulations and
analyses that significantly underestimate performance measures such as average
packet delay.
Figure 2: Empirical distribution
of packet interarrivals within Telnet connections
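To illustrate what "heavy tailed" means in practice, the sketch below compares exponential inter-arrivals with Pareto inter-arrivals of the same mean. The Pareto distribution is used here only as an illustrative heavy-tailed stand-in for the empirical Tcplib distribution; the parameters are assumptions, not measurements from the lecture:

```python
import random

rng = random.Random(7)
n, mean = 100_000, 1.0

# Exponential inter-arrivals: what a Poisson packet model assumes.
expo = [rng.expovariate(1.0 / mean) for _ in range(n)]

# Pareto (heavy-tailed) inter-arrivals with the same mean:
# for shape a = 1.5, mean = a * xm / (a - 1), so scale xm = mean * (a - 1) / a.
a = 1.5
xm = mean * (a - 1) / a
pareto = [xm * rng.paretovariate(a) for _ in range(n)]

# The means match, but the heavy tail shows up in the extremes:
# the largest Pareto gap dwarfs the largest exponential gap.
print(round(max(expo), 1), round(max(pareto), 1))
```

Those occasional very long gaps mean the remaining packets cluster into bursts, which is exactly the burstiness an exponential model misses.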
Evaluation of TCP
In the second part of the lecture we will look at some TCP evaluation
techniques. Understanding the performance of TCP is especially important
because it is the dominant protocol in today's Internet. Evaluating TCP is
difficult because of the range of environments, variables, and evaluation
techniques. The evaluation techniques can broadly be divided into two classes:
implementation based and simulation based. Implementation-based techniques
have the advantage of using real traffic, but they are generally difficult to
employ for various reasons, including the cost of the setup.
Before proceeding to the evaluation process, let us look at some TCP features
that will be relevant to the evaluation.
TCP Features
Basic Congestion Control.
TCP congestion control basically includes the following mechanisms: slow
start, congestion avoidance, fast retransmit, and fast recovery. Slow start
and congestion avoidance are required by the IETF standards, while fast
retransmit and fast recovery are recommended, mainly as performance
enhancements.
Extensions for High Performance.
The standard TCP header limits the advertised window size to 64 KB, which is
not adequate in many situations. The following equation defines the minimum
window size W (in bytes) required for TCP to fully utilize an available
bandwidth of B bytes/second over a network with a round-trip time (RTT) of R
seconds:

W = B * R        (2)

Therefore, a network path that exhibits a long delay and/or a large bandwidth
may require a window size of more than 64 KB. Window scaling extensions to TCP
have been defined that allow the use of a window size of more than 64 KB.
Window scaling can lead to more rapid use of the TCP sequence space. Therefore,
along with window scaling, the Protect Against Wrapped Sequence Numbers
(PAWS) algorithm is required. In turn, the PAWS algorithm requires the
timestamp option. The timestamp option adds 12 bytes to each segment. These
additional
header bytes are expected to be costly only to excessively low bandwidth
channels. The timestamp option also allows TCP to easily take multiple RTT
samples per round-trip time.
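Equation (2) is easy to work through with concrete numbers. The link speed and RTT below are illustrative assumptions, not values from the lecture:

```python
# Minimum window W = B * R (equation 2), for an assumed
# 155 Mb/s link with a 100 ms round-trip time.
B = 155e6 / 8          # bandwidth in bytes/second
R = 0.100              # round-trip time in seconds
W = B * R              # required window in bytes
print(int(W), W > 64 * 1024)  # 1937500 bytes, far beyond the 64 KB limit
```

A path like this needs a window of roughly 1.9 MB, which is why window scaling matters on long, fat pipes.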
Selective Acknowledgment. TCP uses a cumulative acknowledgment (ACK) that
simply indicates the last in-order segment that has arrived. When a segment
arrives out of order, a duplicate ACK is transmitted. The selective
acknowledgment (SACK) option allows the TCP receiver to inform the TCP sender
of which segments have arrived and which have not, allowing the sender to
intelligently retransmit only those segments that have been lost.
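The sender-side logic SACK enables can be sketched as follows. This is a hypothetical helper, not an actual TCP implementation; sequence numbers are simplified to segment-aligned byte offsets:

```python
def segments_to_retransmit(next_seq, cum_ack, sack_blocks, mss):
    """Hypothetical helper: given the highest sequence sent (next_seq),
    the receiver's cumulative ACK, and SACK blocks ((start, end) byte
    ranges that arrived out of order), list the starting offsets of the
    MSS-sized segments the sender should retransmit."""
    sacked = set()
    for start, end in sack_blocks:
        seq = start
        while seq < end:
            sacked.add(seq)
            seq += mss
    missing = []
    seq = cum_ack
    while seq < next_seq:
        if seq not in sacked:
            missing.append(seq)
        seq += mss
    return missing

# Segments 0..9 sent (MSS 1000); receiver has 0-2 in order,
# plus out-of-order segments 5-6 and 8 reported via SACK blocks.
holes = segments_to_retransmit(
    next_seq=10_000, cum_ack=3_000,
    sack_blocks=[(5_000, 7_000), (8_000, 9_000)], mss=1_000)
print(holes)  # [3000, 4000, 7000, 9000]
```

With only cumulative ACKs the sender would know just that everything past 3000 is unacknowledged; SACK pinpoints the four actual holes.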
Delayed Acknowledgments. Delayed acknowledgments allow TCP to refrain from
sending an acknowledgment for each incoming data segment, instead transmitting
an ACK for every second full-sized data segment received. If a second data
segment is not received within a given timeout (not to exceed 0.5 seconds),
an ACK is transmitted.
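The delayed-ACK rule above can be sketched as a small state machine. Timer mechanics are abstracted away (a real stack would arm a timer of at most 500 ms; here `on_timeout` is simply called by the caller), and every segment is assumed full-sized:

```python
class DelayedAckReceiver:
    """Sketch of the delayed-ACK rule: ACK immediately on every second
    full-sized segment; otherwise wait for the delayed-ACK timer."""

    def __init__(self):
        self.unacked_segments = 0
        self.acks_sent = 0

    def on_segment(self):
        self.unacked_segments += 1
        if self.unacked_segments >= 2:   # second full-sized segment arrived
            self._send_ack()

    def on_timeout(self):                # delayed-ACK timer (<= 0.5 s) fired
        if self.unacked_segments:
            self._send_ack()

    def _send_ack(self):
        self.acks_sent += 1
        self.unacked_segments = 0

rx = DelayedAckReceiver()
for _ in range(5):
    rx.on_segment()
rx.on_timeout()
print(rx.acks_sent)  # 3: two for segment pairs, one for the leftover segment
```

The net effect is roughly one ACK per two data segments, halving reverse-path ACK traffic.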
Nagle Algorithm. The Nagle
algorithm is used to combine many small bits of data produced by applications
into larger TCP segments. The Nagle algorithm has been shown to reduce the
number of segments transmitted into the network, but also interferes with
the HTTP and NNTP protocols, as well as the delayed acknowledgment strategy,
thus reducing performance.
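The Nagle decision itself is a one-line rule, sketched here as a standalone predicate (the function name and parameters are illustrative, not from any real stack):

```python
def nagle_can_send(buffered_bytes, mss, unacked_data):
    """Sketch of the Nagle rule: transmit immediately if a full MSS is
    buffered or nothing is in flight; otherwise hold the data so small
    writes coalesce into one larger segment when the ACK returns."""
    return buffered_bytes >= mss or not unacked_data

# Small write with data already in flight: hold it back.
print(nagle_can_send(buffered_bytes=10, mss=1460, unacked_data=True))   # False
# Same small write with the pipe idle: send right away.
print(nagle_can_send(buffered_bytes=10, mss=1460, unacked_data=False))  # True
```

The interaction with delayed ACKs follows directly: the small write waits for an ACK that the receiver is itself delaying, so both sides stall until the delayed-ACK timer fires.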
Larger Initial Windows. A larger initial window allows TCP to begin slow start
with several segments rather than the standard one to two segments, which can
speed up short transfers. However, the feature has not been standardized, and
experiments should be conducted to show whether using a larger initial window
is fruitful or not.
Explicit Congestion Notification.
TCP interprets segment loss as an indication of network congestion. Explicit
Congestion Notification (ECN) is a method by which a router can send a TCP
sender an explicit message stating that the network is becoming congested,
rather than dropping a segment.
Simulation based studies
Simulation-based evaluation techniques have been widely employed for a large
number of TCP evaluations. A large variety of simulators for modeling
internetworking protocols exist and are currently used by researchers; some
of these tools are OpNet, x-sim, the Network Simulator (ns), and REAL. The
following are advantages of using simulation-based techniques:
- Simulations generally do not involve high setup costs and do not need
expensive machinery.
- Simulations provide a means of testing TCP performance in rare situations
that are not encountered in day-to-day use.
- Complex topologies can be easily created via simulation.
- Simulators provide access to data about all the traffic transmitted in the
network.
- Simulators give an easy way to test the impact of changes.
- Simulators are not limited by the physical limitations of a real network.
The disadvantages of using simulators are as follows:
- Simulators generally use an abstract TCP implementation, rather than
implementations found in real scenarios (such as real operating systems).
- Simulators generally do not model non-network events.
- Simulators generally make assumptions that are not valid in the real world.
Implementation based Evaluation
While simulations can provide valuable insight into the performance of TCP,
they are often not as illuminating as tests conducted with real TCP implementations
over real networks. The implementation based evaluation techniques can be
divided into the following categories:
Dedicated Testbeds: In a testbed, real TCP implementations are tested over
real networks. Testbeds can incorporate hard-to-simulate network components,
such as a satellite link. On the other hand, testbeds are generally limited
in their capacity and speed by the network at hand.
Emulation: An emulator models a particular piece of the network path between
two real hosts; emulation is therefore a mix between simulation and using a
testbed. While emulation has several distinct advantages, it abstracts some
of the real behaviour of the network being modeled.
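The core of an emulator can be sketched very simply: packets from a real sender are passed through a model of the link that adds delay and loss before delivery (as tools such as Dummynet do). The function and parameters below are illustrative assumptions, not a real emulator's API:

```python
import random

def emulate_link(packets, delay_ms, loss_rate, rng):
    """Toy emulator sketch: each packet crossing the modeled link gets a
    fixed one-way delay (delay_ms) and is independently dropped with
    probability loss_rate; surviving packets are delivered in order."""
    delivered = []
    for send_time_ms, packet in packets:
        if rng.random() < loss_rate:
            continue                      # packet lost on the emulated link
        delivered.append((send_time_ms + delay_ms, packet))
    return delivered

rng = random.Random(1)
pkts = [(i * 10, f"seg{i}") for i in range(100)]
out = emulate_link(pkts, delay_ms=50, loss_rate=0.1, rng=rng)
print(len(out))  # roughly 90 of the 100 packets survive
```

Everything outside this function, including the two endpoints and their TCP stacks, remains real, which is precisely what distinguishes emulation from pure simulation.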
Live Internet Tests: Another alternative is to run the tests directly over
the Internet rather than using a dedicated testbed or a simulator. A
disadvantage of conducting live experiments over the Internet is the
inability to assess the impact the sending TCP has on the other traffic
sharing the network path. While with simulators and testbeds it is fairly
easy to monitor all traffic on the given network, it is difficult to obtain
the same kind of monitoring of all the traffic competing with the TCP
transfer a researcher generates when running over the Internet. In addition,
assessing the impact of a new algorithm, or some other mechanism that is
expected to be placed in the middle of the network, is difficult to
accomplish in tests conducted over the Internet because of the global nature
of the Internet.