Web Server Redirection
Web Server Redirection generally means that the server receiving an HTTP request does not
actually serve that request, but instead redirects the client to another
server. Redirection is primarily needed for load balancing. For popular
websites, it is not possible for a single machine to serve all the
requests. So several servers are in operation simultaneously, sharing
the load. Redirection is also needed if the website has been moved.
Ideally, redirection should be transparent to the end user. In
practice, both transparent and non-transparent mechanisms are in use.
For example, websites like sourceforge.net present the user with a list
of mirror sites, and the user has to choose one of them. However,
statistics show that most users pick the first mirror in the list!
Redirection has two components:
- Where to redirect: There are many servers which can serve the
request. This component involves deciding which server to use for a
request.
- How to redirect: Once a server is chosen, this component involves
the mechanism used to do the redirection. Redirection can be done at
various levels:
- Application level, using HTTP redirect
- IP level
- MAC level
- using DNS
Client side approaches
In this approach, it is the responsibility of the client (typically a
browser) to choose a server out of the available mirrors.
Earlier, Netscape had several servers named www1, www2, www3, ...,
and it was the responsibility of the Netscape browser to randomly
choose one of these when the user accessed www.netscape.com. This
approach is not useful in general because it is not scalable.
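The random choice made by the client can be sketched as below. This is
only an illustration: the host names under example.com and the function
name choose_mirror are assumptions, not what the Netscape browser
actually used.

```python
import random

# Hypothetical mirror names following the www1, www2, ... pattern.
MIRRORS = [f"www{i}.example.com" for i in range(1, 6)]

def choose_mirror(mirrors, rng=random):
    """Pick one mirror uniformly at random, as the client would."""
    return rng.choice(mirrors)

# Each call may return a different mirror from the fixed list.
print(choose_mirror(MIRRORS))
```

Note that the list of mirrors is built into the client, which is one
reason the approach does not scale.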
Smart Client Approach: In this approach, the client downloads an applet
from the server. The applet decides which mirror to use for requests.
Both the above methods can be implemented on a proxy, in which case
they will be effective for all hosts using that proxy.
Disadvantages of client-side approach: It requires specific server-side
information, such as which server replicas (mirrors) exist, which
parameter to use for the decision, and the value of that parameter for
each replica. Using a dynamic parameter like server load is not
practical, because tracking it from every client would incur a huge
overhead.
DNS-based approaches
A single name is used for the website. The client asks a DNS server
for the IP address of that name. At that time, the DNS server can
choose an appropriate mirror and return its IP address. All mirrors
have different IP addresses.
Disadvantages: The major problem with this approach is caching. DNS
responses are cached by intermediate DNS servers and also by the
client. For as long as a response is cached, all requests that use it
go to the same mirror, and that mirror may get overloaded. So this
approach allows only coarse-grained redirection. The authoritative name
server has only limited control over caching. Setting a low TTL (time
to live) in the DNS response does not always help, because an
intermediate name server may refuse to accept a response having a TTL
below a certain threshold.
Another obvious disadvantage is that it increases the load on the DNS
server.
There are two types of algorithms for deciding which mirror to
choose: constant-TTL and adaptive-TTL.
Constant-TTL DNS redirection
- Round Robin: This is the simplest approach. The disadvantage is that
it does not take into account the heterogeneity of servers. The
heterogeneity may be because of load, network connectivity, or hardware
capability. So all the mirrors should not be treated equally.
- Server-state based algorithm: The DNS server collects state
information (e.g. load) from all the mirrors and chooses a mirror on
the basis of this information. For example, it may choose the least
loaded mirror.
- Client-state based algorithm: Here, the mirror is chosen depending on
the client. For this, knowledge of the request rate for a region of
clients is required. Different regions can be redirected to different
mirrors. If one mirror is not sufficient for a region of clients, a set
of mirrors can be assigned to it, and round robin is used to select a
mirror for a request from a client in that region. The association of a
client region with a mirror (or a set of mirrors) can be based on the
distance between the client network and the available mirrors.
- A combination of client-state and server-state approaches can
also be used.
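The three selection policies above can be sketched as follows. This is
a minimal illustration and not a DNS server: the class and function
names, the load values and the region labels are all assumptions made
for the example.

```python
from itertools import cycle

class RoundRobin:
    """Hand out mirrors in a fixed rotation, ignoring their state."""
    def __init__(self, mirrors):
        self._rotation = cycle(mirrors)

    def pick(self):
        return next(self._rotation)

def least_loaded(loads):
    """Server-state policy: loads maps mirror address -> reported load."""
    return min(loads, key=loads.get)

class RegionPolicy:
    """Client-state policy: each client region has its own set of
    mirrors, and round robin is used within that set."""
    def __init__(self, region_mirrors):
        self._per_region = {
            region: RoundRobin(mirrors)
            for region, mirrors in region_mirrors.items()
        }

    def pick(self, region):
        return self._per_region[region].pick()
```

A combined policy could, for instance, apply least_loaded only to the
mirrors assigned to the client's region.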
Adaptive-TTL DNS redirection
In this approach, the TTL in the DNS response is set depending on the
state of the mirror and the information about the client. For example,
the TTL can be set high for a lightly loaded high-end machine.
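One way to picture this is the sketch below. The formula is an
assumption chosen for illustration, not a published algorithm: the TTL
grows with the capacity and spare headroom of the mirror and shrinks
with the request rate of the client's region. The parameter names and
default bounds are hypothetical.

```python
def adaptive_ttl(capacity, load, request_rate,
                 base_ttl=60, min_ttl=10, max_ttl=600):
    """Return a TTL in seconds for one DNS response.

    capacity:     relative capacity of the mirror (1.0 = baseline)
    load:         current utilisation of the mirror, from 0.0 to 1.0
    request_rate: relative request rate of the client's region
                  (1.0 = average)
    """
    headroom = max(0.0, 1.0 - load)
    # Assumed rule: strong, idle mirrors and quiet regions get long TTLs.
    ttl = base_ttl * capacity * headroom / max(request_rate, 1e-9)
    # Clamp so that the TTL stays within sane bounds.
    return int(min(max_ttl, max(min_ttl, ttl)))
```

With these assumptions, a lightly loaded high-end mirror gets a longer
TTL than a heavily loaded one, so cached responses keep clients on it
for longer.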
Dispatcher-based approaches
In this approach, there is a central server called the dispatcher,
which receives all the requests and then redirects them to appropriate
server replicas. The dispatcher has a single IP address and identifies
individual servers through some other address. In this approach also,
there can be redirection at different levels. The possibilities depend
on whether all replicas are on a LAN or are spread over a WAN.
- Packet single rewriting: Works when all replicas are on the same LAN.
When the dispatcher receives a TCP connection establishment packet, it
chooses one of the replicas. Replicas are identified by their IP
addresses. The dispatcher changes the destination IP address in the IP
header to the IP address of the selected replica and sends the packet
on its LAN. The selected replica gets the packet, processes the request
and sends the reply directly to the client, with the source IP address
set to the IP address of the dispatcher. The dispatcher remembers the
replica used for the connection and redirects all future packets of
that connection to the same replica. This mapping is hard state,
because it is required for correctness.
- Packet double rewriting: This is similar to packet single rewriting,
except that the reply of the selected replica has the replica's own IP
address as the source address. It is the job of the dispatcher to
change the source IP address to its own IP address before sending the
reply to the client.
- Packet forwarding: In this approach, no change is made to the IP
packet. This is good because otherwise the IP header checksum (in case
of IPv4) and the TCP checksum have to be recomputed. Since the IP
addresses are left untouched, the replicas are identified by their MAC
addresses. All the replicas and the dispatcher have the same IP
address. The dispatcher chooses a replica for a connection and sends
the incoming IP packets to that replica, addressing them to the MAC
address of that replica. This scheme also requires that all mirrors are
on the same LAN.
- HTTP redirection: In this case, the dispatcher sends back an HTTP
Redirect reply with the URL of the selected mirror. The mirrors need
not be on the same LAN and can be addressed by their host names. The
disadvantage of this approach is that each redirected request costs two
TCP connections and two HTTP request/response exchanges.
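The per-connection state kept by the dispatcher in packet single
rewriting can be sketched as below. Packets are modelled as simple
records rather than real network packets, replicas are chosen in round
robin order, and the names Packet and Dispatcher are assumptions made
for the example.

```python
from dataclasses import dataclass, replace
from itertools import cycle

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    syn: bool = False  # True for a TCP connection establishment packet

class Dispatcher:
    def __init__(self, replica_ips):
        self._next_replica = cycle(replica_ips)
        # Hard state: (client IP, client port) -> chosen replica IP.
        self._connections = {}

    def route(self, pkt):
        """Return the packet with its destination rewritten, or None if
        it belongs to no known connection."""
        key = (pkt.src_ip, pkt.src_port)
        if pkt.syn and key not in self._connections:
            self._connections[key] = next(self._next_replica)
        replica = self._connections.get(key)
        if replica is None:
            return None
        # A real dispatcher would also have to update the IP header
        # checksum and the TCP checksum after this rewrite.
        return replace(pkt, dst_ip=replica)
```

Packet forwarding would keep the same table but store a MAC address
per connection and leave the packet itself unchanged.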
Server-based approaches
In this approach, one of the mirrors, on getting an HTTP request, may
decide to redirect the client to another mirror. Again, the redirection
may happen at different levels: IP level with packet rewriting or at
HTTP level. But it is generally done at the HTTP level. This approach
is used along with another approach like DNS-based redirection:
DNS-based redirection provides the coarse-grained redirection, and this
approach provides fine-grained redirection if required.
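A mirror that redirects at the HTTP level could look roughly like the
sketch below, which uses Python's standard http.server module. The
load threshold, the peer URLs, the fixed load values and the helper
names are all assumptions for illustration. A real mirror would
measure its own load and learn about its peers in some other way.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical values, for illustration only.
LOAD_THRESHOLD = 0.8
PEER_LOADS = {
    "http://mirror2.example.com": 0.3,
    "http://mirror3.example.com": 0.6,
}

def current_load():
    """Placeholder: a real server would measure its own load here."""
    return 0.9

def pick_redirect(own_load, peer_loads, threshold=LOAD_THRESHOLD):
    """Return the base URL of the least loaded peer if this mirror is
    overloaded and the peer is less loaded, otherwise None."""
    if own_load <= threshold or not peer_loads:
        return None
    peer = min(peer_loads, key=peer_loads.get)
    return peer if peer_loads[peer] < own_load else None

class MirrorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = pick_redirect(current_load(), PEER_LOADS)
        if target is not None:
            # Send the client to the same path on the other mirror.
            self.send_response(302)
            self.send_header("Location", target + self.path)
            self.end_headers()
            return
        body = b"served locally\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it out, one could run:
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), MirrorHandler).serve_forever()
```

The decision is made per request, which is what makes this finer
grained than a DNS response that stays cached for its whole TTL.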
Comparison of approaches:
Source: V. Cardellini, M. Colajanni, and P. S. Yu, "Dynamic Load
Balancing on Web-Server Systems"
The choice of a mirror can be based on various criteria. Two common
ones are network bandwidth and server load. Network bandwidth is
generally not the bottleneck these days, so server load is the more
important consideration.