Web Server Redirection

Web server redirection generally means that the server receiving an HTTP request does not serve that request itself, but instead redirects the client to another server. Redirection is needed primarily for load balancing: for a popular website, a single machine cannot serve all the requests, so several servers operate simultaneously and share the load. Redirection is also needed when a website has moved.

Ideally, redirection should be transparent to the end user, but both transparent and non-transparent mechanisms are in use. For example, websites like sourceforge.net present the user with a list of mirror sites, and the user has to choose one of them. However, statistics show that most users pick the first mirror in the list!

Redirection has two components:

Where to redirect: There are many servers which can serve the request. This component involves deciding which server to use for a request.
How to redirect: Once a server is chosen, this component covers how the redirection is carried out. Redirection can be done at various levels:
1. Application level, using HTTP redirect
2. IP level
3. MAC level
4. using DNS

Client-side approaches

In this approach, it is the responsibility of the client (typically a browser) to choose a server from the available mirrors.

Earlier, Netscape had several servers named www1, www2, www3, ... and it was the responsibility of the Netscape browser to randomly choose one of these when the user accessed www.netscape.com. This approach does not scale, because the set of servers has to be known to every client.
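A minimal sketch of this kind of client-side random selection. The domain, the host-name pattern and the server count are placeholders chosen for illustration, not the actual Netscape configuration:

```python
import random

def pick_server(base="www", domain="example.com", count=3, rng=random):
    """Pick one of www1..wwwN uniformly at random.

    The client has to know `count` in advance, which is the weak point
    of this scheme: adding a server means updating every client.
    """
    index = rng.randint(1, count)  # randint is inclusive on both ends
    return f"{base}{index}.{domain}"
```

Calling pick_server() repeatedly spreads requests roughly evenly over the numbered hosts, but without any knowledge of how loaded each one is.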

Smart client approach: In this approach, the client downloads an applet from the server. The applet decides which mirror to use for requests.

Both the above methods can be implemented on a proxy, in which case they will be effective for all hosts using that proxy.

Disadvantages of the client-side approach: It requires specific server-side information on the client, such as which server replicas (mirrors) exist, which parameter to use for the decision, and the value of that parameter for each replica. Using a dynamic parameter like server load is not practical, because tracking it would incur a huge overhead.

DNS-based approaches

A single name is used for the website. The client asks the DNS server for the IP address of that name, and at that point the DNS server can choose an appropriate mirror and return its IP address. All mirrors have different IP addresses.

Disadvantages: The major problem with this approach is caching. DNS responses are cached by intermediate DNS servers and also by the client. For as long as a response is cached, all requests that use it go to the same mirror, and that mirror may become overloaded. This approach therefore allows only coarse-grained redirection. The authoritative name server has only limited control over caching: setting a low TTL (time to live) in the DNS response does not help, because an intermediate name server may refuse to accept a response whose TTL is below a certain threshold.

Another obvious disadvantage is that it increases the load on the DNS server.
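The caching problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names, the one-lookup-per-second workload and the mirror addresses are invented for the example, and the intermediate resolver is assumed to raise any TTL below its own minimum.

```python
class RoundRobinAuthority:
    """Authoritative name server that hands out mirrors in rotation."""

    def __init__(self, mirrors, ttl):
        self.mirrors = list(mirrors)
        self.ttl = ttl
        self.queries = 0  # number of lookups that reached this server

    def resolve(self):
        ip = self.mirrors[self.queries % len(self.mirrors)]
        self.queries += 1
        return ip, self.ttl


class CachingResolver:
    """Intermediate resolver that caches answers and enforces a minimum TTL."""

    def __init__(self, authority, min_ttl=0):
        self.authority = authority
        self.min_ttl = min_ttl
        self.cached_ip = None
        self.expires_at = float("-inf")  # nothing cached yet

    def resolve(self, now):
        if now >= self.expires_at:
            ip, ttl = self.authority.resolve()
            self.cached_ip = ip
            # A TTL below the resolver's threshold is silently raised.
            self.expires_at = now + max(ttl, self.min_ttl)
        return self.cached_ip


def simulate(ttl, min_ttl, seconds=60):
    """One client lookup per second; returns (authority queries, answers)."""
    authority = RoundRobinAuthority(["10.0.0.1", "10.0.0.2", "10.0.0.3"], ttl)
    resolver = CachingResolver(authority, min_ttl)
    answers = [resolver.resolve(now) for now in range(seconds)]
    return authority.queries, answers
```

With simulate(ttl=1, min_ttl=0) every lookup reaches the authoritative server and the mirrors rotate each second. With simulate(ttl=1, min_ttl=30) only two lookups in a minute get through, so each mirror receives a 30-second burst of all the traffic behind that resolver.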

There are two types of algorithms for deciding which mirror to choose: constant-TTL and adaptive-TTL.

Constant-TTL DNS redirection

Round robin: This is the simplest approach. Its disadvantage is that it does not take into account the heterogeneity of the servers, which may differ in load, network connectivity, or hardware capability, so the mirrors should not all be treated equally.
Server-state based algorithm: The DNS server collects state information (e.g. load) from all the mirrors and chooses a mirror on the basis of this information, e.g. the least loaded mirror.
Client-state based algorithm: Here the mirror is chosen depending on the client. This requires knowledge of the request rate of each region of clients. Different regions can be redirected to different mirrors. If one mirror is not sufficient for a region, a set of mirrors can be assigned to it, with round robin used to select a mirror for each request from a client in that region. The association of a client region with a mirror (or a set of mirrors) can be based on the distance between the client network and the available mirrors.
A combination of client-state and server-state approaches can also be used.
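The three selection policies above can be sketched as interchangeable classes. This is an illustrative sketch, not an actual DNS server: the class names, the load dictionary and the use of address prefixes to stand in for client regions are all assumptions made for the example.

```python
import ipaddress
import itertools


class RoundRobin:
    """Hand out mirrors in a fixed rotation, ignoring their state."""

    def __init__(self, mirrors):
        self._cycle = itertools.cycle(list(mirrors))

    def choose(self, client_ip=None):
        return next(self._cycle)


class LeastLoaded:
    """Server-state policy: pick the mirror with the lowest reported load."""

    def __init__(self, load_by_mirror):
        # Mapping mirror -> load; assumed to be refreshed by mirror reports.
        self.load_by_mirror = load_by_mirror

    def choose(self, client_ip=None):
        return min(self.load_by_mirror, key=self.load_by_mirror.get)


class RegionBased:
    """Client-state policy: map the client's network to a set of mirrors,
    then use round robin inside that set."""

    def __init__(self, regions, default):
        # regions: list of (CIDR prefix, mirrors assigned to that region)
        self._regions = [
            (ipaddress.ip_network(cidr), RoundRobin(mirrors))
            for cidr, mirrors in regions
        ]
        self._default = RoundRobin(default)

    def choose(self, client_ip):
        addr = ipaddress.ip_address(client_ip)
        for network, policy in self._regions:
            if addr in network:
                return policy.choose()
        return self._default.choose()
```

For instance, RegionBased([("192.0.2.0/24", ["eu1", "eu2"])], default=["us1"]) alternates between eu1 and eu2 for clients in 192.0.2.0/24 and sends everyone else to us1. A combined policy could replace the inner RoundRobin with LeastLoaded.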

Adaptive-TTL DNS redirection

In this approach, the TTL in the DNS response is set depending on the state of the mirror and on information about the client. For example, the TTL can be set high for a high-end machine that is currently lightly loaded.
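One way such a TTL could be computed is sketched below. The formula is an assumption chosen for illustration (TTL grows with server capacity and spare headroom, and shrinks for regions that send many requests); the parameter names and the clamping bounds are likewise hypothetical.

```python
def adaptive_ttl(base_ttl, capacity, load, client_rate,
                 min_ttl=10, max_ttl=3600):
    """Return a TTL in seconds for one DNS response.

    capacity    -- relative server capacity (1.0 = reference machine)
    load        -- current server load in [0, 1]
    client_rate -- relative request rate of the client's region
                   (1.0 = average region)
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    if capacity <= 0 or client_rate <= 0:
        raise ValueError("capacity and client_rate must be positive")
    headroom = 1.0 - load
    ttl = base_ttl * capacity * headroom / client_rate
    # Clamp so the mapping is neither cached forever nor re-queried constantly.
    return int(min(max_ttl, max(min_ttl, ttl)))
```

Under this formula a lightly loaded high-end mirror gets a longer TTL than a busy low-end one, and a region with a high request rate is bound to any single mirror for a shorter time.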

Dispatcher-based approaches

In this approach, there is a central server called the dispatcher, which receives all the requests and then redirects them to appropriate server replicas. The dispatcher has a single IP address and identifies individual servers through some other address. Here too, redirection can happen at different levels. The possibilities depend on whether all replicas are on one LAN or are spread across a WAN.

Packet single rewriting: Works when all replicas are on the same LAN. When the dispatcher receives a TCP connection establishment packet, it chooses one of the replicas. Replicas are identified by their IP addresses. The dispatcher changes the destination IP address in the IP header to the IP address of the selected replica and sends the packet on its LAN. The selected replica gets the packet, processes the request and sends the reply with the source IP address set to the IP address of the dispatcher. The dispatcher remembers the replica used for the connection and redirects all future packets of that connection to the same replica. This per-connection mapping is hard state, because it is required for correctness.
Packet double rewriting: This is similar to packet single rewriting, except that the reply of the selected replica carries the replica's own IP address as the source address. It is the job of the dispatcher to change the source IP address to its own IP address before sending the reply to the client.
Packet forwarding: In this approach, no change is made to the IP packet. This is an advantage, because rewriting addresses requires recomputing the IP header checksum (in the case of IPv4) and the TCP checksum. Since the IP addresses are left untouched, the replicas are identified by their MAC addresses: all the replicas and the dispatcher have the same IP address. The dispatcher chooses a replica for a connection and sends the incoming IP packets to that replica by addressing them to the replica's MAC address. This scheme also requires that all mirrors are on the same LAN.
HTTP redirection: In this case, the dispatcher sends back an HTTP redirect response with the URL of the selected mirror. The mirrors need not be on the same LAN and can be addressed by their host names. The disadvantage of this approach is that each redirection costs two TCP connections and two HTTP request/response exchanges.
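The connection table that packet rewriting relies on can be sketched with plain data structures. This is a simulation only, with no real networking: the Packet fields, the round-robin replica choice and the class names are assumptions for the example, and checksum recomputation is left out.

```python
import itertools
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    syn: bool = False  # True for a TCP connection establishment packet


class Dispatcher:
    def __init__(self, virtual_ip, replica_ips):
        self.virtual_ip = virtual_ip
        self._next_replica = itertools.cycle(list(replica_ips))
        # Hard state: (client IP, client port) -> replica IP
        self.connections = {}

    def forward(self, packet):
        """Rewrite the destination of a client packet (single rewriting)."""
        key = (packet.src_ip, packet.src_port)
        if packet.syn and key not in self.connections:
            self.connections[key] = next(self._next_replica)
        replica_ip = self.connections.get(key)
        if replica_ip is None:
            return None  # not a known connection: drop the packet
        return replace(packet, dst_ip=replica_ip)

    def rewrite_reply(self, packet):
        """Rewrite the source of a replica's reply (needed only for
        double rewriting, where replies pass through the dispatcher)."""
        return replace(packet, src_ip=self.virtual_ip)
```

If the connections table were lost, later packets of an established connection could no longer be mapped to the replica that holds the connection, which is why the text calls this hard state.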

Server-based approaches

In this approach, one of the mirrors, on getting an HTTP request, may decide to redirect the client to another mirror. Again, the redirection may happen at different levels: at the IP level with packet rewriting, or at the HTTP level. It is generally done at the HTTP level. This approach is used along with another approach such as DNS-based redirection: DNS-based redirection provides coarse-grained redirection, and this approach provides fine-grained redirection when required.
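A minimal sketch of a mirror that redirects at the HTTP level when it considers itself overloaded, using Python's standard http.server module. The alternate mirror URL, the load threshold and the current_load() placeholder are assumptions for illustration; a real mirror would measure its load and choose the target mirror dynamically.

```python
from http.server import BaseHTTPRequestHandler

ALTERNATE_MIRROR = "http://mirror2.example.com"  # hypothetical peer mirror
LOAD_THRESHOLD = 0.8                             # hypothetical cut-off


def current_load():
    """Placeholder load metric in [0, 1]; always reports an idle server."""
    return 0.0


def redirect_location(path, load, threshold=LOAD_THRESHOLD,
                      alternate=ALTERNATE_MIRROR):
    """Return the URL to redirect to, or None to serve the request locally."""
    if load > threshold:
        return alternate + path
    return None


class MirrorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = redirect_location(self.path, current_load())
        if location is not None:
            # 302 tells the client to repeat the request at another mirror.
            self.send_response(302)
            self.send_header("Location", location)
            self.end_headers()
            return
        body = b"served locally\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler can be run with http.server.HTTPServer(("127.0.0.1", 8080), MirrorHandler).serve_forever(). Each redirected client then opens a second connection to the other mirror, which is the extra cost of HTTP-level redirection.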

Comparison of approaches:

Source: V. Cardellini, M. Colajanni, and P. S. Yu, "Dynamic Load Balancing on Web-Server Systems"

The choice of a mirror can be based on various criteria. Two common ones are network bandwidth and server load. Network bandwidth is generally not the bottleneck these days, so server load is the more important consideration.