Dispatcher-based approach

This approach gives a server-side entity full control over client requests. The DNS returns the address of a dispatcher, which routes every client request to one of the other servers in the cluster. The dispatcher thus acts as a centralized scheduler on the server side and controls the distribution of all client requests. Because the cluster presents a single IP address to the outside world, this approach is much more transparent to clients. These mechanisms can be categorized as follows:

Packet single-rewriting by the dispatcher: In this approach, all packets first reach the dispatcher, because the DNS provides the dispatcher's IP address. Each server in the cluster has its own private address, visible within the cluster. The dispatcher selects a server using a simple algorithm such as round robin and replaces the incoming packet's destination address with the private address of the selected server. It also maintains a list of source IP addresses for active connections, so that all packets received on a given TCP connection are sent to the same server node. The nodes in the cluster, in turn, must replace the source address in their response packets with the IP address of the dispatcher.
Although this solution preserves user transparency, it requires kernel changes on every server, since the packet rewriting occurs at the TCP/IP level. Combined with a DNS-based solution for choosing a dispatcher, i.e. a primary DNS that resolves the host name to the IP address of the dispatcher of one of the clusters, this system can scale from LAN to WAN.
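The single-rewriting steps described above can be sketched as a small simulation. This is only an illustrative model, not kernel code: the `Packet` class, the `SingleRewriteDispatcher` class, the `server_reply` helper, and all addresses are hypothetical names chosen for the example, and the connection table is keyed by client IP address only.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src: str  # source IP address
    dst: str  # destination IP address


class SingleRewriteDispatcher:
    """Hypothetical model of a dispatcher that rewrites only inbound packets."""

    def __init__(self, public_ip, server_ips):
        self.public_ip = public_ip
        self._round_robin = cycle(server_ips)  # simple round-robin selection
        self._connections = {}  # client IP -> private IP of the bound server

    def forward(self, packet):
        # Packets from a client with an active connection keep their server.
        server = self._connections.get(packet.src)
        if server is None:
            server = next(self._round_robin)
            self._connections[packet.src] = server
        # Single rewrite: only the destination address changes.
        return replace(packet, dst=server)


def server_reply(request, public_ip):
    # The server node itself writes the dispatcher's address as the source,
    # so the reply can go to the client without passing through the dispatcher.
    return Packet(src=public_ip, dst=request.src)


dispatcher = SingleRewriteDispatcher("203.0.113.10", ["10.0.0.1", "10.0.0.2"])
inbound = dispatcher.forward(Packet(src="198.51.100.7", dst="203.0.113.10"))
reply = server_reply(inbound, dispatcher.public_ip)
```

In this sketch the step performed by `server_reply` is the part that would require the kernel changes on each server mentioned above.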
Packet double-rewriting by the dispatcher: This approach is similar to the scheme above, except that all address changes are made by the centralized dispatcher rather than by the nodes in the cluster. The dispatcher first changes the destination address of each incoming IP packet to that of the selected server and sends the packet to that node. It must also modify the packets on the way back to the client: in each response IP packet it replaces the source IP address of the selected server with its own address. The server selection algorithm can be round robin, random, etc.
Cisco LocalDirector selects the server with the fewest active connections. Magic router [4] uses an application-level process that intercepts all packets between client and server and modifies their address and checksum fields.
The advantage of this approach is that it does not require modifying all the nodes in the cluster.
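As a rough illustration of double rewriting, the sketch below models a dispatcher that rewrites packets in both directions and picks the server with the fewest active connections. The class and method names (`DoubleRewriteDispatcher`, `inbound`, `outbound`, `close`) and the addresses are assumptions made for this example; checksum recomputation, which a real implementation would need, is omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str  # source IP address
    dst: str  # destination IP address


class DoubleRewriteDispatcher:
    """Hypothetical model: the dispatcher rewrites requests and responses."""

    def __init__(self, public_ip, server_ips):
        self.public_ip = public_ip
        self.active = {ip: 0 for ip in server_ips}  # server -> open connections
        self.bound = {}  # client IP -> server IP for active connections

    def inbound(self, packet):
        server = self.bound.get(packet.src)
        if server is None:
            # Least-connections policy; ties go to the first server listed.
            server = min(self.active, key=self.active.get)
            self.bound[packet.src] = server
            self.active[server] += 1
        # First rewrite: destination becomes the selected server.
        return replace(packet, dst=server)

    def outbound(self, packet):
        # Second rewrite: the server's address is replaced by the dispatcher's.
        return replace(packet, src=self.public_ip)

    def close(self, client_ip):
        # Called when a connection ends, so the counters stay accurate.
        server = self.bound.pop(client_ip, None)
        if server is not None:
            self.active[server] -= 1
```

Because `outbound` runs on the dispatcher, the server nodes in this model send ordinary replies from their own addresses and need no modification.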
Packet forwarding by the dispatcher: This approach is described in [18]. Instead of rewriting IP packets, the dispatcher forwards them to the nodes in the cluster using their MAC addresses.
IBM Network Dispatcher's LAN solution assumes that the server nodes are on the same LAN and share the same IP address, but that the nodes have ARP disabled, so that all packets reach the dispatcher. The dispatcher then forwards each packet to the selected server using that server's MAC address on the LAN, without modifying the IP header. The scheduling policy can be based on server load and availability.
This mechanism is transparent to both client and server. Neither the dispatcher nor the servers need to rewrite packets, since they all share the same IP address.
IBM Network Dispatcher's WAN solution uses dispatchers at two levels. The centralized first-level dispatcher uses the single-rewriting mechanism to forward packets across the WAN to one of the second-level dispatchers (each cluster has its own), i.e. it replaces its own IP address in the packet with that of the selected dispatcher. The second-level dispatcher at each cluster changes its IP address in the packet back to that of the first-level dispatcher and forwards the packet to the selected server on its LAN using MAC addresses. The selected node responds with the IP address of the primary dispatcher, as in the previous approach.
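The LAN variant of packet forwarding can be modelled in a few lines. The sketch below is not IBM Network Dispatcher's actual algorithm; the `Frame` class, the `forward_frame` function, and the shape of the `servers` table (MAC address mapped to a load figure and an availability flag) are assumptions made for illustration. Per-connection affinity is left out to keep the example short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str  # link-layer destination
    src_ip: str   # IP header: source address
    dst_ip: str   # IP header: destination (the shared cluster address)


def forward_frame(frame, servers):
    """Re-address a frame to the least-loaded available server.

    `servers` maps a MAC address to a (load, available) pair.
    """
    candidates = {mac: load for mac, (load, up) in servers.items() if up}
    if not candidates:
        raise RuntimeError("no server available")
    target = min(candidates, key=candidates.get)
    # Only the MAC address changes; the IP header is left untouched.
    return replace(frame, dst_mac=target)


servers = {
    "02:00:00:00:00:01": (0.9, True),
    "02:00:00:00:00:02": (0.2, True),
    "02:00:00:00:00:03": (0.1, False),  # lightly loaded but unavailable
}
incoming = Frame(dst_mac="02:00:00:00:00:ff",
                 src_ip="198.51.100.7", dst_ip="203.0.113.10")
forwarded = forward_frame(incoming, servers)
```

Since `dst_ip` is the address every node shares, the selected server accepts the packet as its own and can answer the client directly.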
ONE-IP address: This approach is described in [16]. Multiple machines in the web server system share the same secondary IP address, and this secondary address is the one publicized by the DNS. There are two variants:
1. Routing-based dispatching: The subnetwork router directs all packets addressed to the ONE-IP address to the IP address dispatcher. The dispatcher selects a server by applying a hash function to the client IP address and then reroutes the packets to the selected server using its primary IP address. Since the hash function is applied to the client IP address, all packets from the same client reach the same server.
2. Broadcast-based dispatching: The subnetwork router broadcasts packets whose destination is the ONE-IP address to all servers in the web server cluster. The servers themselves compute the hash function on the client IP address to decide whether they are the actual destination. This causes more server overhead.
Using a simple hash function guarantees that the same server is selected for a given client IP address, but it is also the weakest point of the scheme for load balancing, since it rules out dynamic server selection. Fault tolerance can be achieved by changing the hash function. Even so, a hash function on the client IP address remains a static assignment of a server to each client.
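Both ONE-IP variants rest on the same deterministic mapping from client address to server, which the following sketch makes concrete. The hash used here (the integer value of the client address modulo the number of servers) is an assumption chosen for simplicity, not the function from [16], and the names `select_server`, `route`, and `accepts` are hypothetical.

```python
import ipaddress


def select_server(client_ip, n_servers):
    # Assumed hash: integer value of the client address modulo server count.
    return int(ipaddress.ip_address(client_ip)) % n_servers


def route(client_ip, primary_ips):
    # Routing-based variant: the dispatcher computes the hash and reroutes
    # the packet to the selected server's primary IP address.
    return primary_ips[select_server(client_ip, len(primary_ips))]


def accepts(client_ip, my_index, n_servers):
    # Broadcast-based variant: every server receives the packet, computes
    # the same hash, and keeps the packet only if the result is its index.
    return select_server(client_ip, n_servers) == my_index


primaries = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
chosen = route("192.0.2.7", primaries)
keepers = [i for i in range(len(primaries)) if accepts("192.0.2.7", i, len(primaries))]
```

In this model, one way to change the hash function after a node failure is to call `route` with a list that omits the failed server. The mapping stays static in the sense discussed above: it never takes server load into account.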
HTTP redirection by the dispatcher
In this approach, a centralized dispatcher redirects HTTP requests among the web server nodes by specifying an appropriate status code in its response and indicating the address of the selected web server node in the response header. Dispatching can be based on server load or on location.
This approach is transparent to the user, since most browsers support redirection, although the user may perceive a slightly longer delay. No packet rewriting is required, but server state information such as load and number of connections must be communicated to the dispatcher.
The Distributed Director [11], in its second mode, uses an estimate of client-server proximity together with node availability to select the server and redirects the client to it. Its main disadvantage is the duplication of TCP connections and hence increased response delay.
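A redirecting dispatcher's reply can be sketched as a function that builds the raw HTTP response. The example assumes status code 302 with a `Location` header, which is one common way to express a redirect, and a load-based choice of node; the function name `redirect_response`, the host names, and the load figures are hypothetical.

```python
def redirect_response(path, server_loads):
    """Build a redirect to the least-loaded node.

    `server_loads` maps a server host name to its most recently reported
    load, which the servers must communicate to the dispatcher.
    """
    host = min(server_loads, key=server_loads.get)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{host}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


loads = {"www1.example.com": 0.7, "www2.example.com": 0.3}
response = redirect_response("/index.html", loads)
```

The client must open a second TCP connection to the host named in `Location`, which is the source of the extra delay noted above.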

Puneet Agarwal 2001-05-12