Directory-based failure recovery and load balancing system

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S002000, C714S047300

Reexamination Certificate

active

06298451

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention provides a distributed method for managing the assignment of tasks to servers in order to efficiently control server loads and provide for failure recovery in the case of server failure.
2. The Background Art
In electronic systems employing servers to perform mission critical tasks, it is often very important that those tasks are performed in the least amount of time. As the number of those tasks increases, it often becomes necessary to provide multiple servers to handle those tasks. In addition, as the number of requests for the various tasks increases, it often becomes necessary to provide additional servers to handle the higher volume of tasks. It is also necessary that the performance of critical tasks be completely assured, even in the case of server failure.
Since a given task may be required to be performed more often than one or more other tasks, the number of servers that are capable of performing that first task may be greater than the number of servers capable of performing those other tasks.
Each server is typically capable of handling many different types of tasks. In order to ensure that the maximum efficiency of a server system is attained, a task manager or gateway is typically provided between a client computer and the one or more servers performing tasks for the client computer.
FIGS. 1A and 1B show an example of operating a typical prior art server system.
Referring to FIG. 1A, prior art system 10 includes client computer 12, gateway computer 14, and servers 16, 18, 20, and 22. In this prior art example, servers 16, 18, 20, and 22 are each configured to perform tasks within overlapping groups. For example, server 16 may be capable of performing tasks A and B. Server 18, in addition to being configured to perform tasks A and B, is configured to perform tasks C, D, and E. Server 20 is configured to perform tasks C, D, and E, in addition to tasks F, G, and H. Finally, server 22 is configured to perform tasks A, B, G, and H.
In this example, tasks A and B are able to be performed on three of the four servers, indicating that those tasks are either performed with high regularity, or are mission critical and thus must be able to be performed by many different servers in case one or more of those servers have a failure.
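The overlap between task groups can be made concrete with a short sketch. It is only an illustration: the table copies the task assignments given above for FIG. 1A, while the names `CAPABILITIES` and `servers_for` are hypothetical and do not appear in the patent.

```python
# Task groups per server, as listed in the description of FIG. 1A.
# The dictionary layout and function name are illustrative assumptions.
CAPABILITIES = {
    16: {"A", "B"},
    18: {"A", "B", "C", "D", "E"},
    20: {"C", "D", "E", "F", "G", "H"},
    22: {"A", "B", "G", "H"},
}


def servers_for(task):
    """Return the sorted server numbers configured to perform a task."""
    return sorted(s for s, tasks in CAPABILITIES.items() if task in tasks)


# Print the coverage of every task so the overlap is easy to inspect.
for task in "ABCDEFGH":
    print(task, servers_for(task))
```

Under these assignments, tasks A and B are each covered by servers 16, 18, and 22, which is the "three of the four servers" noted above, while task F is covered only by server 20.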
When a task is required to be performed in a prior art server system, a client computer such as client computer 12 issues a service request to a gateway such as gateway 14. Gateway 14 then chooses a server from a list of available servers contained therein, and assigns the requested service to that server, such as server 20 as seen in FIG. 1B, and server 20 then performs the desired service. Once the service is performed, any data resulting from the performance of that service is passed back to the requesting client computer.
It is important that gateway 14 maintain an accurate list of active servers, so that a task isn't assigned to a failed server. In order for gateway 14 to have an accurate list at any given time of servers which are active and thus failure free, gateway 14 performs a verification through a simple communications means such as a ping. As those of ordinary skill in the art are readily aware, a ping is a simple data packet transmitted from a first network object to a second network object which stimulates a simple response from the second network object which tells the first network object that the second object is active. If gateway 14 continues to receive ping responses from any or all of servers 16, 18, 20, and 22, gateway 14 will keep each of those servers on its list of active servers. Failure to receive a predetermined number of consecutive ping responses from a given server, server 18 for example, will result in gateway 14 removing that server from the list of active servers. Future service requests, such as for the performance of task B, that would otherwise have been directed to the failed server 18 would then instead be directed to a back-up server configured for that task, such as server 22.
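The ping-based bookkeeping described above can be sketched as follows. This is a minimal illustration, not the gateway's actual implementation: the class name `ActiveServerList`, the `max_missed` parameter, and its default of 3 are assumptions standing in for the "predetermined number" of consecutive missed responses. Reinstating a removed server is not described in the text, so the sketch does not model it.

```python
class ActiveServerList:
    """Tracks which servers a gateway still considers active (illustrative)."""

    def __init__(self, servers, max_missed=3):
        # max_missed is an assumed stand-in for the "predetermined number".
        self.max_missed = max_missed
        self.missed = {s: 0 for s in servers}
        self.active = set(servers)

    def record_ping(self, server, responded):
        """Record the outcome of one ping sent to a server."""
        if responded:
            # Any response breaks the run of consecutive misses.
            self.missed[server] = 0
            return
        self.missed[server] += 1
        if self.missed[server] >= self.max_missed:
            # Too many consecutive misses: stop assigning tasks to it.
            self.active.discard(server)


gateway_list = ActiveServerList([16, 18, 20, 22])
for _ in range(3):
    gateway_list.record_ping(18, responded=False)
print(sorted(gateway_list.active))
```

Because the counter resets on every response, only an unbroken run of missed responses removes a server; isolated packet loss does not.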
In single processor environments, gateway 14 is configured to include information about actual physical server connections, a mechanism to survey the connections and status of the servers, and an assignment mechanism to assign service requests to particular server connections with a tightly coupled control.
In parallel computing environments, the active server list kept by gateway 14 typically includes detailed information about the location of servers which can perform the various tasks. That detailed information often includes the processor number, slot number, machine number, physical port number, etc. The communications methods employed in these multiprocessor situations are often integrated into the operating system kernel in order to achieve maximum processing efficiency.
In addition to keeping a list of active servers, gateway 14 is also responsible for load balancing. Load balancing is used to spread out data traffic or a computing load across various capable machines. Thus, in the example above with respect to tasks A and B, servers 16, 18, and 22 are all capable of performing those tasks. If a request for task A arrives at gateway 14 from a client computer such as client computer 12, gateway 14 may choose from among servers 16, 18, and 22 for the performance of task A. Thus, if server 16 is busy and perhaps has several other service requests pending which have not yet been performed, but server 18 is either idle or has fewer requests pending, gateway 14 would assign this task A to be performed by server 18 instead of by server 16, in order to balance the computing load. Other lower-level load balancing techniques are known to those of ordinary skill in the art.
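The selection rule in this example (prefer the capable server with the fewest pending requests) can be sketched as below. The capability table follows the task assignments given for FIG. 1A; the function name `assign`, the `pending` dictionary, and the specific queue depths are hypothetical and chosen only to mirror the scenario in the text.

```python
# Task groups per server, as listed in the description of FIG. 1A.
CAPABILITIES = {
    16: {"A", "B"},
    18: {"A", "B", "C", "D", "E"},
    20: {"C", "D", "E", "F", "G", "H"},
    22: {"A", "B", "G", "H"},
}


def assign(task, pending):
    """Pick the capable server with the fewest pending requests.

    `pending` maps server number to its count of unfinished requests
    (an assumed bookkeeping structure, not taken from the patent).
    """
    candidates = [s for s, tasks in CAPABILITIES.items() if task in tasks]
    if not candidates:
        raise LookupError(f"no server is configured for task {task!r}")
    chosen = min(candidates, key=lambda s: pending.get(s, 0))
    # The gateway records the new assignment against the chosen server.
    pending[chosen] = pending.get(chosen, 0) + 1
    return chosen


# Server 16 is busy, server 18 is idle: task A goes to server 18.
pending = {16: 3, 18: 0, 20: 0, 22: 1}
print(assign("A", pending))
```

Least-pending selection is only one possible policy; the text leaves other, lower-level techniques open.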
While the prior art systems are useful for their intended purposes, in order for those systems to work, gateway 14 must be tightly coupled to each server, and know the status of each server on a moment-by-moment basis. It would be beneficial to provide a system for performing task assignments, fail-over, and load balancing using a system which can be more loosely coupled but also operates very efficiently. The present invention provides such a system.
SUMMARY OF THE INVENTION
A method is described wherein tasks are managed in a hierarchical fashion using multiple directory servers and multiple resource management servers, each of the directory servers and management servers having either distinct or overlapping responsibilities. The method includes determining that a client computer requires that a first task be performed by a server computer configured to handle that first task, causing the client computer to query the directory server to determine which servers within the plurality of servers are configured to handle the first task, causing the directory server to determine at least one server within the plurality of servers which is configured to handle the first task, and to transmit specific information about the at least one server to the client computer. The method proceeds with causing the client computer to transmit a task request to a preferred server chosen from the at least one server.
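The query-then-request flow summarized above can be sketched roughly as follows. Everything named here is an assumption made for illustration: `ServerInfo`, `DirectoryServer`, `Client`, the `address` field, and the `send` callback are hypothetical, and the summary does not say how the preferred server is chosen, so the sketch simply takes the first entry the directory returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    """Specific information a directory returns about one server (assumed fields)."""
    name: str
    address: str


class DirectoryServer:
    """Maps each task to the servers configured to handle it."""

    def __init__(self):
        self._by_task = {}

    def register(self, task, info):
        self._by_task.setdefault(task, []).append(info)

    def lookup(self, task):
        # Return a copy so callers cannot alter the directory's own list.
        return list(self._by_task.get(task, []))


class Client:
    def __init__(self, directory, send):
        self.directory = directory
        self.send = send  # callable(ServerInfo, task) that delivers the request

    def perform(self, task):
        candidates = self.directory.lookup(task)
        if not candidates:
            raise LookupError(f"no server is configured for task {task!r}")
        preferred = candidates[0]  # assumption: first entry is preferred
        # The client contacts the chosen server directly, with no gateway.
        return self.send(preferred, task)


directory = DirectoryServer()
directory.register("A", ServerInfo("server-1", "10.0.0.1"))
directory.register("A", ServerInfo("server-2", "10.0.0.2"))
client = Client(directory, lambda info, task: f"{info.name} handled {task}")
print(client.perform("A"))
```

The point of the sketch is the division of labor: the directory only answers lookups, and the client, not a tightly coupled gateway, sends the task request to the server it selects.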

