Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-05-06
2002-05-21
Baderman, Scott T. (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S013000, C709S239000, C707S793000
Reexamination Certificate
active
06393581
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to distributed or cluster computing systems and processes. More particularly, the present invention relates to fault tolerant, scaleable, cluster computing systems and processes operating within given time constraints.
BACKGROUND OF THE INVENTION
Cluster computing represents a compromise between the manageability, power, and ease of use of centralized uniprocessor systems and the reliability, fault-tolerance, and scalability of distributed systems. A cluster comprises a set of workstations or personal computers connected by a high-speed local area network. Thus, a cluster replaces a uniprocessor with a set of commodity computers meshed by a software backplane (that is, the logical software equivalent of a wired backplane wherein all the electronic printed circuit cards or modules are interconnected). A cluster is almost as easy to manage as a uniprocessor because all the members of the cluster are homogeneous and centrally administered. Moreover, it is easy to use because a cluster appears to users as a single powerful computer with a single file system and a single set of applications. Nevertheless, being a distributed system, a cluster offers the power, scalability, reliability, upgradability and fault-tolerance characteristic of such distributed systems. These characteristics give clusters a great advantage-over uniprocessor systems.
While cluster computing systems and methods are known, they generally lack the ability to meet time to response bounds.
Providing both fault tolerance and guaranteed completion time properties simultaneously is not trivial. The traditional approach to fault tolerance is to use a tightly coupled hardware based fault-tolerant computer systems, such as the ones manufactured by Stratus™ and Tandem™. The hardware approach suffers from at least three substantial problems. First, although this approach allows transparent masking of hardware faults, the system cannot tolerate software failures, which remain a source of downtime in many critical settings. Second, administrators cannot perform ‘hot’ (upgrading while the system is running) upgrades to software or hardware on such systems. Third, fault-tolerant hardware often lags the price/performance curve of commodity computers by several years.
The above tightly coupled hardware fault-tolerant computers have an advantage, however, of preserving the response characteristics of applications executed upon them. If an application is designed to respond to some class of requests within a time bound, for example 100 ms, a tightly coupled hardware based fault-tolerant platform will preserve that response time. In contrast, prior art distributed cluster fault-tolerance solutions are slow to respond-in general, and such systems are often slow to detect and react to failures, so that they rarely meet time bounds, especially in the presence of failures. For example, a classical solution would be to require applications to execute a Byzantine agreement protocol (highly complex and redundant) to mask software faults, imposing a significant computational and communication burden on the cluster.
Another limitation of prior art systems is their inability to scale to accommodate larger numbers of networked computers (QE's). These networked computers are used in conjuction with an External Adaptor (EA), or front end computer, which connects to an external communications network on one side and the networked computers in parallel (either by a bus or separate communication lines) on the other side. Scaling is important since the telecommunications industry hopes to build memory-mapped databases containing hundreds of millions of subscriber records. This scalability limitation is illustrated for example in cluster systems implementing a telephone switching (SS
7
) protocol where external requests are to be uniformly distributed among the QE's. The EA handles incoming requests that are batched (if large numbers of requests are received) for processing at the QE's. Thus, the workload on the EA rises in direct proportion to the number of QE's. When a protocol, like SS
7
, is implemented timing restrictions apply. In such an instance, these timing requirements act to limit the number of QE's that can be handled by an EA to 8 to 12 QE's, where handling fifty or more QE's may be required.
Herein, and in the art, several terms are used interchangeably. The “cluster” refers to the entire distributed computing system connected to a external communications network, e.g. the Internet, an Ethernet. Cluster will be the term of choice. The parts of the cluster that connect directly to the external communications network are called. the “front end”, or the external adaptors, or EA computers or EA's . Hereinafter, EA will be the term of choice. The part of the cluster that performs the computation and other tasks, is referred to as the “back end”, or the networked computers, or the query elements, or QE computers, or QE's. Hereinafter, QE will be the term of choice. Another term, “time-delay constrained,” is interchangeable with “time delay bound,” “time to respond”, and other combinations, but the meaning of the terms will be clear from the context to those skilled in the art.
Herein, alternate terms listed above may be used, or other terms, such as “group” may be used, wherein such use will be clear from the context.
It is therefore an object of the present invention to provide a cluster computing system and method that is fault tolerant.
It is another object of the present invention to retain the advantages of cluster computing, while gaining the fault-tolerance and timely responsiveness of a uniprocessor and/or hardware system solution.
It is yet another object of the present invention to provide a fault tolerant cluster computing system and method that completes a response or computation even if one or more of the components of a cluster fails.
It is still another object of the present invention to provide a fault tolerant system and method that is scaleable.
SUMMARY OF THE INVENTION
The present invention meets the foregoing objects in apparatuses and methods for designing practical, reliable, time delay-constrained (time bound) cluster computing systems, in which a limited number of EA's (typically two) interact with the external, outside world, directing clients requests to the QE's, and then relaying the replies back from the QE's to the clients.
Cluster computing is naturally suited to systems that perform large numbers of independent (or nearly independent) small computations. The cluster is partitioned into one or more EA's and multiple QE's. The EA isolate and hide the rest of the cluster from the external network and provide load balancing. Fault-tolerance requires replication of EA functionality.
In a preferred embodiment of the invention, a client computer contacts one of the EA's with a request via a communications network. The EA forwards the request to one of the QE's where a reply is generated for sending back to the client via the EA. The QE selected is determined by: the capabilities of computers that comprise the QE's, the current load distribution on the QE's, and the expected time bound for handling the request. With a reliable time delay-bound cluster, the reply is generated within a time delay bound despite failures in the computers comprising the EA's and the QE's.
In another preferred embodiment of the present invention, the EA's communicate with the outside world, with each other and with the QE's. Also means are provided for the QE's to communicate with each other. In this embodiment, the QE's are logically divided (that is the QE's are not physically separated) into at least two sets of lists—one set of lists for each EA. A list is a grouping of a number of QE's. Each of the lists within a set is non-overlapping within that set such that each QE appears only in one list within a set of lists. The sets of lists are arr
Birman Kenneth P.
Friedman Roy
Keshav Srinivasan
Vogels Werner
Baderman Scott T.
Cohen Jerry
Cornell Research Foundation Inc.
Erlich Jacob N.
Perkins Smith & Cohen LLP
LandOfFree
Reliable time delay-constrained cluster computing does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Reliable time delay-constrained cluster computing, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Reliable time delay-constrained cluster computing will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2829390