Method and system for consistent cluster operational data in...

Electrical computers and digital processing systems: multicomput – Computer network managing – Network resource allocating

Reexamination Certificate

C709S220000, C709S223000

active

06401120

ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster.
BACKGROUND OF THE INVENTION
A server cluster ordinarily is a group of at least two independent servers connected by a network and utilized as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server.
Other benefits of clusters include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline for the duration of the maintenance activity. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
When operating a server cluster, the cluster operational data (i.e., state) of any prior incarnation of a cluster needs to be known to the subsequent incarnation of the cluster; otherwise, critical data may be lost. For example, if a bank's financial transaction data are recorded in one cluster, but a new cluster starts up without the previous cluster's operational data, the financial transactions may be lost. To avoid this, prior clustering technology required two things. First, each node (server) of a cluster kept its own replica of the cluster operational data on its own private storage. Second, a majority of the possible nodes (along with their private storage devices) had to be operational in order to start and maintain a cluster. Because any two majorities of the same set of nodes must share at least one node, this ensured that any new cluster had at least one node in common with the previous cluster, and thus at least one copy of the correct cluster operational data. Further, the majority (quorum) requirement ensured that only one incarnation of the cluster could exist at any point in time. For example, two non-communicating subsets of the cluster membership could not form two different instances of the cluster at the same time.
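A majority requirement guarantees overlap with the previous cluster because any two strict majorities of the same node set must share a member. The short Python sketch below checks this by brute force for a five-node set. The function names (`has_node_quorum`, `majorities`, `every_pair_intersects`) are hypothetical illustrations, not part of any clustering product.

```python
from itertools import combinations


def has_node_quorum(live_nodes: set[str], all_nodes: set[str]) -> bool:
    """Hypothetical check: True if live_nodes is a strict majority of all_nodes."""
    return 2 * len(live_nodes & all_nodes) > len(all_nodes)


def majorities(all_nodes: set[str]) -> list[set[str]]:
    """Enumerate every subset of all_nodes that forms a strict majority."""
    ordered = sorted(all_nodes)
    result = []
    for size in range(len(ordered) // 2 + 1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            result.append(set(combo))
    return result


def every_pair_intersects(all_nodes: set[str]) -> bool:
    """Brute-force check that any two majorities share at least one node."""
    groups = majorities(all_nodes)
    return all(a & b for a in groups for b in groups)


nodes = {"A", "B", "C", "D", "E"}
print(has_node_quorum({"A", "B", "C"}, nodes))  # True
print(has_node_quorum({"D", "E"}, nodes))       # False
print(every_pair_intersects(nodes))             # True
```

The same property is what makes two non-communicating subsets unable to form two clusters at once: they would each need a majority, and two majorities cannot be disjoint.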
However, requiring a quorum of nodes has the drawback that a majority of the possible nodes of a cluster must be operational in order to have a cluster. A recent improvement addresses this. It is described in U.S. patent application Ser. No. 08/963,050, entitled “Method and System for Quorum Resource Arbitration in a Server Cluster,” which is assigned to the same assignee and hereby incorporated by reference herein in its entirety. That improvement stores the cluster operational data on a single quorum device, typically a storage device, for which cluster nodes arbitrate for exclusive ownership. Because the correct cluster operational data is on the quorum device, a cluster may be formed as long as a node of that cluster has ownership of the quorum device. This also ensures that only one unique incarnation of a cluster can exist at any given time, since only one node can exclusively own the quorum device. The single quorum device solution increases cluster availability, since at a minimum only one node and the quorum device are needed to have an operational cluster. This is a significant improvement over requiring a majority of nodes. However, the single quorum device is itself a single point of failure. To increase cluster availability, expensive hardware-based solutions are therefore presently employed to provide a highly reliable single quorum device for storing the operational data. The cost of such a highly reliable storage device is a major portion of the cluster expense.
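The following is a minimal sketch of the single-quorum-device model. It assumes exclusive ownership can be modelled as a lock-protected owner field. The `QuorumDevice` class, its `failed` flag and `can_form_cluster` are hypothetical, and the actual arbitration protocol of the referenced application is not reproduced here. The last step illustrates the drawback: once the one device fails, no node can form a cluster.

```python
import threading
from typing import Optional


class QuorumDevice:
    """Hypothetical model of one shared quorum storage device.

    Exclusive ownership is modelled with a lock-protected owner field;
    the real arbitration protocol is not reproduced here.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None
        self.failed = False  # set to True to simulate a device failure

    def try_acquire(self, node: str) -> bool:
        """Grant ownership if the device works and is free or already ours."""
        with self._lock:
            if self.failed:
                return False  # a failed device blocks every node
            if self._owner is None or self._owner == node:
                self._owner = node
                return True
            return False

    def release(self, node: str) -> None:
        """Give up ownership, e.g. when the owning node shuts down."""
        with self._lock:
            if self._owner == node:
                self._owner = None

    @property
    def owner(self) -> Optional[str]:
        return self._owner


def can_form_cluster(node: str, device: QuorumDevice) -> bool:
    """A node may form or keep the cluster only while it owns the device."""
    return device.try_acquire(node)


disk = QuorumDevice()
print(can_form_cluster("node1", disk))  # True: node1 wins arbitration
print(can_form_cluster("node2", disk))  # False: only one incarnation at a time
disk.release("node1")
print(can_form_cluster("node2", disk))  # True: node2 now owns the device
disk.failed = True
print(can_form_cluster("node2", disk))  # False: the single device blocks everyone
```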
SUMMARY OF THE INVENTION
Briefly, the present invention provides a method and system wherein at least three storage devices (replica members) are available to maintain the cluster operational data, and wherein the replica members are independent from any given node. A cluster may operate as long as one node possesses a quorum (e.g., a simple majority) of the replica members. This ensures that only one unique incarnation of a cluster can exist at any given time, since only one node may possess a quorum of members. The quorum requirement further ensures that a new or surviving cluster has at least one replica member that belonged to the immediately prior cluster and is thus correct with respect to the cluster operational data. Update sequence numbers and/or timestamps are used to determine the most up-to-date replica member from among those in the quorum. The method and system of the present invention require only a small number of relatively inexpensive components to form a cluster, thereby increasing availability relative to the quorum of nodes solution, while lowering cost relative to the single quorum device solution.
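The summary can be illustrated with a small Python sketch. It assumes each replica member records an integer update sequence number and a last-update timestamp. `ReplicaMember`, `has_replica_quorum`, `most_up_to_date` and `form_cluster` are hypothetical names. The sketch compares sequence numbers first and uses the timestamp only to break ties, which is one possible reading of "update sequence numbers and/or timestamps".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplicaMember:
    """Hypothetical replica storage device holding cluster operational data."""
    name: str
    sequence: int      # assumed update sequence number, bumped on every write
    timestamp: float   # assumed time of last update, used only as a tie-breaker
    data: dict[str, str] = field(default_factory=dict)


def has_replica_quorum(owned: list[ReplicaMember], total_members: int) -> bool:
    """A node may form a cluster only if it owns a strict majority of members."""
    return 2 * len(owned) > total_members


def most_up_to_date(owned: list[ReplicaMember]) -> ReplicaMember:
    """Pick the member with the highest (sequence, timestamp) pair."""
    return max(owned, key=lambda r: (r.sequence, r.timestamp))


def form_cluster(owned: list[ReplicaMember],
                 total_members: int) -> Optional[dict[str, str]]:
    """Return a copy of the freshest operational data, or None without quorum."""
    if not has_replica_quorum(owned, total_members):
        return None
    return dict(most_up_to_date(owned).data)


r1 = ReplicaMember("disk1", sequence=41, timestamp=100.0, data={"resource": "old"})
r2 = ReplicaMember("disk2", sequence=42, timestamp=105.0, data={"resource": "new"})
r3 = ReplicaMember("disk3", sequence=42, timestamp=103.0, data={"resource": "new"})
print(form_cluster([r1, r2], total_members=3))  # {'resource': 'new'}
print(form_cluster([r3], total_members=3))      # None: no majority
```

With three members, any two-member quorum shares at least one member with the previous cluster's quorum. That is why the freshest data is always among the members a new owner can read.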
Other benefits and advantages will become apparent from the following detailed description when taken in conjunction with the drawings.


REFERENCES:
patent: 5280627 (1994-01-01), Flaherty et al.
patent: 5553239 (1996-09-01), Heath et al.
patent: 5659748 (1997-08-01), Kennedy
patent: 5673384 (1997-09-01), Hepner et al.
patent: 5727206 (1998-03-01), Fish et al.
patent: 5754821 (1998-05-01), Cripe et al.
patent: 5781910 (1998-07-01), Gostanian et al.
patent: 5828876 (1998-10-01), Fish et al.
patent: 5828889 (1998-10-01), Moiin et al.
patent: 5892913 (1999-04-01), Adiga et al.
patent: 5893086 (1999-04-01), Schmuck et al.
patent: 5909540 (1999-06-01), Carter et al.
patent: 5917998 (1999-06-01), Cabrera et al.
patent: 5918229 (1999-06-01), Davis et al.
patent: 5940838 (1999-08-01), Schmuck et al.
patent: 5946686 (1999-08-01), Schmuck et al.
patent: 5948109 (1999-09-01), Moiin et al.
patent: 5996075 (1999-11-01), Matena
patent: 5999712 (1999-12-01), Moiin et al.
patent: 6014669 (2000-01-01), Slaughter et al.
patent: 0 760 503 (1997-03-01), None
patent: 0 887 731 (1998-12-01), None
Bernstein et al., “Replicated Data,” Concurrency Control and Recovery in Database Systems, Chapter 8, Addison-Wesley Publishing Company, pp. 265-311 (1987).
Oki et al., “Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems,” Proceedings of the 7th ACM Symposium on Principles of Distributed Computing, pp. 8-17 (1988).
Carr, Richard, “The Tandem Global Update Protocol,” Tandem Systems Review, vol. 1, No. 2, pp. 74-85 (Jun. 1995).
Gifford, David K., “Weighted Voting for Replicated Data,” pp. 150-159 (1979).
Lamport, Leslie, A Fast Mutual Exclusion Algorithm, Digital Equipment Corporation (Nov. 14, 1985).
Lamport, Leslie, The Part-Time Parliament, Digital Equipment Corporation (Sep. 1, 1989).