Emulation of persistent group reservations

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


Details

Classification: C709S230000
Type: Reexamination Certificate
Status: active
Patent number: 06658587

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to distributed computer systems, and more particularly to a system and method that emulates Persistent Group Reservations, or PGRs, on non-PGR-compliant shared disks, permitting such a disk to be used in a system that implements a PGR-reliant algorithm. One such algorithm enables a non-PGR-compliant shared disk to be used as a quorum disk supporting highly available clustering software.
2. Related Art
As computer networks are increasingly used to link computer systems together, distributed operating systems have been developed to control interactions between computer systems across a computer network. Some distributed operating systems allow client computer systems to access resources on server computer systems. For example, a client computer system may be able to access information contained in a database on a server computer system. When the server fails, it is desirable for the distributed operating system to automatically recover from this failure. Distributed computer systems with distributed operating systems possessing an ability to recover from such server failures are referred to as “highly available” systems. High availability is provided by a number of commercially available products including Sun™ Cluster from Sun™ Microsystems, Palo Alto, Calif.
Distributed computing systems, such as clusters, may include two or more nodes, which may be employed to perform a computing task. Generally speaking, a node is a group of circuitry designed to perform one or more computing tasks. A node may include one or more processors, a memory and interface circuitry. Generally speaking, a cluster is a group of two or more nodes that have the capability of exchanging data between nodes. A particular computing task may be performed upon one node, while other nodes perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among the nodes to decrease the time required to perform the computing task as a whole. Generally speaking, a processor is a device configured to perform an operation upon one or more operands to produce a result. The operations may be performed in response to instructions executed by the processor.
Nodes within a cluster may have one or more storage devices coupled to the nodes. Generally speaking, a storage device is a persistent device capable of storing large amounts of data. For example, a storage device may be a magnetic storage device such as a disk device, or optical storage device such as a compact disc device. Although a disk device is only one example of a storage device, the term “disk” may be used interchangeably with “storage device” throughout this specification. Nodes physically connected to a storage device may access the storage device directly. A storage device may be physically connected to one or more nodes of a cluster, but the storage device need not necessarily be physically connected to all the nodes of a cluster. The nodes that are not physically connected to a storage device may not access that storage device directly. In some clusters, a node not physically connected to a storage device may indirectly access the storage device via a data communication link connecting the nodes.
One of the aims of a highly available (HA) system is to minimize the impact of individual component failures on system availability. An example of such a failure is a loss of communications between some of the nodes of a distributed system. Referring now to FIG. 1, an exemplary cluster is illustrated. In this example, the cluster, 1, comprises four nodes, 102, 104, 106 and 108. The four nodes of the system share a disk, 110. In the exemplary cluster presented herein, nodes 102 through 108 have access to disk 110 by means of paths 120 through 126, respectively. Accordingly, this disk can be said to be “4-ported”. As previously discussed, access to disk 110 may be by means of a physical connection, a data communication link, or other disk access methodologies well known to those having ordinary skill in the art.
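
The topology just described lends itself to a small data-structure sketch. The following Python fragment is illustrative only: FIG. 1 is not reproduced in this excerpt, so the pairing of links 112 through 118 to particular node pairs is an assumption, chosen here only so that it agrees with the failure scenarios discussed below.

```python
# Minimal model of the FIG. 1 topology: four nodes (102, 104, 106, 108), a
# shared 4-ported disk (110), one disk path per node (120-126), and four
# inter-node communication links (112-118). Names and the link-to-node-pair
# mapping are assumptions made for illustration.

NODES = {102, 104, 106, 108}

# Each node reaches shared disk 110 over its own path.
DISK_PATHS = {102: 120, 104: 122, 106: 124, 108: 126}

# Assumed mapping of link numerals to the node pairs they connect.
LINKS = {
    112: (104, 106),
    114: (102, 106),
    116: (102, 108),
    118: (104, 108),
}
```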
The nodes in the exemplary system are connected by means of data communication links 112, 114, 116 and 118. In the event that data communication links 112 and 114 fail, node 106 will no longer be capable of communication with the remaining nodes in the system. It will be appreciated from study of the figure, however, that node 106 retains its communications with shared disk 110 by means of path 124. This gives rise to a condition known as “split brain”.
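
Building on the sketch above, a sub-cluster is simply a set of nodes that can still reach one another over the surviving links; a node such as 106 can end up alone in its sub-cluster even though its path to disk 110 remains intact. A minimal sketch of that computation, under the same assumptions as before:

```python
def subclusters(nodes, links, failed):
    """Return the groups of nodes that can still communicate, given that the
    links whose identifiers appear in `failed` are down (union-find sketch)."""
    alive = [pair for link_id, pair in links.items() if link_id not in failed]
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in alive:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

# With links 112 and 114 down, node 106 is isolated, as described above:
# subclusters(NODES, LINKS, failed={112, 114})  ->  [{102, 104, 108}, {106}]
```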
Split brain refers to a cluster breaking up into multiple sub-clusters, or to the formation of multiple sub-clusters without knowledge of one another. This problem occurs due to communication failures between the nodes in the cluster, and often results in data corruption. One methodology for ensuring that a distributed system continues to operate with the greatest number of available resources, while excluding the potential for data corruption occasioned by split brain, is the use of a quorum algorithm with a majority vote count. Majority vote count is achieved when a quorum algorithm detects a vote count greater than half the total number of votes. In a system with n nodes attached to the quorum device, each node is assigned one vote, and the system's quorum device is assigned n−1 votes, as will be explained later.
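
The majority vote count test itself is a one-line comparison. A minimal sketch, with the function name chosen only for illustration:

```python
def has_quorum(partition_votes, total_votes):
    """True when a partition holds strictly more than half of all configured votes."""
    return partition_votes > total_votes / 2
```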
To explain how a majority vote count quorum algorithm operates, consider the four-node cluster illustrated in FIG. 1, and assume no votes are assigned to a quorum device. Assume a communications failure occurs between node 106 and the other nodes in the cluster. Since each node has one vote, and nodes 102, 104 and 108 are operating properly and are in communication with one another, a simple quorum algorithm would count one vote for each of these nodes, against one vote for node 106. Since three votes exceed half of the four total votes, while one vote does not, the subcluster comprising nodes 102, 104 and 108 attains majority vote count and this simplified quorum algorithm excludes node 106 from accessing shared disk 110.
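
In terms of the has_quorum sketch above, with no quorum device configured the four nodes contribute four votes in total, and the example reads:

```python
total = 4                 # one vote each for nodes 102, 104, 106 and 108
has_quorum(3, total)      # nodes 102, 104 and 108 together: True  (3 > 2)
has_quorum(1, total)      # node 106 alone:                  False (1 > 2 fails)
```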
The simplified example previously discussed becomes somewhat more complicated when equal numbers of nodes are separated from one another. Again considering the example shown in FIG. 1, consider the loss of communication links 114 and 118. In this case, nodes 102 and 108 are in communication with one another, as are nodes 104 and 106, but no communications exist between these pairs. In this example, communications are still intact between each of the nodes and shared disk 110. It will be appreciated, however, that 2 is not greater than 2: neither subcluster's two votes exceed half of the four total votes, and therefore neither subcluster attains majority vote count and this relatively simple quorum algorithm fails.
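
The same sketch shows why the symmetric failure defeats the simple algorithm: with no quorum device, neither two-node sub-cluster can exceed half of the four configured votes.

```python
total = 4
has_quorum(2, total)      # nodes 102 and 108: False (2 is not greater than 2)
has_quorum(2, total)      # nodes 104 and 106: False as well; no one may proceed
```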
A quorum device, or QD, is a hardware device, shared by two or more nodes within the cluster, that contributes the votes used to establish the quorum the cluster needs in order to run. The cluster can operate only when a quorum of votes, i.e. a majority of votes as previously explained, is available. Quorum devices are commonly, but not necessarily, shared disks. Most majority vote count quorum algorithms assign the quorum device a number of votes that is one less than the number of connected quorum device ports. In the previously discussed example of a 4-node cluster with n=4, where each node is ported to the quorum device, that quorum device would be given n−1, or 3, votes, although other methods of assigning a number of votes to the quorum device may be used.
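
Continuing the illustrative sketch, a 4-ported quorum device adds n−1 = 3 votes, raising the configured total to 7, so a majority now requires more than 3.5 votes. Whichever two-node sub-cluster acquires the quorum device first adds those 3 votes and reaches quorum; the other pair cannot.

```python
total = 4 + 3                 # four node votes plus the quorum device's 3 votes
has_quorum(2 + 3, total)      # pair that acquires the quorum device: True  (5 > 3.5)
has_quorum(2, total)          # losing pair:                          False (2 > 3.5 fails)
```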
The pair of nodes within the cluster that, through the quorum algorithm, first takes ownership of the disk causes the algorithm to exclude the other pair. In this example, the two nodes which first take ownership of disk 110 following the fracturing of the cluster, for instance a subcluster comprising nodes 102 and 108, cause the algorithm to exclude the other subcluster comprising nodes 104 and 106 from accessing the shared disk until the system can be restored. This is true
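
The “first to take ownership” behaviour can be pictured with a small sketch of an atomic reservation race. This is not the patent's mechanism: on PGR-compliant disks the winning sub-cluster would place a persistent group reservation and preempt the excluded nodes' keys, and the subject of the patent is emulating that behaviour on disks without PGR support, the details of which are not given in this excerpt. `try_reserve` is a hypothetical stand-in for whatever atomic reserve primitive the disk actually provides.

```python
import threading

_reservation_lock = threading.Lock()
_owner = None  # sub-cluster currently holding quorum disk 110, if any

def try_reserve(subcluster):
    """Atomically claim the quorum disk for `subcluster`; only the first caller wins."""
    global _owner
    with _reservation_lock:
        if _owner is None:
            _owner = frozenset(subcluster)
            return True
        return False

# After the cluster fractures, both pairs race for the disk; only one succeeds:
# try_reserve({102, 108})  ->  True   (this pair now owns disk 110)
# try_reserve({104, 106})  ->  False  (excluded until the cluster is restored)
```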
