Method and apparatus for partition resolution in clustered...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


Details

C714S037000


active

06363495

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to a distributed data processing system and in particular to a method and apparatus for managing a server system within a distributed data processing system. Still more particularly, the present invention relates to a method and apparatus for handling network communication failures among servers within a distributed data processing system.
2. Description of Related Art
Multiple computers may be employed to increase performance of a computing site or to avoid problems associated with single computer failures. These computers are used to form a cluster, which is also referred to as a clustered computer system. An individual computer within a cluster is referred to as a cluster server, cluster member, or cluster node.
Generally, cluster nodes communicate with each other over a network. If a network communication failure occurs, the cluster may be partitioned into two or more parts. If cluster servers in a partition are unable to determine the status of cluster servers outside of the partition, continued application processing may result in a condition referred to as split-brain operation. To a subset A of cluster nodes, it is unclear whether the node(s) in some other subset B are actually operational or are simply unable to communicate with subset A. Such a situation is dangerous, as it can result in corruption of data maintained by the cluster or incorrect processing results.
For example, if a clustered computer system containing two cluster nodes is partitioned by severing the links used for cluster communication between the nodes, each node will be unable to determine the state or status of the other. Further, any mutual exclusion mechanisms that depend on the severed link(s) will be inoperable or will yield incorrect results. This can result in both nodes deciding that it is proper to control a resource which can safely be controlled by only one node at a time. Such a condition can result in corrupted data or incorrect processing results. A common example of such a resource is a file system residing on a disk connected to both nodes.
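The two-node scenario above can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`Node`, `on_peer_unreachable` and `sever_cluster_link` are hypothetical, not from the source): each node follows a naive policy of taking over the shared resource whenever it stops hearing from its peer. With no working mutual exclusion mechanism, both nodes end up controlling the resource.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster node with a naive takeover policy."""
    name: str
    owns_resource: bool = False

    def on_peer_unreachable(self) -> None:
        # Naive policy: treat silence from the peer as proof that it failed
        # and take over the shared resource (e.g. the shared file system).
        self.owns_resource = True


def sever_cluster_link(a: Node, b: Node) -> list[str]:
    """Simulate cutting the only link between two otherwise healthy nodes."""
    # Neither node can reach the other, so each runs its failure handler.
    a.on_peer_unreachable()
    b.on_peer_unreachable()
    return [n.name for n in (a, b) if n.owns_resource]


owners = sever_cluster_link(Node("node1"), Node("node2"))
print(owners)  # ['node1', 'node2']: both nodes control the resource at once
```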
Corruption of a shared database is the most common manifestation of split-brain operation, though any mutually-accessible resource may be affected. More specifically, split-brain operation may be defined as a condition involving two or more computers in which mutually-accessible resources are not under the control of any mutual exclusion mechanism.
Clearly, to avoid a split-brain condition, mutual exclusion mechanisms must be preserved. Traditionally, high-availability systems have relied on various methods to minimize the probability of a split-brain condition, such as redundant communication links and deadman timers. Each of these mechanisms has its strengths and weaknesses; because of this, it is common for multiple links and methods to be used concurrently.
Redundant communication links are commonly used for split-brain prevention. These include secondary network links, asynchronous (TTY) links, or device-bus links (of which target-mode SCSI is an example). A common use of a redundant link is to provide what is known as a heartbeat capability. Generally, a heartbeat operation is nothing more than an ongoing sequence of messages from one communication endpoint (a sender) to one or more other endpoints (receivers) that indicate to the receiver(s) that the sender is operational. These messages are commonly referred to as “I'm alive” messages. A heartbeat exchange occurs when these communication endpoints pass heartbeat messages bi-directionally, indicating the “liveness” of all participating endpoints. In the event of a primary communication failure, this heartbeat mechanism over the redundant link(s) permits an endpoint to know that another endpoint remains active despite an inability to participate in normal cluster communication. Generally, this information is used as a fail-safe to ensure that resource control errors of the type described earlier do not occur.
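A heartbeat receiver can be sketched as a last-seen timestamp plus a timeout. In this hypothetical `HeartbeatMonitor`, the timeout value and the explicit `now` parameter (standing in for a monotonic clock) are illustrative assumptions. In a heartbeat exchange, each endpoint would run one such monitor per peer while also sending its own “I'm alive” messages.

```python
from typing import Optional


class HeartbeatMonitor:
    """Receiver-side view of one sender's "I'm alive" messages (hypothetical)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # assumed value: seconds of silence tolerated
        self.last_seen: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        # Each received heartbeat refreshes the sender's last-seen timestamp.
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        # The sender counts as live only if a heartbeat arrived within the timeout.
        if self.last_seen is None:
            return False
        return now - self.last_seen <= self.timeout


monitor = HeartbeatMonitor(timeout=3.0)
monitor.on_heartbeat(now=10.0)
print(monitor.peer_alive(now=12.0))  # True
print(monitor.peer_alive(now=14.5))  # False: 4.5 s without a heartbeat
```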
If a redundant communication link is used only as a heartbeat mechanism, then it provides the cluster node with only enough information to determine that an unsafe condition may exist, one in which it would be potentially dangerous to take over certain resources. A heartbeat alone may not indicate the exact nature of the condition or reveal information sufficient to recover from it. However, it is sufficient to assure that a cluster node can recognize the existence of an unsafe condition with respect to resource control and take no action which might compromise resource integrity. This is the approach commonly taken: if an unsafe condition with respect to a cluster node is seen, do not attempt to take over any processing resources which may already be under the control of that node. It is better to do nothing than to risk the consequences of a mistake.
For example, assume a two-node system sharing a disk. The disk contains a database which may be controlled by only one node at a time. A mutual exclusion mechanism in the form of a lock manager operates over a primary network link to assure that only one node updates the database at a time. A heartbeat mechanism operates over a secondary network link. Should the primary link be disabled, negotiation for database access through the mutual exclusion mechanism will also be disabled. However, should the secondary link remain active and heartbeat communication continue to be received, a cluster node will at least be able to recognize that the other cluster node remains active and that it would be unsafe to acquire control of the database. This example should be viewed only as illustrative. The mechanisms described are also applicable to clusters of more than two nodes.
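The conservative policy described above, applied to this two-node example, can be expressed as a small decision function. This is a sketch only: `takeover_decision` and the `Decision` values are hypothetical names. Because the source does not prescribe what to do when both links are silent, that case is reported as undetermined rather than as safe.

```python
from enum import Enum


class Decision(Enum):
    NEGOTIATE = "primary link up: arbitrate through the lock manager"
    DO_NOTHING = "peer known to be active: do not take over"
    UNDETERMINED = "both links silent: safety cannot be established"


def takeover_decision(primary_link_up: bool, heartbeat_seen: bool) -> Decision:
    """Hypothetical conservative takeover policy for the two-node example."""
    if primary_link_up:
        # Mutual exclusion over the primary link still works, so use it.
        return Decision.NEGOTIATE
    if heartbeat_seen:
        # Lock manager unreachable, but heartbeats on the secondary link show
        # the peer is active: an unsafe condition, so take no action.
        return Decision.DO_NOTHING
    # No traffic on either link: the peer may have crashed or may simply be
    # unreachable, so this case is not treated as safe.
    return Decision.UNDETERMINED


print(takeover_decision(primary_link_up=False, heartbeat_seen=True).name)  # DO_NOTHING
```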
It should be pointed out that while use of a redundant heartbeat link can allow a node to recognize the existence of an unsafe condition, it cannot guarantee recognition of a safe condition. Referring to the previous example, if both the primary and secondary links were to fail, a cluster node would not be able to determine the true nature of the failure. One possibility is that the communication links are intact but the other node has itself failed and is no longer sending messages. Another is that the links have both failed and the other node remains operational but unable to communicate that fact. This points out the essential problem in preventing split-brain operation. It is impossible to guarantee safety of operation against shared resources in the absence of a functioning mutual exclusion mechanism. The best one can do is minimize the probability of accessing such resources under unsafe conditions.
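The ambiguity can be made concrete by modelling what a node actually observes. In this hypothetical sketch (`Observation` and `observe` are illustrative names), a node receives traffic on a link only if the peer is running and that link is intact. A crashed peer with healthy links and a running peer behind two failed links then produce identical observations. No rule based on those observations alone can tell the safe condition from the unsafe one.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    primary_messages: bool  # cluster traffic received on the primary link
    heartbeats: bool        # heartbeats received on the secondary link


def observe(peer_running: bool, primary_ok: bool, secondary_ok: bool) -> Observation:
    # Traffic arrives on a link only if the peer is running AND the link is intact.
    return Observation(
        primary_messages=peer_running and primary_ok,
        heartbeats=peer_running and secondary_ok,
    )


peer_crashed = observe(peer_running=False, primary_ok=True, secondary_ok=True)
links_failed = observe(peer_running=True, primary_ok=False, secondary_ok=False)
print(peer_crashed == links_failed)  # True: safe and unsafe cases look the same
```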
Because of this need to minimize the probability of interpreting an unsafe condition as safe, it is often important not only to utilize multiple links concurrently, but also for those links to be of different types. Further, for each type, the hardware, processing algorithm and operating system code path (communication stack) should be as different as possible. This reduces the possibility of encountering single points of failure within the hardware or operating system.
Generally, primary communication among cluster nodes occurs using higher-performance network links, such as Ethernet, FDDI, or Token-Ring. Often, backup links utilizing one of these or a similar mechanism are used to provide cluster communication should the main link fail. Such backup links are helpful as secondary links for split-brain prevention; however, they may be less reliable than other link types if they share code paths with the primary link(s). An example of such a shared code path is the TCP/IP communications stack in the operating system. Further, should a backup link take over primary communication, it is no longer useful as a secondary link.
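Link diversity can be audited by comparing each secondary link's hardware type and operating-system code path against those of the primary link. The `Link` record, the `shared_failure_points` helper, the device names, and the attribute values below are hypothetical examples chosen to mirror the link types named in this section: a backup Ethernet link sharing the TCP/IP stack, an asynchronous TTY link, and a target-mode SCSI link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    hardware: str  # hardware type, e.g. "ethernet", "rs232", "scsi"
    stack: str     # operating-system code path, e.g. "tcpip", "tty"


def shared_failure_points(primary: Link, secondary: list[Link]) -> dict[str, list[str]]:
    """For each secondary link, list what it has in common with the primary link."""
    report = {}
    for link in secondary:
        shared = []
        if link.hardware == primary.hardware:
            shared.append("hardware")
        if link.stack == primary.stack:
            shared.append("stack")
        report[link.name] = shared
    return report


primary = Link("en0", "ethernet", "tcpip")
secondaries = [
    Link("en1", "ethernet", "tcpip"),   # backup network link: same hardware and stack
    Link("tty0", "rs232", "tty"),       # asynchronous (TTY) link
    Link("tmscsi0", "scsi", "tmscsi"),  # target-mode SCSI device-bus link
]
print(shared_failure_points(primary, secondaries))
# {'en1': ['hardware', 'stack'], 'tty0': [], 'tmscsi0': []}
```

A secondary link with an empty list shares neither hardware type nor code path with the primary, which is the property this section recommends.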
One or more secondary links for split-brain prevention should be of a different type than the primary, both in hardware and operating system code path. For illustrative purposes, there are two commonly used secondary c
