Electrical computers and digital processing systems: multicomput – Computer-to-computer session/connection establishing – Network resources access controlling
Reexamination Certificate
1997-11-17
2004-06-08
Etienne, Ario (Department: 2157)
C709S220000, C709S222000, C709S221000, C713S152000
active
06748438
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to multiprocessing systems. More particularly, the invention relates to the arbitration of access among multiple competing processing nodes to a shared resource by conducting a membership protocol among all nodes of the system including the shared resource, where the shared resource subsequently fences nodes outside its membership view.
2. Description of Related Art
Multiprocessing computing systems perform a single task using a plurality of processing “elements”, also called “nodes”, “participants”, or “members”. The processing elements may comprise multiple individual processors linked in a network, or a plurality of software processes or threads operating concurrently in a coordinated environment. In a network configuration, the processors communicate with each other through a network that supports a network protocol. This protocol may be implemented using a combination of hardware and software components. In a coordinated software environment, the software processes are logically connected together through some communication medium such as an Ethernet network. Whether implemented in hardware, software, or a combination of both, the individual elements of the network are referred to individually as members, and together as a group.
Frequently, the nodes of a multiprocessing system access a common, or “shared”, resource. For example, the shared resource may comprise a storage device such as a magnetic “hard” disk drive, a tape drive or library, or an optical drive or library. Resources may be shared for a number of reasons, such as avoiding the expense of providing separate resources for each node or guaranteeing data consistency.
FIG. 1A shows a multiprocessing system 100 in which multiple processing nodes 102-104 have common access to a shared resource 106. The processing nodes 102-104 and the shared resource 106 are interconnected by communications paths 108-112. A problem arises when communications between the nodes 102-104 are interrupted, for example by failure of the communications path 108. The nodes then compete for access to the resource 106, possibly resulting in extremely inefficient operation of the system 100.
In the absence of any scheme for arbitrating disputes between the nodes 102-104, which can no longer communicate with each other, the system 100 may experience “thrashing” back and forth between the nodes 102-104, with each node successively fencing the other from resource access. This situation is undesirable chiefly because of the time each node wastes vying for access to the resource 106 rather than computing or actually accessing the resource 106.
Another approach to failure of the communications path 108 is to designate one of the nodes 102-104, in advance, to be master of the resource 106 in the event of such a failure. This way, at least the designated node enjoys uninterrupted access to the shared resource 106. However, the other node is completely blocked from accessing the resource 106. And if the designated node fails, the resource 106 cannot be used at all.
Still another approach to failure of the communications path 108 is for the nodes 102-104 to communicate via the resource 106. For some users, this approach may be too inefficient, because communications between the nodes 102-104 occupy bandwidth otherwise used to exchange data with the shared resource 106. Furthermore, the nodes 102-104 are encumbered with the additional overhead required for fault detection and resource control.
Consequently, known communications recovery schemes are not completely adequate for some applications, such as those involving shared resources.
SUMMARY OF THE INVENTION
Broadly, the invention concerns a multiprocessing system that arbitrates access among multiple competing processing nodes to a shared resource by conducting a membership protocol among all nodes of the system including the shared resource, where the shared resource subsequently fences nodes outside its membership view. To determine the shared resource's membership view, active nodes repeatedly subscribe to the shared resource during prescribed membership intervals. From these subscriptions, an output membership view is generated for the shared resource. The membership protocol for the shared resource (the passive node) ultimately ends when the membership view meets a termination condition guaranteeing asymmetric safety.
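The fencing behavior summarized above can be pictured as a passive node that checks every request against its current membership view. This is a minimal sketch under assumptions. The class name `SharedResource`, the methods `install_view` and `access`, and the string node names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SharedResource:
    """Passive node that fences any node outside its membership view.

    Hypothetical sketch: the names and the set-based view are illustrative
    assumptions, not the patent's data structures.
    """

    view: set[str] = field(default_factory=set)

    def install_view(self, members: set[str]) -> None:
        # Replace the view with the outcome of the latest membership protocol.
        self.view = set(members)

    def access(self, node_id: str, request: str) -> str:
        # Fence: refuse any request from a node not in the current view.
        if node_id not in self.view:
            raise PermissionError(f"{node_id} is fenced from the resource")
        return f"{node_id}: {request} ok"
```

After `install_view({"n102"})`, a request from `"n102"` succeeds and a request from any other node raises `PermissionError`.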
More specifically, in one embodiment a method is provided to determine access among multiple active nodes to a passive node in a multiprocessing system, with a communications network interconnecting the passive node and the active nodes. First, one of the nodes makes a membership protocol announcement. Responsive to the membership protocol announcement, a timer is started to expire after a fixed time. The time between starting and expiration of the timer defines a current membership interval.
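The membership interval is the window between starting the timer and its expiration. The sketch below shows one way to represent it. It assumes a hypothetical `MembershipInterval` class with an injectable clock, so the interval can be tested without sleeping. The duration value is also an assumption, because the text says only that the time is fixed.

```python
import time
from typing import Callable


class MembershipInterval:
    """Window opened by a membership protocol announcement.

    Sketch under assumptions: the fixed duration and the injectable clock are
    illustrative; the source only says the timer expires after a fixed time.
    """

    def __init__(self, duration_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration_s = duration_s
        self._clock = clock
        # The announcement starts the timer immediately.
        self.started_at = clock()

    def is_open(self) -> bool:
        # The interval is open from the start until the timer expires.
        return self._clock() - self.started_at < self.duration_s

    def remaining(self) -> float:
        # Time left before expiration, never negative.
        return max(0.0, self.duration_s - (self._clock() - self.started_at))
```

Using a monotonic clock keeps the interval immune to wall-clock adjustments while the protocol runs.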
Also responsive to the membership protocol announcement, each active node commences attempts at inter-nodal communications to identify all other nodes with which communication has not failed. All nodes so identified comprise a membership view. Further responsive to the membership protocol announcement, each active node commences an attempt to submit a subscription message to the passive node.
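The attempts made while the interval is open can be simulated with a toy network model. In this sketch the function `run_interval` and the `failed_links` set are hypothetical. The model rests on three assumptions:

* A failed link blocks traffic in both directions.
* An active node's view is every other node it can reach directly.
* A subscription succeeds whenever the link to the passive node is up.

```python
def run_interval(active: list[str], passive: str,
                 failed_links: set[frozenset[str]]
                 ) -> tuple[dict[str, set[str]], set[str]]:
    """Simulate one membership interval (hypothetical toy model)."""

    def link_up(a: str, b: str) -> bool:
        # A failed link is stored as an unordered pair of node names.
        return frozenset((a, b)) not in failed_links

    views: dict[str, set[str]] = {}
    for node in active:
        # Inter-nodal probing: record every other node still reachable.
        views[node] = {other for other in [*active, passive]
                       if other != node and link_up(node, other)}

    # Subscription attempts: the passive node records each active node
    # whose subscription message arrived during the interval.
    subscribers = {node for node in active if link_up(node, passive)}
    return views, subscribers
```

For the failure described for FIG. 1A, the call `run_interval(["n102", "n104"], "r106", {frozenset(("n102", "n104"))})` uses hypothetical names for nodes 102, 104 and resource 106. It gives each active node a view containing only the resource, yet both nodes still subscribe.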
Subsequently, the timer expires, thereby closing the current membership interval. In response to the timer expiring, each active node establishes its membership view, made up of all other nodes identified during the current membership interval. Also established is the passive node's membership view, comprising all active nodes successfully submitting a subscription message during the current membership interval. The membership views of all nodes are integrated, using asymmetric safety, to establish an updated membership view of each node. Subsequent access to the passive node is then restricted according to the passive node's updated membership view.
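This section does not spell out the asymmetric-safety condition used to integrate the views. The following sketch therefore substitutes a simpler, clearly hypothetical rule in the function `integrate`. The passive node admits the largest group of subscribers that can all reach one another, breaking ties by node name. Any node left out drops the passive node from its updated view. The sketch illustrates only the shape of the step: views go in, one resource view comes out, and access is restricted to that view. It does not reproduce the patent's actual condition.

```python
from itertools import combinations


def integrate(views: dict[str, set[str]], subscribers: set[str],
              passive: str) -> tuple[set[str], dict[str, set[str]]]:
    """Hypothetical stand-in for the integration step.

    Admits the largest set of subscribers that can all reach one another,
    so two isolated nodes are never both admitted. This is NOT the patent's
    asymmetric-safety condition.
    """

    def mutually_connected(group: tuple[str, ...]) -> bool:
        return all(b in views[a] and a in views[b]
                   for a, b in combinations(group, 2))

    admitted: set[str] = set()
    ordered = sorted(subscribers)
    for size in range(len(ordered), 0, -1):
        # combinations() of a sorted list yields groups in lexicographic
        # order, so the first connected group found is the tie-break.
        group = next((g for g in combinations(ordered, size)
                      if mutually_connected(g)), None)
        if group is not None:
            admitted = set(group)
            break

    updated: dict[str, set[str]] = {}
    for node, view in views.items():
        if node in admitted:
            updated[node] = view | {passive}
        else:
            # Fenced nodes lose the resource and would have to rejoin later.
            updated[node] = view - {passive}
    return admitted, updated
```

Take the two nodes of FIG. 1A after path 108 fails. Both subscribe but cannot see each other, so only `n102` is admitted and `n104` is fenced. This outcome replaces the thrashing between the two nodes.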
The invention also includes another embodiment of coordinating access to shared resources in a multiprocessing system with multiple nodes subject to communications and node failures. The present invention prescribes that when communication or node failures are suspected, coordination problems be resolved by having each node, including nodes representing shared resources, participate in a membership protocol that provides asymmetric safety. For simplicity, the present invention will be described in terms of methods that apply to a multiple-node system containing one shared resource node. It will be obvious to one skilled in the art how to extend these methods to apply to multiple shared resource nodes.
One exemplary approach chooses a leader node among the nodes contending for the shared resource node. Depending on the access needs, the leader node may then have exclusive access to the shared resource node or the leader node may control the access of others, for example by maintaining a lock table for the shared resource node.
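The leader-based variant can be sketched as follows. The section does not say how the leader is chosen. `choose_leader` therefore assumes a hypothetical rule: pick the smallest node name in the agreed membership view. `LockTable` is likewise an illustrative structure for the case where the leader controls others' access through a lock table. If the leader instead takes exclusive access, the lock table is unnecessary.

```python
from dataclasses import dataclass, field


def choose_leader(members: set[str]) -> str:
    # Hypothetical rule: every node holding the same membership view picks
    # the smallest node name, so all reach the same answer without messages.
    return min(members)


@dataclass
class LockTable:
    """Exclusive locks on parts of the shared resource, kept by the leader."""

    leader: str
    owners: dict[str, str] = field(default_factory=dict)  # block id -> holder

    def acquire(self, node: str, block: str) -> bool:
        # Grant the lock if the block is free (or already held by this node).
        holder = self.owners.setdefault(block, node)
        return holder == node

    def release(self, node: str, block: str) -> None:
        # Only the current holder may release a lock.
        if self.owners.get(block) == node:
            del self.owners[block]
```

Because the leader is derived from the agreed view, the nodes do not need a separate election round once the membership protocol has completed.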
In one embodiment, a method is provided to choose a new leader when it is suspected that the previous leader is no longer functioning properly or no longer able to access the shared resource node. In response to some indication that the previous leader may have failed (such as the timeout of a message requesting a response from the leader, or any such indication from any failure detection mechanism), a node may invoke a membership protocol that provides asymmetric safety. The participants in this membership protocol are all the nodes that can potentially access the shared resource, together with the shared resource node itself. On completion of the membership protocol, if a regular (non-shared-resource) node finds that the shared resource node is not in its new membership view, the regular node attempts to rejoin the shared resource node; otherwise, after ascertaining that the shared resource node has completed the membership
Palmer John Davis
Strong, Jr. Hovey Raymond
Upfal Eliezer
International Business Machines Corporation
McCabe, Esq. Mark C.
McGinn & Gibb PLLC
Salad Abdullahi E.