Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Patent
1997-11-17
2000-07-18
Beausoliel, Jr., Robert W.
709220, G06F 1130
active
060922206
ABSTRACT:
Ordered machine-readable messages are reliably delivered among processing members in a multiprocessing computer system. The system includes multiple processing nodes, each having a unique source-ID and a membership view including one or more of the processing nodes with which it can nominally exchange messages. When a stimulus message is received by a first processing node, the node increments a coordinated local counter (CC). The node also sends a multicast message to all processing nodes in the first node's membership group. The multicast message includes the received stimulus message, the incremented CC value, and the first node's source-ID. The node further sets a timer, exclusively associated with the incremented CC value. When a multicast message is received at a processing node, the node performs a multicast input processing routine. The node sets its CC equal to the greater of its current value or the received multicast message's CC value. The node also sends an acknowledgement message to all processing nodes in its membership group. Also, in response to the multicast message, the node sets a timer, exclusively associated with the received multicast message's CC value. Whenever a node's timer associated with a CC value expires before messages with the same CC value have been received from each of the node's membership group, the node invokes a membership protocol requiring asymmetric safety.
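The message handling described in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `Message`, `Node`, `make_cluster`, the `send` callback, the integer logical clock `now`, and the `timeout` value are all hypothetical, and `run_membership_protocol` is a placeholder that merely records which members were not heard from, since the abstract does not describe the membership protocol itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str            # "multicast" or "ack" (hypothetical labels)
    cc: int              # coordinated counter value carried by the message
    source_id: str       # unique source-ID of the sender
    payload: object = None


class Node:
    def __init__(self, source_id, view, send, timeout=5):
        self.source_id = source_id
        self.view = set(view)   # membership view, here including the node itself
        self.send = send        # assumed transport callback: send(dest, msg, now)
        self.timeout = timeout  # assumed timer length, in logical ticks
        self.cc = 0             # coordinated local counter (CC)
        self.timers = {}        # CC value -> deadline
        self.heard = {}         # CC value -> source-IDs heard from
        self.suspected = []     # (CC value, silent members) per expired timer

    def _broadcast(self, msg, now):
        for dest in sorted(self.view - {self.source_id}):
            self.send(dest, msg, now)

    def on_stimulus(self, payload, now):
        # Increment CC, set a timer tied to the new value, then multicast.
        self.cc += 1
        self.heard.setdefault(self.cc, set()).add(self.source_id)
        self.timers[self.cc] = now + self.timeout
        self._broadcast(Message("multicast", self.cc, self.source_id, payload), now)

    def on_message(self, msg, now):
        self.heard.setdefault(msg.cc, set()).add(msg.source_id)
        if msg.kind == "multicast":
            # CC becomes the greater of the local and the received value.
            self.cc = max(self.cc, msg.cc)
            self.heard[msg.cc].add(self.source_id)
            self.timers.setdefault(msg.cc, now + self.timeout)
            self._broadcast(Message("ack", msg.cc, self.source_id), now)

    def on_tick(self, now):
        for cc, deadline in list(self.timers.items()):
            if self.heard.get(cc, set()) >= self.view:
                del self.timers[cc]          # heard from every member in time
            elif now >= deadline:
                del self.timers[cc]
                self.run_membership_protocol(cc)

    def run_membership_protocol(self, cc):
        # Placeholder only: record who stayed silent for this CC value.
        self.suspected.append((cc, self.view - self.heard.get(cc, set())))


def make_cluster(ids, down):
    """Build fully connected nodes; messages to IDs in `down` are dropped."""
    nodes = {}

    def send(dest, msg, now):
        if dest not in down:
            nodes[dest].on_message(msg, now)

    for node_id in ids:
        nodes[node_id] = Node(node_id, ids, send)
    return nodes
```

In this sketch delivery is synchronous and time is advanced by calling `on_tick` explicitly, which keeps the example deterministic. A real system would use asynchronous transport and real timers, and would have to decide how to treat an acknowledgement that arrives before the multicast it acknowledges.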
REFERENCES:
patent: 4864559 (1989-09-01), Perlman
patent: 5079767 (1992-01-01), Perlman
patent: 5216675 (1993-06-01), Melliar-Smith et al.
patent: 5243596 (1993-09-01), Port et al.
patent: 5297143 (1994-03-01), Fridrich et al.
patent: 5317749 (1994-05-01), Dahlen
patent: 5339443 (1994-08-01), Lockwood
patent: 5355371 (1994-10-01), Auerbach et al.
patent: 5392433 (1995-02-01), Hammersley et al.
patent: 5414856 (1995-05-01), Yokota
patent: 5459725 (1995-10-01), Bodner et al.
patent: 5463733 (1995-10-01), Forman et al.
patent: 5467352 (1995-11-01), Cidon et al.
patent: 5502840 (1996-03-01), Barton
patent: 5513354 (1996-04-01), Dwork et al.
patent: 5519704 (1996-05-01), Farinacci et al.
patent: 5550973 (1996-08-01), Forman et al.
patent: 5612959 (1997-03-01), Takase et al.
patent: 5623670 (1997-04-01), Bohannon et al.
patent: 5634011 (1997-05-01), Auerbach et al.
patent: 5666486 (1997-09-01), Alfieri et al.
patent: 5682470 (1997-10-01), Dwork et al.
patent: 5856972 (1999-01-01), Riley et al.
patent: 5946316 (1999-08-01), Chen et al.
patent: 5996075 (1999-11-01), Matena
patent: 5999712 (1999-12-01), Moiin et al.
D. Malki et al., "Uniform Actions in Asynchronous Distributed Systems," Proceedings of the 13th Annual ACM Symposium on Principles of Distributed Computing, 1994, pp. 274-283.
K. Birman et al., "Reliable Distributed Computing with the Isis Toolkit," IEEE Computer Society Press, Los Alamitos, CA 1994.
MPI: A Message-Passing Interface Standard, published by the Univ. of Tennessee, 1994.
D. Dolev et al., "On the Minimal Synchronism Needed for Distributed Consensus", Journal of the ACM 34(1), 1987, pp. 77-97.
D. Dolev et al., "A Framework for Partitionable Membership Service", Technical Report TR 94-6, Department of Computer Science, Hebrew University.
F. Jahanian et al., "Processor Group Membership Protocols: Specification, Design and Implementation", in Proc. of 12th IEEE Symposium on Reliable Distributed Systems, 1993, pp. 2-11.
R. van Renesse et al., "Horus: A Flexible Group Communication System", Comm. of the ACM, vol. 39, No. 4, pp. 76-83, 1996.
M. Rosu et al., "Early-Stopping Terminating Reliable Broadcast Protocol for General-Omission Failures", Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, 1996, p. 209.
M. Aguilera et al., "Randomization and Failure Detection: A Hybrid Approach to Solve Consensus", Proceedings of 10th International Workshop on Distributed Algorithms, Italy 1996, pp. 29-39.
M. Herlihy et al., "Set Consensus Using Arbitrary Objects", 1994 ACM, pp. 324-333.
G. Bracha et al., "Asynchronous Consensus and Broadcast Protocols", Journal of the Association for Computing Machinery, vol. 32, No. 4, Oct. 1985, pp. 824-840.
T. Chandra et al., "The Weakest Failure Detector for Solving Consensus", Proc. 11th ACM Symposium on Principles of Distributed Computing, 1992, pp. 147-158.
D. Peleg, "Crumbling Walls: A Class of Practical and Efficient Quorum Systems", Proc. 14th ACM Symposium on Principles of Distributed Computing, 1995, pp. 120-128.
M. Fischer et al., "Impossibility of Distributed Consensus with One Faulty Process", Journal of the Association for Computing Machinery, vol. 32, No. 2, Apr. 1985, pp. 374-382.
C. Dwork et al., "Collective Consistency", Proceedings of 10th International Workshop on Distributed Algorithms, Italy 1996, pp. 234-250.
T. Chandra, "On the Impossibility of Group Membership", Proceedings of 15th Annual ACM Symposium on Principles of Distributed Computing, May 1996, pp. 322-340.
Palmer John Davis
Strong, Jr. Hovey Raymond
Upfal Eliezer
Baderman Scott T.
Beausoliel, Jr. Robert W.
International Business Machines Corporation