Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2005-07-12
Baderman, Scott (Department: 2114)
C714S004110, C370S217000
active
06918063
ABSTRACT:
A method and system for promoting fault tolerance in a multi-node computing system provide deadlock-free message routing in the presence of node and/or link faults using only two rounds and thus require only two virtual channels to ensure deadlock freedom. A lamb set of nodes is introduced for use in message routing; each node in the lamb set is used only as a point along message routes, not for sending or receiving messages.
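The two-round idea in the abstract can be illustrated with a small sketch. The abstract does not name a topology or a per-round routing discipline, so everything below is an assumption made for illustration only: a 2D mesh, dimension-order (XY) routing within each round, node faults only (link faults are not modeled), and hypothetical helper names `xy_path`, `two_round_route`, and `is_valid_lamb_set`. The sketch only checks whether a proposed lamb set works; it does not reproduce how the patent selects one.

```python
from itertools import product


def xy_path(src, dst):
    """Nodes visited by dimension-order routing: correct x first, then y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def two_round_route(src, dst, faulty, width, height):
    """Find a fault-free route made of two XY rounds, or return None.

    Round 1 (src -> mid) would travel on one virtual channel and
    round 2 (mid -> dst) on a second one. Any non-faulty node,
    including a lamb, may serve as the intermediate point.
    """
    for mid in product(range(width), range(height)):
        if mid in faulty:
            continue
        first, second = xy_path(src, mid), xy_path(mid, dst)
        if not (faulty & set(first)) and not (faulty & set(second)):
            return first, second
    return None


def is_valid_lamb_set(lambs, faulty, width, height):
    """True if every pair of non-faulty, non-lamb nodes has a two-round route."""
    good = set(product(range(width), range(height))) - faulty
    endpoints = good - lambs  # lambs never send or receive messages
    return all(
        two_round_route(s, d, faulty, width, height) is not None
        for s in endpoints
        for d in endpoints
        if s != d
    )


if __name__ == "__main__":
    faulty = {(1, 0), (0, 1)}  # these faults cut off the corner node (0, 0)
    print(two_round_route((2, 0), (0, 2), faulty, 3, 3))
    print(is_valid_lamb_set(set(), faulty, 3, 3))
    print(is_valid_lamb_set({(0, 0)}, faulty, 3, 3))
```

In this toy 3x3 mesh, the corner node (0, 0) cannot reach any other node, so the empty lamb set fails the check; declaring (0, 0) a lamb leaves a set of endpoints that can all reach one another in two rounds.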
REFERENCES:
patent: 5371744 (1994-12-01), Campbell et al.
patent: 5435003 (1995-07-01), Chng et al.
patent: 5513313 (1996-04-01), Bruck et al.
patent: 5581689 (1996-12-01), Slominski et al.
patent: 5765015 (1998-06-01), Wilkinson et al.
patent: 5887127 (1999-03-01), Saito et al.
patent: 5963546 (1999-10-01), Shoji
patent: 6038688 (2000-03-01), Yoon
patent: 6104871 (2000-08-01), Badovinatz et al.
patent: 6130875 (2000-10-01), Doshi et al.
patent: 6202079 (2001-03-01), Banks
patent: 6230252 (2001-05-01), Passint et al.
patent: 6600719 (2003-07-01), Chaudhuri
patent: 6680915 (2004-01-01), Park et al.
patent: 6711407 (2004-03-01), Cornils
patent: 6760777 (2004-07-01), Agarwal et al.
patent: 6856627 (2005-02-01), Saleh et al.
patent: 6857026 (2005-02-01), Cain
patent: 6862263 (2005-03-01), Simmons
patent: 2002/0133620 (2002-09-01), Krause
PUBLICATION: “Message Routing in an Injured Hypercube”. Chen et al. Hypercube Concurrent Computers and Applications. Proceedings of the third conference on Hypercube concurrent computers and applications. vol. 7, pp. 312-317. Jan. 19-20, 1988.
PUBLICATION: “Origin-Based Fault-Tolerant Routing in the Mesh”. Libeskind-Hadas et al. High-Performance Computer Architecture, 1995. Proceedings, first IEEE Symposium. pp. 102-111. Jan. 22-25, 1995.
PUBLICATION: “Folded Petersen Cube Networks: New Competitors for the Hypercubes”. Ohring et al. Parallel and Distributed Processing. Proceedings of the Fifth IEEE Symposium. pp. 582-589. Dec. 1-4, 1993.
PUBLICATION: “Computing in the RAIN: A Reliable Array of Independent Nodes”. Bohossian et al. Parallel and Distributed Systems, IEEE Transactions. vol. 12, pp. 99-114. Feb., 2001.
Ho Ching-Tien (Howard)
Stockmeyer Larry Joseph
Baderman Scott
Damiano Anne L.
International Business Machines Corporation
Rogitz John L.
System and method for fault tolerance in multi-node system