Reexamination Certificate
2005-10-28
2010-02-16
Beausoliel, Robert (Department: 2113)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
active
07664992
ABSTRACT:
A method of operating a supercomputer having a plurality of computing elements, each connected to a fast communications link, is disclosed, the method comprising the steps of: scheduling specified elements to perform computing tasks in specified cycles of a computing operation; in the event of failure of a fast communications link in a given cycle, transferring state from a disabled element, which can no longer communicate as a result of the failure, to an idle element not scheduled to perform a task in the given cycle; and operating the idle element to perform any uncompleted tasks scheduled for the disabled element that remain in the cycle.
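The steps in the abstract can be illustrated with a minimal Python sketch. It is a sequential simulation under stated assumptions, not the patented implementation: the `Element` class, the `run_cycle` function, the `failures` mapping (element name to the index of the task at which its link fails) and the use of a plain dictionary copy for the state transfer are all hypothetical. The abstract does not say by what path state leaves a disabled element, so the sketch does not model that path.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical computing element; `state` stands in for migratable state."""
    name: str
    link_up: bool = True
    state: dict = field(default_factory=dict)
    done: list = field(default_factory=list)


def run_cycle(elements, schedule, failures=None):
    """Simulate one cycle.

    elements: dict mapping name -> Element
    schedule: dict mapping name -> list of tasks for this cycle
    failures: dict mapping name -> index of the task at which that
              element's fast link fails (an assumption of this sketch)
    """
    failures = failures or {}
    # Idle elements are those with no task scheduled in this cycle.
    idle = [e for e in elements.values() if not schedule.get(e.name)]
    for name, tasks in schedule.items():
        worker = elements[name]
        for i, task in enumerate(tasks):
            if failures.get(name) == i:
                # The link failed before task i: mark the element disabled
                # and hand its state to an idle element.
                worker.link_up = False
                if not idle:
                    raise RuntimeError("no idle element available in this cycle")
                spare = idle.pop(0)
                spare.state = dict(worker.state)  # state transfer (simplified)
                worker = spare
            # Perform the task on whichever element currently owns the work.
            worker.state["last_task"] = task
            worker.done.append(task)
    return elements


# Example: A fails before its second task; idle element C finishes A's work.
elements = {n: Element(n) for n in "ABC"}
run_cycle(elements, {"A": ["t1", "t2", "t3"], "B": ["u1"]}, failures={"A": 1})
```

In this example the idle element C takes over only the tasks that A had not yet completed in the cycle, which mirrors the final step of the abstract. A real system would also have to detect the failure and run elements concurrently, neither of which is modelled here.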
REFERENCES:
patent: 4819232 (1989-04-01), Krings
patent: 5784616 (1998-07-01), Horvitz
patent: 6334196 (2001-12-01), Smorodinsky et al.
patent: 6374286 (2002-04-01), Gee et al.
patent: 6560717 (2003-05-01), Scott et al.
patent: 6598184 (2003-07-01), Merget et al.
patent: 6611729 (2003-08-01), Drum
patent: 6816813 (2004-11-01), Tan et al.
patent: 6934673 (2005-08-01), Alvarez et al.
patent: 7043728 (2006-05-01), Galpin
patent: 7058010 (2006-06-01), Chidambaran et al.
patent: 7100070 (2006-08-01), Iwamura et al.
patent: 7103809 (2006-09-01), Schlangen
patent: 7142505 (2006-11-01), Chaudhuri
patent: 7305675 (2007-12-01), Gulick
patent: 7437730 (2008-10-01), Goyal
patent: 2002/0078232 (2002-06-01), Simpson et al.
patent: 2003/0009603 (2003-01-01), Ruths et al.
patent: 2004/0123179 (2004-06-01), Dragomir-Daescu et al.
patent: 2004/0210898 (2004-10-01), Bergen et al.
patent: 2005/0268300 (2005-12-01), Lamb et al.
patent: 2006/0136772 (2006-06-01), Guimbellot et al.
patent: 0 987 630 (2000-03-01), None
patent: 2 392 520 (2004-03-01), None
patent: 07-282022 (1995-10-01), None
Peercy et al.; “Software Schemes of Reconfigurations and Recovery in Distributed Memory Multicomputers Using the Action Model”; 25th International Symposium on Fault-Tolerant Computing, 1995; pp. 479-488.
Dutt et al.; “Some Practical Issues in the Design of Fault-Tolerant Multiprocessors”; Proceedings of the International Symposium on Fault Tolerant Computing; Jun. 1991; pp. 292-299.
Alvarez, Guillermo et al., “Efficient Verification of Performability Guarantees”, Published in the Fifth International Workshop on Performability Modeling of Computer and Communication Systems (PMCCS-5), Sep. 15-16, 2001, Erlangen, Germany, pp. 1-7.
Feng et al., “The Bladed Beowulf: A Cost-Effective Alternative to Traditional Beowulfs”, Proceedings of the IEEE International Conference on Cluster Computing, 2002, pp. 1-10.
Oliner, A.J. et al., “Probabilistic QoS Guarantees for Supercomputing Systems”, Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 05) 2005, IEEE, pp. 1-10.
Oliner, A.J. et al., “Fault-Aware Job Scheduling for BlueGene/L Systems” 2004, IEEE, pp. 1-10.
Raghavendra et al., “Reliability in Distributed Systems”, IEEE Transactions on Computers, vol. 37, No. 3, Mar. 1988, pp. 352-358.
Ryan et al., “The Blue Mountain Supercomputer Technical Background” Lambda, Jun. 1, 2001, pp. 4-24 and Annex A and B.
Saradhi et al., “Dynamic Establishment of Differentiated Survivable Lightpaths in WDM mesh Networks”; Computer Communications, Elsevier Science Publishers BV, Amsterdam, NL, Feb. 15, 2004; ISSN 0140-3664, vol. 27, pp. 273-294.
Lumley John William
Taylor Richard
Tofts Christopher
Beausoliel Robert
Hewlett-Packard Development Company, L.P.
Riad Amine