Supercomputing

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

active

07380175

ABSTRACT:
A method of operating a supercomputer having N computing elements, each connected to a fast communications link, is disclosed, the method comprising the steps of: operating the supercomputer to perform a computing operation; upon failure of a fast communications link, transferring state from a computing element which, as a result of the fast communications link failure, is no longer able to communicate, to a spare computing element not previously engaged in the computing operation; and continuing the computing operation with the spare computing element, wherein the number of redundant elements M is chosen to satisfy the expression B_M[N, (1 − P^T)] > S, where S is a desired probability of successful completion of the computing operation within a time T and P is the probability of successful operation per unit time of a fast communications link.
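The sizing condition in the abstract can be illustrated with a short numerical sketch. The abstract does not define B, so the sketch assumes that B_M[N, q] is the cumulative binomial probability of at most M failures among N independent links, each failing with probability q = 1 − P^T over the run. The function name min_spares and its parameter names are hypothetical and chosen only for illustration.

```python
from math import comb


def min_spares(n, p, t, s):
    """Return the smallest M with B_M[N, 1 - P^T] > S, or None if no M <= N works.

    Assumption (not stated in the abstract): B_M[N, q] is the cumulative
    binomial probability of at most M failures among N independent links.
    n: number of computing elements (N)
    p: probability that one link operates successfully per unit time (P)
    t: duration of the computing operation in the same time units (T)
    s: desired probability of successful completion (S)
    """
    q = 1.0 - p ** t  # probability that a given link fails at some point within T
    total = 0.0
    for m in range(n + 1):
        # Add the probability of exactly m link failures to the running sum.
        total += comb(n, m) * q ** m * (1.0 - q) ** (n - m)
        if total > s:
            return m
    return None


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the patent.
    print(min_spares(100, 0.999, 10, 0.99))
```

Under this reading, 100 elements with a per-unit-time link reliability of 0.999, a run of 10 time units, and a target of 0.99 would call for 4 spares. The direct use of comb keeps the sketch short, but it can overflow floating point for very large N, so a log-domain or library routine would be the safer choice at scale.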

REFERENCES:
patent: 5471623 (1995-11-01), Napolitano, Jr.
patent: 5592610 (1997-01-01), Chittor
patent: 2003/0046212 (2003-03-01), Hunter et al.
patent: 2006/0184939 (2006-08-01), Sahoo et al.
Manimaran et al., “A Fault-Tolerant Dynamic Scheduling Algorithm for Multiprocessor Real-Time Systems and Its Analysis,” IEEE Transactions on Parallel and Distributed Systems, Nov. 1998, vol. 9, No. 11, pp. 1137-1152.
Agrawal, “Fault Tolerance in Multiprocessor Systems without Dedicated Redundancy,” IEEE Transactions on Computers, Mar. 1988, vol. 37, No. 3, pp. 358-362.
Raghavendra et al., “Reliability Analysis in Distributed Systems,” IEEE Transactions on Computers, Mar. 1988, vol. 37, No. 3, pp. 352-358.
Cherkassky et al., “Redundant Task-Allocation in Multicomputer Systems,” IEEE Transactions on Reliability, Sep. 1992, vol. 41, No. 3, pp. 336-342.
Irani et al., “A Methodology for the Design of Communication Networks and the Distribution of Data in Distributed Supercomputer Systems,” IEEE Transactions on Computers, May 1982, vol. C-31, No. 5, pp. 419-434.
Varvarigou et al., “Module Replication for Fault-Tolerant Real-Time Distributed Systems,” IEEE Transactions on Reliability, vol. 47, No. 1, Mar. 1998, pp. 8-18.
Ghosh et al., “Fault-Tolerance Through Scheduling of Aperiodic Tasks in Hard Real-Time Multiprocessor Systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 8, No. 3, Mar. 1997, pp. 272-284.
Koren et al., “On the Bandwidth of a Multi-Stage Network in the Presence of Faulty Components,” 8th International Conference on Distributed Computing Systems, Jun. 13-17, 1988, pp. 26-32.
Wu et al., “Optimal Fault-Secure Scheduling,” The Computer Journal, vol. 41, No. 4, 1998, pp. 207-222.
Dutt et al., “Some Practical Issues in the Design of Fault-Tolerant Multiprocessors,” IEEE, 1991, pp. 292-299.
Peercy et al., “Software Schemes of Reconfiguration and Recovery in Distributed Memory Multicomputers Using the Actor Model,” IEEE, 1995, pp. 479-488.
