Fault recovery on a massively parallel computer system to...

Electrical computers and digital processing systems: processing – Processing control

Reexamination Certificate

Details

active

07631169

ABSTRACT:
A method and apparatus for fault recovery on a parallel computer system from a soft failure without ending an executing job on a partition of nodes. In preferred embodiments, a failed hardware recovery mechanism on a service node uses a heartbeat monitor to determine when a node failure occurs. Where possible, the failed node is reset and reloaded with software without ending the software job being executed by the partition containing the failed node.
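
The heartbeat-based recovery described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`Node`, `ServiceNode`), the `timeout` threshold, and the use of a counter to stand in for the hardware reset and software reload are all assumptions made for illustration. The sketch also assumes recovery always succeeds, whereas the abstract only promises it "where possible".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    last_heartbeat: float = 0.0   # time of the most recent heartbeat seen
    resets: int = 0               # how many times this node was reset and reloaded


@dataclass
class ServiceNode:
    """Hypothetical service-node monitor for one partition of compute nodes."""
    timeout: float                              # assumed silence threshold, in seconds
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    job_running: bool = True                    # this sketch never ends the job

    def add_node(self, node_id, now):
        self.nodes[node_id] = Node(node_id, last_heartbeat=now)

    def record_heartbeat(self, node_id, now):
        self.nodes[node_id].last_heartbeat = now

    def failed_nodes(self, now):
        """Return ids of nodes whose last heartbeat is older than the timeout."""
        return sorted(n.node_id for n in self.nodes.values()
                      if now - n.last_heartbeat > self.timeout)

    def recover(self, now):
        """Reset and reload each silent node; leave the job running."""
        recovered = self.failed_nodes(now)
        for node_id in recovered:
            node = self.nodes[node_id]
            node.resets += 1            # stands in for hardware reset + software reload
            node.last_heartbeat = now   # treat the reloaded node as alive again
        return recovered


# Example: three nodes, node 1 stops sending heartbeats.
svc = ServiceNode(timeout=5.0)
for i in range(3):
    svc.add_node(i, now=0.0)
svc.record_heartbeat(0, now=4.0)
svc.record_heartbeat(2, now=4.0)
recovered = svc.recover(now=6.0)
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the example deterministic. The point of the sketch is that only the silent node is touched while `job_running` stays true for the partition as a whole.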

REFERENCES:
patent: 5513319 (1996-04-01), Finch et al.
patent: 6134655 (2000-10-01), Davis
patent: 6530047 (2003-03-01), Edwards et al.
patent: 6629257 (2003-09-01), Hartwell
patent: 6711700 (2004-03-01), Armstrong et al.
patent: 7191372 (2007-03-01), Jacobson et al.
patent: 7421478 (2008-09-01), Muchow
patent: 2002/0065646 (2002-05-01), Waldie et al.
patent: 2004/0034816 (2004-02-01), Richard
patent: 2004/0088523 (2004-05-01), Kessler et al.
patent: 2004/0103218 (2004-05-01), Blumrich et al.
patent: 2006/0143446 (2006-06-01), Frank et al.
patent: 2006/0221841 (2006-10-01), Lee et al.
patent: 2006/0236150 (2006-10-01), Lintz et al.
patent: 1391822 (2004-02-01), None
Archer et al, U.S. Appl. No. 11/539,248, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Dynamic Global Mapping of Contended Links”.
Archer et al, U.S. Appl. No. 11/539,270, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Semi-Randomly Varying Routing Policies for Different Packets”.
Archer et al, U.S. Appl. No. 11/539,300, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Routing Through Transporter Nodes”.
Archer et al, U.S. Appl. No. 11/539,329, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Dynamically Adjusting Local Routing Strategies”.
Gara, A. et al, “Overview of the Blue Gene/L System Architecture”, IBM Journal of Research and Development, International Business Machines Corporation, New York, NY, USA, vol. 49, No. 2-3, Mar. 1, 2005, pp. 195-212, X002469210.
Haring, R. A. et al, “Blue Gene/L Computer Chip: Control, test, and bring-up infrastructure”, IBM Journal of Research and Development, International Business Machines Corporation, New York, NY, USA, vol. 49, No. 2-3, May 1, 2005, pp. 289-301, XP002492140.
