Electrical computers and digital processing systems: processing – Processing control
Reexamination Certificate
2007-02-02
2009-12-08
Coleman, Eric (Department: 2183)
active
07631169
ABSTRACT:
A method and apparatus for fault recovery on a parallel computer system from a soft failure without ending an executing job on a partition of nodes. In preferred embodiments, a failed hardware recovery mechanism on a service node uses a heartbeat monitor to determine when a node failure occurs. Where possible, the failed node is reset and reloaded with software without ending the software job being executed by the partition containing the failed node.
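The heartbeat-based detection and reset described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class and method names (HeartbeatMonitor, record_heartbeat, recover), the timeout policy, the per-node reset limit, and the reset_and_reload callback are all assumptions introduced here for illustration. The abstract states only that a service node uses a heartbeat monitor to detect a node failure and, where possible, resets and reloads the failed node without ending the job.

```python
import time
from dataclasses import dataclass


@dataclass
class NodeState:
    last_heartbeat: float  # clock value when the last heartbeat was seen
    resets: int = 0        # number of resets already applied to this node


class HeartbeatMonitor:
    """Hypothetical service-node monitor; names and policy are assumptions."""

    def __init__(self, node_ids, timeout_s, max_resets=1, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.max_resets = max_resets
        self.clock = clock
        now = clock()
        self.nodes = {n: NodeState(last_heartbeat=now) for n in node_ids}

    def record_heartbeat(self, node_id):
        # Called whenever a compute node reports that it is alive.
        self.nodes[node_id].last_heartbeat = self.clock()

    def failed_nodes(self):
        # A node counts as failed once it has been silent longer than the timeout.
        now = self.clock()
        return [n for n, s in self.nodes.items()
                if now - s.last_heartbeat > self.timeout_s]

    def recover(self, reset_and_reload):
        # Try to reset and reload each silent node in place, leaving the job
        # running. Returns the ids of nodes that could not be recovered.
        unrecovered = []
        for n in self.failed_nodes():
            state = self.nodes[n]
            if state.resets < self.max_resets and reset_and_reload(n):
                state.resets += 1
                state.last_heartbeat = self.clock()
            else:
                unrecovered.append(n)
        return unrecovered
```

In this sketch, a non-empty return value from recover would be the point at which a caller falls back to ending the job; that fallback is likewise an assumption, suggested only by the abstract's phrase "where possible".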
REFERENCES:
patent: 5513319 (1996-04-01), Finch et al.
patent: 6134655 (2000-10-01), Davis
patent: 6530047 (2003-03-01), Edwards et al.
patent: 6629257 (2003-09-01), Hartwell
patent: 6711700 (2004-03-01), Armstrong et al.
patent: 7191372 (2007-03-01), Jacobson et al.
patent: 7421478 (2008-09-01), Muchow
patent: 2002/0065646 (2002-05-01), Waldie et al.
patent: 2004/0034816 (2004-02-01), Richard
patent: 2004/0088523 (2004-05-01), Kessler et al.
patent: 2004/0103218 (2004-05-01), Blumrich et al.
patent: 2006/0143446 (2006-06-01), Frank et al.
patent: 2006/0221841 (2006-10-01), Lee et al.
patent: 2006/0236150 (2006-10-01), Lintz et al.
patent: 1391822 (2004-02-01), None
Archer et al, U.S. Appl. No. 11/539,248, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Dynamic Global Mapping of Contended Links”.
Archer et al, U.S. Appl. No. 11/539,270, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Semi-Randomly Varying Routing Policies for Different Packets”.
Archer et al, U.S. Appl. No. 11/539,300, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Routing Through Transporter Nodes”.
Archer et al, U.S. Appl. No. 11/539,329, filed Oct. 6, 2006, “Method and Apparatus for Routing Data in an Inter-Nodal Communications Lattice of a Massively Parallel Computer System by Dynamically Adjusting Local Routing Strategies”.
Gara, A. et al, “Overview of the Blue Gene/L System Architecture”, IBM Journal of Research and Development, International Business Machines Corporation, New York, NY, USA, vol. 49, No. 2-3, Mar. 1, 2005, pp. 195-212, X002469210.
Haring, R. A. et al, “Blue Gene/L Computer Chip: Control, test, and bring-up infrastructure”, IBM Journal of Research and Development, International Business Machines Corporation, New York, NY, USA, vol. 49, No. 2-3, May 1, 2005, pp. 289-301, XP002492140.
Darrington David L.
McCarthy Patrick Joseph
Peters Amanda
Sidelnik Albert
Coleman Eric
International Business Machines Corporation
Martin Derek P.
Martin & Associates LLC