Electrical computers and digital processing systems: support – Digital data processing system initialization or configuration – Loading initialization program
Reexamination Certificate
1999-03-26
2003-07-29
Gaffin, Jeffrey (Department: 2182)
C713S001000, C709S222000, C710S010000, C710S104000
active
06601165
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to an apparatus and method for fault resilient booting in a multi-processor computer system.
BACKGROUND OF THE INVENTION
Multi-processor computer systems may experience problems when booting if one or more of the processors fails during a reset. A processor fails when it does not successfully execute the reset instruction; a failed processor may not respond to further instructions or may provide erroneous output. Booting involves starting the computer system, for example, by turning on its power. In response to the application of power, the processors in the system execute preliminary instructions at a pre-designated address in an attempt to initialize themselves and enter an operational mode in which they may execute programs or applications. If any of these processors fails during booting, the entire system may deadlock and be unable to operate. Booting may also involve a warm reset, which is a software or hardware reset of a processor that is already running or to which power is already applied.
One of the processors in a multi-processor system is typically pre-designated as a boot strap processor. The boot strap processor functions to initialize the other processors during the booting process. If the boot strap processor fails during booting, the entire system may again deadlock and be unable to operate.
Accordingly, a need exists for an improved apparatus and method for fault resilient booting of a multi-processor system.
SUMMARY OF THE INVENTION
A first method consistent with the present invention may be used to boot a computer system having a plurality of processors. The method includes performing a cold reset of the processors and determining if any of the processors failed during the cold reset. The method also includes performing a warm reset of the processors and isolating any of the processors that failed in conjunction with performing the warm reset.
A first apparatus consistent with the present invention boots a computer system having a plurality of processors. The apparatus performs a cold reset of the processors and determines if any of the processors failed during the cold reset. The apparatus also performs a warm reset of the processors and isolates any of the processors that failed in conjunction with performing the warm reset.
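The two-phase sequence described above (cold reset, failure detection, then a warm reset that isolates the failed processors) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the `Processor` class, its `healthy` and `isolated` flags, and the `fault_resilient_boot` function are hypothetical names introduced here, and a real system would detect failures through hardware status rather than a boolean field.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Processor:
    """Hypothetical stand-in for a physical processor."""
    cpu_id: int
    healthy: bool = True    # assumption: models whether the reset instruction succeeds
    isolated: bool = False  # set once the processor is fenced off from the system

    def reset(self) -> bool:
        """Simulate a reset; return True if the processor comes up."""
        return self.healthy and not self.isolated


def fault_resilient_boot(processors: List[Processor]) -> Tuple[List[Processor], List[Processor]]:
    # Phase 1: cold reset every processor and record which ones failed.
    failed = [p for p in processors if not p.reset()]

    # Phase 2: warm reset, isolating the processors that failed the cold reset
    # so that they cannot deadlock the rest of the system.
    for p in failed:
        p.isolated = True
    working = [p for p in processors if not p.isolated and p.reset()]
    return working, failed


if __name__ == "__main__":
    cpus = [Processor(0), Processor(1, healthy=False), Processor(2)]
    working, failed = fault_resilient_boot(cpus)
    print("working:", [p.cpu_id for p in working])
    print("isolated:", [p.cpu_id for p in failed])
```

In this sketch the system continues with the surviving processors instead of halting when a single processor fails, which is the behavior the summary attributes to the first method and apparatus.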
A second method consistent with the present invention includes performing a cold reset of a plurality of processors within each node of a multi-processor system. The cold reset involves attempting to identify one of the processors in each node as a node boot strap processor. The method further includes attempting to identify one of the node boot strap processors as a system boot strap processor and using the system boot strap processor to perform a warm reset of the plurality of processors in each of the nodes. In conjunction with performing the warm reset, any of the processors that failed are isolated.
A second apparatus consistent with the present invention performs a cold reset of a plurality of processors within each node of a multi-processor system. In conjunction with performing the cold reset, the apparatus attempts to identify one of the processors in each node as a node boot strap processor. The apparatus also attempts to identify one of the node boot strap processors as a system boot strap processor and uses the system boot strap processor to perform a warm reset of the plurality of processors in each of the nodes. In conjunction with performing the warm reset, the apparatus isolates any of the processors that failed.
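The hierarchical selection in the second method and apparatus can be sketched the same way. The text above does not say how a boot strap processor is chosen among the survivors, so the "lowest identifier wins" rule below is purely an assumption, as are the function name `boot_multi_node` and the input format (a mapping from node id to per-processor cold-reset results).

```python
from typing import Dict, List, Optional, Set, Tuple


def boot_multi_node(
    cold_reset_ok: Dict[int, Dict[int, bool]],
) -> Tuple[Optional[Tuple[int, int]], Dict[int, int], Dict[int, List[int]], Set[Tuple[int, int]]]:
    """cold_reset_ok maps node id -> {cpu id -> True if the cpu survived cold reset}."""
    node_bsps: Dict[int, int] = {}
    failed: Set[Tuple[int, int]] = set()

    # Cold reset phase: in each node, try to identify a node boot strap processor.
    for node_id, cpus in cold_reset_ok.items():
        survivors = sorted(cpu for cpu, ok in cpus.items() if ok)
        failed.update((node_id, cpu) for cpu, ok in cpus.items() if not ok)
        if survivors:
            # Assumption: the lowest-numbered surviving cpu becomes the node BSP.
            node_bsps[node_id] = survivors[0]

    # Try to identify one node BSP as the system boot strap processor.
    system_bsp: Optional[Tuple[int, int]] = None
    if node_bsps:
        first_node = min(node_bsps)  # assumption: lowest-numbered node wins
        system_bsp = (first_node, node_bsps[first_node])

    # Warm reset phase, driven by the system BSP: failed processors are
    # isolated, so only the survivors remain active in each node.
    active = {
        node_id: sorted(cpu for cpu in cpus if (node_id, cpu) not in failed)
        for node_id, cpus in cold_reset_ok.items()
    }
    return system_bsp, node_bsps, active, failed


if __name__ == "__main__":
    results = {0: {0: False, 1: True}, 1: {0: True, 1: True}, 2: {0: False, 1: False}}
    system_bsp, node_bsps, active, failed = boot_multi_node(results)
    print("system BSP:", system_bsp)
    print("node BSPs:", node_bsps)
    print("active after warm reset:", active)
```

A node whose processors all fail simply has no node boot strap processor in this sketch, while the remaining nodes still boot; if no node has a survivor, no system boot strap processor is identified.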
REFERENCES:
patent: 4819232 (1989-04-01), Krings
patent: 5327548 (1994-07-01), Hardell, Jr. et al.
patent: 5450576 (1995-09-01), Kennedy
patent: 5491788 (1996-02-01), Cepulis et al.
patent: 5615330 (1997-03-01), Taylor
patent: 5694600 (1997-12-01), Khenson et al.
patent: 5715456 (1998-02-01), Bennett et al.
patent: 5724527 (1998-03-01), Karnik et al.
patent: 5724599 (1998-03-01), Balmer et al.
patent: 5754887 (1998-05-01), Damron et al.
patent: 5790850 (1998-08-01), Natu
patent: 5819087 (1998-10-01), Le et al.
patent: 5904733 (1999-05-01), Jayakumar
patent: 6073251 (2000-06-01), Jewett et al.
patent: 6108781 (2000-08-01), Jayakumar
patent: 6134071 (2000-10-01), Andoh et al.
patent: 6191499 (2001-02-01), Severson et al.
Allison Michael S.
Embry Leo J.
Feehrer John R.
Morrison John A.
Silva Stephen J.
Gaffin Jeffrey
Hewlett-Packard Company
Mai Rijue