Reexamination Certificate
1999-11-04
2003-04-15
Beausoliel, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C713S001000
active
06550019
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a method and system for data processing system reliability, and more specifically, for location of faulty components.
2. Description of Related Art
As computers have become more sophisticated, diagnostic and repair processes have become more complicated and require more time to complete. Diagnostic procedures generally specify several possible solutions to an error or problem in order to guide a service engineer to a determination and subsequent resolution of the problem. The service engineer may perform several corrective steps for each diagnostic procedure while attempting to resolve the problem. The service engineer may “chase” errors through lengthy diagnostic procedures in an attempt to locate one or more components that may be causing errors within the computer.
For example, a diagnostic procedure may indicate an installed component or field replaceable unit (FRU) that is a likely candidate for the error, and the installed FRU may be replaced with a new FRU. The reported problem may be considered resolved at that point. If, after further testing of the previously installed FRU, the FRU is later determined to be reliable, the original problem has not actually been resolved and may remain unresolved until the next error is reported.
Diagnosing errors during initial program load (IPL) is especially difficult because the operating system, which may contain sophisticated error logging functions, has not yet been loaded at that stage of system initialization, and the IPL code is purposefully devoid of most diagnostic functions in order to keep the IPL code efficient. If the system suffers from a freeze or hang condition in which the system simply stops responding during IPL, the only way to diagnose the error may be to direct the service engineer to replace one FRU at a time and then reboot the system to see whether it successfully completes the IPL.
The potential for misdiagnosis is compounded if the system has multiple, identical FRUs and the diagnostic procedure indicates that any one of the multiple FRUs could be a likely candidate for the error. For example, in a multiprocessor system, any one of the processor FRUs with associated IPL code may cause an error. In this situation, the service engineer may attempt, through trial and error, to resolve a problem by replacing each FRU in turn and then retesting the system. In the worst case, the time required for diagnosing the problem is multiplied by the number of identical FRUs. Isolating defective FRUs through trial and error is time consuming and costly. In addition to paying for unnecessary components, a business must also pay for the recurring labor costs of the service engineer and lost productivity of the user of the error-prone system.
Therefore, it would be advantageous to provide a method and apparatus for efficiently diagnosing problems during IPL within multiprocessor data processing systems.
SUMMARY OF THE INVENTION
A method and apparatus for detecting an error condition during initialization of a multiprocessor data processing system are provided. A master processor identification indicator is initialized to an initial value by a service processor in the data processing system. The master processor identification indicator may be a location in nonvolatile RAM to protect data integrity. One of the plurality of processors in the multiprocessor system is selected to be the master processor by being released by the service processor and winning the “race condition” to fetch the first instruction from memory for program execution. This processor then sets the master processor identification indicator to a unique processor identification value. The initial value may be a spoof number indicating whether the master processor has yet written its unique processor identification value. At some later point in time, the service processor detects a freeze or hang condition in the data processing system. The service processor reads the value of the master processor identification indicator and reports that value to indicate which processor among the plurality of processors in the data processing system was selected as the master processor prior to the detection of the hang condition.
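The sequence summarized above can be illustrated with a minimal Python sketch. It is a software model only, not the patented implementation: the names `MasterIdIndicator`, `SPOOF_VALUE`, `run_ipl_race`, and `report_on_hang` are hypothetical, threads stand in for processors, and a lock-protected variable stands in for the nonvolatile RAM location. The sketch also folds "winning the race" and "writing the identification value" into a single step, which is a simplifying assumption.

```python
import threading

# Hypothetical sentinel meaning "no master has written its ID yet".
# It must differ from every real processor ID.
SPOOF_VALUE = 0xFF


class MasterIdIndicator:
    """Stands in for the NVRAM location that holds the master processor ID."""

    def __init__(self):
        self._value = None
        self._lock = threading.Lock()

    def initialize(self, value=SPOOF_VALUE):
        # Service processor step: set the indicator to the initial value.
        with self._lock:
            self._value = value

    def claim(self, processor_id):
        # Processor step: only the first caller records its ID as master.
        with self._lock:
            if self._value == SPOOF_VALUE:
                self._value = processor_id
                return True
            return False

    def read(self):
        with self._lock:
            return self._value


def run_ipl_race(indicator, processor_ids):
    """Release all processors at once and return the ID that became master."""
    winners = []

    def boot(pid):
        if indicator.claim(pid):
            winners.append(pid)

    threads = [threading.Thread(target=boot, args=(pid,)) for pid in processor_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners[0]


def report_on_hang(indicator):
    """Service processor step after a hang: read and report the indicator."""
    value = indicator.read()
    if value == SPOOF_VALUE:
        return "hang detected before any processor registered as master"
    return f"hang detected; master processor was {value}"
```

In this model, a report that still shows the spoof value suggests the hang occurred before any processor registered as master, while any other value names the processor that was acting as master when the hang was detected.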
REFERENCES:
patent: 5349664 (1994-09-01), Ikeda et al.
patent: 5418955 (1995-05-01), Ikeda et al.
patent: 5469575 (1995-11-01), Madduri
patent: 5815651 (1998-09-01), Litt
patent: 5867702 (1999-02-01), Lee
patent: 5892895 (1999-04-01), Basavaiah et al.
patent: 5919266 (1999-07-01), Sud et al.
patent: 6000040 (1999-12-01), Culley et al.
patent: 6178445 (2001-01-01), Dawkins et al.
patent: 6216226 (2001-04-01), Agha et al.
Ahrens George Henry
Dawkins George John
Lim Michael Youhour
Toohey Timothy Lee
McBurney Mark E.
Yee Duke W.
Yociss Lisa B.