Reexamination Certificate
1997-03-31
2002-12-31
Wong, Peter (Department: 2181)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S036000, C714S038110, C714S039000
active
06502208
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to error handling in computer systems, and more particularly to check stop error handling in such systems.
BACKGROUND OF THE INVENTION
When a hardware fault is detected in a digital computer system, sometimes the fault is so severe or the risk of data corruption so great that detection of the error is designed to cause an immediate halt of further operations. Except for performing a complete system reset, there is no means of recovering from this state, which is typically called a Check Stop state. Because of the severity of the error, it is important to be able to determine the source of the error so that the failing component can be replaced quickly and the system restored to normal operation.
However, since the main processor is stopped in this condition, a separate processing mechanism is needed to capture failure information. This mechanism is usually referred to as a Service Processor, which provides embedded controller operations that remain available even when check stop failures occur. Unfortunately, sophisticated processing mechanisms are needed to extract failure information from the failing components when all the normal functional paths are frozen, and to perform analysis on that information. Including such sophisticated processing mechanisms, however, increases the system's cost.
Further, typical systems contain very large amounts of error data in the form of latch bits. An engineering change to add even a single new latch bit changes the layout of an entire scan string of data and increases the amount of data needing to be extracted. Providing sufficient storage space to hold the increased data further adds to overall system costs.
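To illustrate why a single added latch bit is disruptive, the following minimal sketch models a scan string as an ordered list of latch fields and computes the bit offset of each field. The field names and widths are hypothetical and chosen only for illustration; they do not describe any actual hardware.

```python
def field_offsets(scan_string):
    """Return ({name: (offset, width)}, total_bits) for an ordered list of (name, width)."""
    offsets = {}
    position = 0
    for name, width in scan_string:
        offsets[name] = (position, width)
        position += width
    return offsets, position


# Hypothetical scan string layout (names and widths are assumptions).
original = [
    ("cpu_parity_err", 1),
    ("bus_timeout", 1),
    ("mem_ecc_syndrome", 8),
    ("io_bridge_err", 1),
]

# Engineering change: one new latch bit is inserted after the first field.
revised = original[:1] + [("cache_tag_err", 1)] + original[1:]

old_offsets, old_len = field_offsets(original)
new_offsets, new_len = field_offsets(revised)

# Every pre-existing field located after the insertion point has moved.
moved = sorted(n for n in old_offsets if old_offsets[n] != new_offsets[n])
print(moved)             # fields whose offsets changed
print(old_len, new_len)  # total scan string length before and after
```

In this toy layout, every field downstream of the new bit shifts by one position and the string grows by one bit, so any tool that decodes the string by fixed offsets would need to be updated.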
Accordingly, what is needed is a capable system for check stop error analysis and handling that functions on low-end computer systems, utilizes a basic, low-cost service processor, and requires relatively small storage space.
SUMMARY OF THE INVENTION
These needs are met through the present invention, which provides method and system aspects for check stop error handling. A method aspect for check stop error handling in a computer system, the computer system comprising a plurality of components including a processor that supports an operating system and firmware, includes utilizing a service processor following a check stop error for error data retrieval and attempting a reboot of the computer system. The method further includes initiating firmware for failure reporting based on the error data retrieval when the reboot is successful. In another method aspect, the method includes performing error data retrieval from fault isolation registers of the plurality of components using a service processor following a check stop error, and transforming the error data into an abstracted error log via the firmware after a successful reboot.
In a system aspect, a computer system with check stop error handling includes a processing mechanism, the processing mechanism supporting an operating system, and a service processor coupled to the processing mechanism, the service processor performing error data retrieval following a check stop error. The system further includes a firmware mechanism supported by the processing mechanism, the firmware mechanism performing failure reporting based on the error data retrieval.
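The division of labor summarized above can be sketched as a small simulation: the service processor only copies raw fault isolation register values and attempts a reboot, while the firmware, running on the rebooted main processor, turns the saved data into an error log. All class names, function names, register values, and the log format below are hypothetical assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    # Hypothetical convention: a nonzero value means a fault bit is latched.
    fault_isolation_register: int = 0


@dataclass
class ServiceProcessor:
    saved_error_data: dict = field(default_factory=dict)

    def handle_check_stop(self, components, reboot):
        # Step 1: retrieve raw register contents; no analysis is done here.
        self.saved_error_data = {
            c.name: c.fault_isolation_register for c in components
        }
        # Step 2: attempt a reboot; `reboot` is a callable returning True on success.
        return reboot()


def firmware_build_error_log(saved_error_data):
    """After a successful reboot, turn raw register values into log entries."""
    return [
        {"component": name, "fir": f"0x{value:08X}"}
        for name, value in saved_error_data.items()
        if value != 0
    ]


def run(components, reboot):
    sp = ServiceProcessor()
    if sp.handle_check_stop(components, reboot):
        return firmware_build_error_log(sp.saved_error_data)
    return None  # reboot failed: firmware reporting cannot run in this sketch


components = [
    Component("processor"),
    Component("memory_controller", 0x00000010),
]
log = run(components, reboot=lambda: True)
print(log)
```

Keeping the analysis step in firmware is what lets the service processor in this sketch stay simple: it needs only enough logic and storage to read a handful of registers and trigger a restart.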
Kitamorn Alongkorn
McLaughlin Charles Andrew
International Business Machines Corporation
Sawyer Law Group LLP
Vo Tim
Wong Peter
Method and system for check stop error handling