Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-07-13
2003-12-02
Beausoliel, Robert (Department: 2184)
C714S006130, C714S057000, C712S016000
active
06658594
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to the field of computer architecture and, more specifically, to methods and systems for displaying and logging system checkpoints.
2. Description of Related Art
A logical partitioning option (LPAR) within a data processing system (platform) allows multiple copies of a single operating system (OS) or multiple heterogeneous operating systems to be simultaneously run on a single data processing system platform. A partition, within which an operating system image runs, is assigned a non-overlapping subset of the platform's resources. These platform allocable resources include one or more architecturally distinct processors with their interrupt management area, regions of system memory, and input/output (I/O) adapter bus slots. The partition's resources are represented to the OS image by the partition's own Open Firmware device tree.
Each distinct OS, or image of an OS, running within the platform is protected from the others so that software errors in one logical partition cannot affect the correct operation of any other partition. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each OS image and by providing mechanisms that ensure no image can control resources that have not been allocated to it. Furthermore, software errors in the control of an OS's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the OS (or each different OS) directly controls a distinct set of allocable resources within the platform.
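The disjoint allocation described above can be illustrated with a minimal sketch. It is not taken from the patent: the `Platform` class, its methods, and the resource names (`cpu0`, `mem0`, `io_slot0`, and so on) are hypothetical, and a real platform would enforce ownership in hardware and firmware rather than with a lookup table.

```python
class Platform:
    """Toy model of a platform handing out resources to logical partitions.

    All names here are illustrative assumptions, not part of the patent.
    """

    def __init__(self, resources):
        self.free = set(resources)   # resources not yet assigned to any partition
        self.owner = {}              # resource name -> owning partition name

    def allocate(self, partition, resources):
        """Assign resources to a partition, refusing any overlap."""
        requested = set(resources)
        if not requested <= self.free:
            taken = sorted(requested - self.free)
            raise ValueError(f"already allocated or unknown: {taken}")
        self.free -= requested
        for resource in requested:
            self.owner[resource] = partition

    def check_access(self, partition, resource):
        """Return True only if the partition owns the resource."""
        return self.owner.get(resource) == partition


platform = Platform(["cpu0", "cpu1", "mem0", "mem1", "io_slot0", "io_slot1"])
platform.allocate("lpar_a", ["cpu0", "mem0", "io_slot0"])
platform.allocate("lpar_b", ["cpu1", "mem1", "io_slot1"])
```

In this sketch, a second request for `cpu0` raises `ValueError`, and `check_access("lpar_a", "cpu1")` returns `False`, mirroring the requirement that an image cannot control resources that were not allocated to it.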
Many data processing systems utilize a method of recovering from a system failure referred to as checkpoint/restart. A checkpoint is a copy of the computer's memory that is periodically saved on disk along with the current register settings (last instruction executed, etc.). In the event of any failure, the last checkpoint serves as a recovery point. When the problem has been fixed, the restart program copies the last checkpoint into memory, resets all the hardware registers, and starts the computer from that point. Any transactions held in memory between the last checkpoint and the failure are lost. Typically, the checkpoint information is logged to a non-volatile random access memory (NV-RAM) as well as displayed to a user on an operator panel.
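As a rough illustration of checkpoint/restart as described above, the following sketch models memory and registers as dictionaries. The `Machine` class and its method names are assumptions made for this example only, and the saved copy is kept in an attribute rather than written to disk.

```python
import copy


class Machine:
    """Toy machine whose state is a memory dict plus a register dict (hypothetical)."""

    def __init__(self):
        self.memory = {}
        self.registers = {"pc": 0}   # "pc" stands in for the last instruction executed
        self._checkpoint = None      # stands in for the copy saved on disk

    def execute(self, key, value):
        """Apply one transaction to memory and advance the program counter."""
        self.memory[key] = value
        self.registers["pc"] += 1

    def take_checkpoint(self):
        """Save a full copy of memory and the current register settings."""
        self._checkpoint = copy.deepcopy((self.memory, self.registers))

    def restart(self):
        """Restore memory and registers from the last checkpoint."""
        if self._checkpoint is None:
            raise RuntimeError("no checkpoint has been taken")
        self.memory, self.registers = copy.deepcopy(self._checkpoint)


machine = Machine()
machine.execute("a", 1)
machine.take_checkpoint()
machine.execute("b", 2)   # happens after the checkpoint, so it is lost on restart
machine.restart()
```

After `restart()`, the machine holds only the state captured at the checkpoint: the transaction that wrote `"b"` is gone, which is the loss the text describes.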
The code that initializes the I/O path is fairly complex and, early in development when it has not been fully debugged, error-prone. Therefore, it is desirable to have a visual checkpoint mechanism serve as a progress indicator to help debug software errors when a system crash occurs during I/O subsystem initialization. By the same token, when a hardware component may fail while its registers are being accessed and set up, the checkpoint/progress code helps pinpoint which register of which hardware chip was being accessed just before the crash, making it easier to set up equipment to capture the failure for analysis. Although there is no requirement that the system be completely booted in order to display/log checkpoints, the current checkpoint mechanism requires that its I/O path be fully configured. Thus, no progress indicator is available during the execution of the complex I/O initialization code. Therefore, it would be desirable to have a method of displaying and logging system checkpoints to the operator panel and NV-RAM before the data processing system completes the booting process.
SUMMARY OF THE INVENTION
The present invention provides a method, system, and apparatus for recording information generated by a data processing system prior to completion of the enablement of programmed input/output services for the data processing system. In one embodiment, a service processor receives an attention interrupt from a host processor. The service processor then stops the operation of all host processors in the data processing system. The service processor then reads the information, such as a system checkpoint, from a buffer within the host processor's system memory, writes the information into a non-volatile random access memory, and displays the information to a user via a video display. The service processor then restarts the host processors.
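The sequence in this embodiment (attention interrupt, stop all host processors, read the buffer, log and display, restart) can be sketched as a small simulation. This is only an illustration under stated assumptions: `HostProcessor`, `ServiceProcessor`, `post_checkpoint`, the `checkpoint_buffer` key, and the checkpoint code `"C100"` are hypothetical names, and lists stand in for the NV-RAM and the display.

```python
class HostProcessor:
    """Hypothetical host processor that is either running or stopped."""

    def __init__(self, name):
        self.name = name
        self.running = True


class ServiceProcessor:
    """Hypothetical service processor that records checkpoints for the hosts."""

    def __init__(self, hosts, system_memory):
        self.hosts = hosts
        self.system_memory = system_memory   # dict standing in for host system memory
        self.nvram = []                      # stands in for non-volatile RAM
        self.display = []                    # stands in for the user-visible display
        self.trace = []                      # order of steps, kept for inspection

    def on_attention(self):
        """Handle an attention interrupt raised by a host processor."""
        for host in self.hosts:              # 1. stop all host processors
            host.running = False
        self.trace.append("stopped")
        info = self.system_memory["checkpoint_buffer"]   # 2. read the buffer
        self.nvram.append(info)              # 3. log to NV-RAM
        self.display.append(info)            # 4. show to the user
        self.trace.append("recorded")
        for host in self.hosts:              # 5. restart the host processors
            host.running = True
        self.trace.append("restarted")


def post_checkpoint(service_processor, code):
    """Host side: place a checkpoint in the buffer, then raise attention."""
    service_processor.system_memory["checkpoint_buffer"] = code
    service_processor.on_attention()


hosts = [HostProcessor("host0"), HostProcessor("host1")]
sp = ServiceProcessor(hosts, system_memory={})
post_checkpoint(sp, "C100")
```

Because the host only writes to its own memory and the service processor does the logging and display, this path does not depend on the host's I/O path being configured, which is the point the summary makes.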
REFERENCES:
patent: 4455601 (1984-06-01), Griscom et al.
patent: 5560019 (1996-09-01), Narad
patent: 5875343 (1999-02-01), Binford et al.
patent: 5884021 (1999-03-01), Hirayama et al.
patent: 6189117 (2001-02-01), Batchelor et al.
patent: 6574748 (2003-06-01), Andress et al.
Bui Tam D.
Lee Van Hoa
Tran Kiet Anh
Beausoliel Robert
Loe Stephen R.
McBurney Mark E.
McCarthy Christopher S.
Yee Duke W.