Reexamination Certificate
2001-03-01
2004-09-14
Beausoliel, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S031000
active
06792564
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to data processing systems, and more particularly to logically partitioned multiprocessing systems. Still more particularly, the present invention relates to a standardized format for reporting error events which occur within multiple, different operating systems included within a logically partitioned multiprocessing system.
2. Description of Related Art
Logical partitioning is the ability to make a single multiprocessing system run as if it were two or more independent systems. Each logical partition represents a division of resources in the system and operates as an independent logical system. Each partition is logical because the division of resources may be physical or virtual. An example of logical partitions is the partitioning of a multiprocessor computer system into multiple independent servers, each with its own processors, main storage, and I/O devices. One of multiple different operating systems, such as AIX, LINUX, or others, can be running in each partition.
In a Logically Partitioned (LPAR) multiprocessing system, there is a class of errors (Local) that is reported only to the operating system of the assigned, or owning, partition. The failure of an I/O adapter that is assigned to a single partition is an example. There is also another class of errors (Global) that is reported to each partition's operating system because such errors could potentially affect each partition's operation. Examples of this class are power supply, fan, memory, and processor failures.
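The distinction between the two error classes can be illustrated with a minimal sketch. It assumes a simple in-memory model; the names `ErrorScope`, `Partition`, and `route_error` are hypothetical and do not come from the patent text. A Local error is delivered only to the owning partition, while a Global error is delivered to every partition.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorScope(Enum):
    LOCAL = "local"    # reported only to the owning partition
    GLOBAL = "global"  # reported to every partition


@dataclass
class Partition:
    # Hypothetical model of a logical partition and the errors reported to it.
    name: str
    operating_system: str
    reported: list = field(default_factory=list)


def route_error(description, scope, owner, partitions):
    """Deliver an error to its owner (Local) or to all partitions (Global).

    Returns the names of the partitions that received the report.
    """
    targets = partitions if scope is ErrorScope.GLOBAL else [owner]
    for partition in targets:
        partition.reported.append(description)
    return [partition.name for partition in targets]


partitions = [Partition("lpar1", "AIX"), Partition("lpar2", "LINUX")]

# An I/O adapter assigned to lpar1 fails: only lpar1 is notified.
local_targets = route_error(
    "I/O adapter failure", ErrorScope.LOCAL, partitions[0], partitions
)

# A fan fails: no single owner, so every partition is notified.
global_targets = route_error("fan failure", ErrorScope.GLOBAL, None, partitions)
```

In this sketch the second partition records only the fan failure, since the adapter failure was local to the first partition.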
When a serviceable event occurs within one of the logical partitions or is reported to the operating system in the partition, the operating system being executed by that logical partition will execute a diagnostic routine to gather information about the event.
Each operating system will likely have different diagnostic capabilities and a different format for reporting error events. Because a logically partitioned system can support different operating systems, error events will be reported in a variety of different formats. This variety can confuse a service technician called to repair the error.
Therefore, a need exists for a method, system, and product for providing a standardized format for reporting error events by any of multiple, different operating systems capable of being executed by a logically partitioned multiprocessing system.
SUMMARY OF THE INVENTION
A method, system, and product in a computer system are described for reporting error events which occur within the computer system. The computer system includes multiple logical partitions. Each of the logical partitions may include a different one of multiple, different operating systems. A format is specified for reporting error events. An error event occurring within one of the logical partitions is detected. Information about the error event is formatted according to the specified format. Each operating system utilizes this format to report error events.
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
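The reporting scheme summarized above can be sketched as follows. This is an illustration under stated assumptions, not the format defined by the patent: the summary does not enumerate the fields of the specified format, so the field names in `FIELD_ORDER`, the `ErrorEvent` type, and the `key=value` layout are all hypothetical. The point is only that every operating system formats its event information through the same specification.

```python
from dataclasses import dataclass

# Hypothetical field order; the summary does not list the actual fields.
FIELD_ORDER = ("partition", "operating_system", "component", "description")


@dataclass(frozen=True)
class ErrorEvent:
    # Information gathered about a detected error event (assumed fields).
    partition: str
    operating_system: str
    component: str
    description: str


def format_event(event):
    """Render an event as one 'key=value' line in the shared field order."""
    return ";".join(f"{name}={getattr(event, name)}" for name in FIELD_ORDER)


# Two partitions running different operating systems report the same fault.
aix_report = format_event(ErrorEvent("lpar1", "AIX", "fan", "fan failure"))
linux_report = format_event(ErrorEvent("lpar2", "LINUX", "fan", "fan failure"))
```

Because both reports are produced from one field specification, a reader sees the same fields in the same order regardless of which operating system detected the event.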
REFERENCES:
patent: 5021949 (1991-06-01), Morten et al.
patent: 5335341 (1994-08-01), Chana
patent: 5682470 (1997-10-01), Dwork et al.
patent: 5724516 (1998-03-01), Temoshenko
patent: 5860115 (1999-01-01), Neuhard et al.
patent: 5928328 (1999-07-01), Komori et al.
patent: 6263457 (2001-07-01), Anderson et al.
patent: 6601190 (2003-07-01), Meyer et al.
patent: 6618823 (2003-09-01), West
patent: 6643802 (2003-11-01), Frost et al.
patent: 2001/0013108 (2001-08-01), Sturm et al.
Birman et al., “Reliability Through Consistency”, 1995, IEEE Software, pp. 29-41.*
Sens et al., “STAR: a Fault-Tolerant System for Distributed Applications”, 1993, IEEE, pp. 656-660.*
“OGR.h”, May 2, 2000, Distributed.net [http://http.distributed.net/pub/dcti/source/archives/pub-20000502.tar.gz].
Ahrens, Jr. George Henry
Benignus Douglas Marvin
Mooney Leo C.
Tysor Arthur James
Beausoliel Robert
Chu Gabriel
International Business Machines Corporation
McBurney Mark E.
Yee Duke W.