Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-08-18
2001-02-13
Beausoliel, Jr., Robert W. (Department: 2785)
C714S006130
active
06189117
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and system for handling errors in a system managed by a processor and, in particular, a system for handling errors in a bridge system interfacing the processor with an external device, such as a computer system.
2. Description of the Related Art
The Peripheral Component Interconnect (PCI) bus is a high-performance expansion bus architecture that was designed to replace the traditional ISA (Industry Standard Architecture) bus. A processor bus master communicates with the PCI local bus and devices connected thereto via a PCI Bridge. This bridge provides a low latency path through which the processor may directly access PCI devices mapped anywhere in the memory or I/O address space. The bridge may optionally include such functions as data buffering/posting and PCI central functions such as arbitration. The architecture and operation of the PCI local bus are described in “PCI Local Bus Specification,” Revision 2.0 (April, 1993) and Revision 2.1s, published by the PCI Special Interest Group, 5200 Elam Young Parkway, Hillsboro, Oreg., which publication is incorporated herein by reference in its entirety.
A PCI to PCI bridge provides a connection path between two independent PCI local busses. The primary function of the bridge is to allow transactions between a master on one PCI bus and a target device on another PCI bus. The PCI Special Interest Group has published a specification on the architecture of a PCI to PCI bridge in “PCI to PCI Bridge Architecture Specification,” Revision 1.0 (Apr. 10, 1994), which publication is incorporated herein by reference in its entirety. This specification defines the following terms and definitions:
initiating bus—the master of a transaction that crosses a PCI to PCI bridge is said to reside on the initiating bus.
target bus—the target of a transaction that crosses a PCI to PCI bridge is said to reside on the target bus.
primary interface—the PCI interface of the PCI to PCI bridge that is connected to the PCI bus closest to the CPU is referred to as the primary PCI interface.
secondary interface—the PCI interface of the PCI to PCI bridge that is connected to the PCI bus farthest from the CPU is referred to as the secondary PCI interface.
downstream—transactions that are forwarded from the primary interface to the secondary interface of a PCI to PCI bridge are said to be flowing downstream.
upstream—transactions forwarded from the secondary interface to the primary interface of a PCI to PCI bridge are said to be flowing upstream.
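The directional terms above can be illustrated with a minimal sketch. The `Interface` enum and the `direction` function are hypothetical names introduced here for illustration; they do not come from the PCI to PCI bridge specification.

```python
from enum import Enum


class Interface(Enum):
    """The two sides of a PCI to PCI bridge (hypothetical model)."""
    PRIMARY = "primary"      # connected to the PCI bus closest to the CPU
    SECONDARY = "secondary"  # connected to the PCI bus farthest from the CPU


def direction(initiating: Interface) -> str:
    """Return the flow direction of a transaction that crosses the bridge.

    `initiating` is the bridge interface attached to the initiating bus,
    that is, the bus on which the master of the transaction resides.
    """
    # Primary -> secondary is downstream; secondary -> primary is upstream.
    return "downstream" if initiating is Interface.PRIMARY else "upstream"


print(direction(Interface.PRIMARY))
print(direction(Interface.SECONDARY))
```

In this model the target bus is always the bus on the opposite interface, so the initiating interface alone determines the direction.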
The PCI architecture provides for the detection and signaling of both parity and other system errors. The error reporting chain from target to bus master to device driver and eventually to the operating system is intended to allow error recovery operations to be implemented at any level. Assertion of the SERR signal could generate a non-maskable interrupt (NMI), which is a high-priority interrupt signal. The SERR signal is generally used to signal address parity errors and/or other non-parity errors. Any PCI agent can indicate an SERR error by setting a bit in a configuration space register, such as the Status register.
The PCI bridge must detect address parity errors for all transactions on either a primary or secondary interface. The PCI bridge reports the error by asserting the SERR signal and propagating the SERR signal upstream. For instance, if the bridge detects an address parity error on the primary or secondary interface, the bridge asserts the SERR signal on the primary interface, sets the SERR bit in the Status register, sets a Detected Parity Error bit in either the Status register or Secondary Status register and may signal a target abort by setting a target abort signal register. Another error signal is PERR (parity error), which the PCI bridge uses to signal a data parity error.
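The reporting sequence for an address parity error can be sketched as follows. This is a simplified model under stated assumptions: the `Bridge` class, the function name, and the bit masks are illustrative choices made here, and the optional target abort is left out.

```python
from dataclasses import dataclass

# Bit masks chosen for illustration; treat the exact positions as assumptions.
SERR_BIT = 1 << 14                   # SERR bit in the Status register
DETECTED_PARITY_ERROR_BIT = 1 << 15  # Detected Parity Error bit


@dataclass
class Bridge:
    """Hypothetical model of the bridge state touched by error reporting."""
    status: int = 0              # Status register (primary interface)
    secondary_status: int = 0    # Secondary Status register
    serr_asserted: bool = False  # SERR signal on the primary interface


def report_address_parity_error(bridge: Bridge, interface: str) -> None:
    """Record an address parity error detected on `interface`."""
    if interface not in ("primary", "secondary"):
        raise ValueError(f"unknown interface: {interface!r}")
    # SERR is always asserted on the primary interface, so it flows upstream.
    bridge.serr_asserted = True
    bridge.status |= SERR_BIT
    # The Detected Parity Error bit goes in the register that belongs to
    # the interface on which the error was seen.
    if interface == "primary":
        bridge.status |= DETECTED_PARITY_ERROR_BIT
    else:
        bridge.secondary_status |= DETECTED_PARITY_ERROR_BIT


bridge = Bridge()
report_address_parity_error(bridge, "secondary")
print(hex(bridge.status), hex(bridge.secondary_status), bridge.serr_asserted)
```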
The agent detecting an error may also terminate with a master abort mode by setting a master abort bit. When a read transaction with an address parity error crosses a PCI to PCI bridge and is terminated by a master abort, the bridge will return FFFF FFFFh to the initiator and terminate the read transaction on the initiating bus. When a write transaction is terminated with a master abort, the bridge will complete the write transaction on the initiating bus and discard the write data.
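A minimal sketch of what the initiator observes after a master abort, assuming a 32-bit data path. The function name and its string arguments are hypothetical and serve only to illustrate the two cases described above.

```python
from typing import Optional

ALL_ONES = 0xFFFF_FFFF  # value the bridge returns for an aborted read


def complete_after_master_abort(kind: str,
                                write_data: Optional[int] = None) -> Optional[int]:
    """Model the initiating-bus outcome of a master-aborted transaction.

    Returns the data handed back to the initiator for a read, or None
    for a write, whose data is accepted and then discarded.
    """
    if kind == "read":
        # The read is terminated on the initiating bus with all ones.
        return ALL_ONES
    if kind == "write":
        # The write completes on the initiating bus; the data goes nowhere.
        return None
    raise ValueError(f"unknown transaction kind: {kind!r}")


print(hex(complete_after_master_abort("read")))
print(complete_after_master_abort("write", 0x1234))
```

Because an aborted read returns an ordinary-looking data value, the initiator cannot tell from the data alone that the transaction failed; it has to consult the error status.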
In current systems, a processor functions as the master that controls the PCI to PCI bridge system. One problem with current systems is that when the master processor attached to the PCI system receives an SERR, PERR or other error signal, the operating system of the processor enters a machine check handling mode to diagnose and check the error. However, upon entering the machine check handling mode, the processor hangs because the machine check handling logic is designed to handle errors in the processor and is typically not capable of diagnosing errors generated from an external system, such as a PCI to PCI bridge network. Because the machine check handling mode for the processor cannot process an error from the external PCI bridge system, the processor system will hang and crash. As a result of this crash, data may be lost and the system will be down while the processor is rebooting. In large scale systems, such as the IBM 3990 storage controller which manages critical data, rebooting can take up to twenty minutes. The loss of data and down time resulting from having to reboot the system can be especially costly for such storage controllers that manage critical data. Machine check handling for storage controllers is described in IBM publication “ESA/390 Principles of Operation,” document no. SA22-7201-04 (Copyright IBM Corp. 1990, 1991, 1993, 1994, 1996, 1997), which publication is incorporated herein by reference in its entirety.
Moreover, there is typically a delay time from when an error is generated to when the processor interprets the error interrupt to perform error diagnosis and correction operations. During this delay, the processor may be processing numerous input/output (I/O) requests. Such I/O processing could cause further errors and problems to propagate through the PCI to PCI bridge system before the processor proceeds to address the error.
SUMMARY OF THE PREFERRED EMBODIMENTS
To overcome the limitations in the prior art described above, the preferred embodiments disclose a system for handling errors. A system, managed by a processor, processes an error. The system then generates an interrupt to the processor indicating that an error occurred and executes an error mode before the processor interprets the interrupt. As part of the error mode, the system prevents data from transferring between the system and the processor and the system processes a read request from the processor by returning data to the processor that is unrelated to the requested data. The processor would then process the interrupt indicating the error, and execute a diagnostic mode to diagnose the error in the system.
In further embodiments, the system prevents data from transferring between the system and processor by discarding data transferred therebetween. In still further embodiments, the processor, in the diagnostic mode, reads configuration registers in the system to diagnose the error.
In this way, preferred embodiments provide a system for handling errors generated within the system by allowing the processor to continue executing I/O interrupts and other tasks until it processes the interrupt generated for the error. Moreover, further embodiments prevent data from flowing between the system and the processor so that further errors do not propagate through the system. Still further embodiments provide a diagnostic mode in which the processor diagnoses errors in the system.
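The error mode summarized above can be sketched as a small model. This is an illustrative sketch only, not the implementation of the preferred embodiments: the class, method, and register names are hypothetical, and the filler value returned for reads is an assumption, since the summary says only that the returned data is unrelated to the requested data.

```python
class ManagedSystem:
    """Hypothetical model of a system managed by a processor."""

    FILLER = 0xFFFF_FFFF  # assumed placeholder returned while in error mode

    def __init__(self):
        self.memory = {}
        self.config_registers = {"status": 0}
        self.error_mode = False
        self.interrupt_pending = False

    def raise_error(self, status_bits: int) -> None:
        """Record the error, signal the processor, and enter error mode."""
        self.config_registers["status"] |= status_bits
        self.interrupt_pending = True
        # Error mode starts immediately, before the processor gets
        # around to interpreting the interrupt.
        self.error_mode = True

    def write(self, addr: int, value: int) -> None:
        if self.error_mode:
            return  # discard data transferred from the processor
        self.memory[addr] = value

    def read(self, addr: int) -> int:
        if self.error_mode:
            return self.FILLER  # data unrelated to the requested data
        return self.memory.get(addr, 0)


def processor_handle_interrupt(system: ManagedSystem):
    """Diagnostic mode: read the configuration registers, if interrupted."""
    if not system.interrupt_pending:
        return None
    system.interrupt_pending = False
    return dict(system.config_registers)


system = ManagedSystem()
system.write(0x10, 7)
system.raise_error(0x4000)
system.write(0x10, 9)  # discarded: the stored value stays 7
print(hex(system.read(0x10)), processor_handle_interrupt(system))
```

The processor can keep issuing reads and writes between the error and the interrupt handler without stalling, and none of that traffic changes the state of the managed system.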
Batchelor Gary William
Beardsley Brent Cameron
Benhase Michael Thomas
Derenburger Jack Harvey
Jones Carl Evan
Beausoliel, Jr. Robert W.
Bonzo Bryce P.
International Business Machines Corporation
Konrad Raynes & Victor
Victor David W.