Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1999-10-07
2003-02-18
Le, Dieu-Minh (Department: 2184)
C714S043000
active
06523140
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to information processing systems and more particularly to a methodology and implementation for processing detected fault conditions in transactions from adapter devices.
BACKGROUND OF THE INVENTION
In most computer systems, devices connected within the system are able to communicate and initiate data transfer transactions with other devices in the system as well as with the system memory, system processors and other central system components. These transactions take the form of one or more lines of information being passed from one device in the system to another. In a specific example, current PCI (peripheral component interconnect) computer systems may have many PCI bridge circuits connected between a main system bus and a plurality of PCI busses. Each PCI bus, in turn, may have several adapter devices connected to it. For large systems, this tree-like configuration can become quite complex and extensive.
When information is transferred between system components, such as between system memory and any of the adapter devices, or between any two adapter devices in the computer system, segments or lines of information are placed in a predetermined sequence on the system busses between the devices participating in the transaction. The transfer of information from one device to another generally occurs in discrete steps with stops along the way. The information being transferred may, for example, move from an adapter device on one PCI bus to system memory. In an extensive computer system, that journey may pass through several bridge circuits, and the information may be temporarily stored in transit buffers at each of the bridge circuits. Among other things, this step-by-step transaction process allows for a prioritization and/or ordering system in which certain transactions are able to bypass other transactions.
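The hop-by-hop movement described above can be illustrated with a minimal Python sketch. The names `Bridge` and `deliver` are hypothetical and do not come from the patent, and the model deliberately ignores PCI ordering and bypass rules. It only shows a line of information being parked in each bridge's transit buffer on its way to system memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Bridge:
    """Hypothetical bridge with a FIFO transit buffer (illustration only)."""
    name: str
    buffer: deque = field(default_factory=deque)


def deliver(payload, path):
    """Move payload through each bridge in `path`, one discrete step at a time.

    Returns the list of stops the payload made, ending at system memory.
    """
    stops = []
    for bridge in path:
        bridge.buffer.append(payload)      # payload waits in the transit buffer
        stops.append(bridge.name)
        payload = bridge.buffer.popleft()  # next step: the bridge forwards it on
    stops.append("system memory")
    return stops


path = [Bridge("PCI-to-PCI bridge"), Bridge("PCI host bridge")]
route = deliver("line 0", path)
print(route)
```

A real bridge would hold several transactions at once and could let some of them pass others, which is the prioritization the text mentions. The sketch keeps a single transaction in flight to make the discrete steps visible.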
If, however, an error occurs on one of the busses involved in a transaction, it may result in a system error report that terminates all system operations. For example, in a PCI environment, if a transaction is clear on the primary bus of a bridge but an error occurs on the secondary bus, a PCI “SERR” signal is generated, which shuts the system down rather than risking the propagation of erroneous data caused by the detected error condition.
As a result, all devices in the system, as well as the system itself, may be shut down entirely because of an easily correctable error condition in only one of the adapter devices in the system.
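The cost of this conventional behavior can be shown with a small Python sketch. The function name `handle_error_with_serr` and the adapter names are hypothetical. The point is only that a system-wide signal carries no notion of which adapter was at fault, so every device is halted.

```python
def handle_error_with_serr(devices, faulty):
    """Model the conventional response: one bus error halts every device.

    `faulty` is accepted but intentionally unused, because a system-wide
    shutdown does not depend on which adapter caused the error.
    """
    return {name: "halted" for name in devices}


devices = ["adapter A", "adapter B", "adapter C"]
state = handle_error_with_serr(devices, faulty="adapter B")
print(state)
```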
There is therefore a need for an improved methodology and implementing system that enables the identification and isolation of the specific adapter devices found to have caused detected error conditions in a computer system.
SUMMARY OF THE INVENTION
A method and implementing computer system are provided in which specific device identification information is acquired when a fault condition is detected during an information transfer transaction, and the condition is reported for corrective action without initiating a system shutdown. In an exemplary PCI system, the PCI adapter sequence information, including tag number, requester bus number, requester device number, and requester function number, is captured and used in reporting an error condition to the adapter's device driver in order to identify and isolate the adapter in a recovery operation.
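The reporting flow summarized above might look roughly like the following Python sketch. It is not the patented implementation. `RequesterInfo`, `FaultRouter`, the callback signature, and the example fault reason are all assumptions made for illustration. Only the four captured fields (tag number, requester bus, device, and function numbers) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequesterInfo:
    """Identification fields named in the summary; the layout is assumed."""
    tag: int
    bus: int
    device: int
    function: int


class FaultRouter:
    """Hypothetical component that routes a fault to one adapter's driver."""

    def __init__(self):
        self.drivers = {}      # (bus, device, function) -> driver callback
        self.isolated = set()  # adapters currently fenced off for recovery

    def register_driver(self, bus, device, function, callback):
        self.drivers[(bus, device, function)] = callback

    def report_fault(self, info, reason):
        """Isolate the requester and notify its driver; no system shutdown."""
        key = (info.bus, info.device, info.function)
        self.isolated.add(key)
        callback = self.drivers.get(key)
        if callback is None:
            return False       # no driver registered for this adapter
        callback(info, reason)
        return True


notified = []
router = FaultRouter()
router.register_driver(2, 1, 0, lambda info, reason: notified.append(("adapter A", info.tag, reason)))
router.register_driver(2, 3, 0, lambda info, reason: notified.append(("adapter B", info.tag, reason)))

captured = RequesterInfo(tag=7, bus=2, device=3, function=0)
handled = router.report_fault(captured, "data parity error")
print(handled, notified, router.isolated)
```

Keying the lookup on bus, device, and function means only the driver of the offending adapter is notified, while the other adapter's driver is left alone and continues operating.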
REFERENCES:
patent: 5499346 (1996-03-01), Amini et al.
patent: 5790870 (1998-08-01), Hausauer et al.
patent: 5815649 (1998-09-01), Utter et al.
patent: 5838899 (1998-11-01), Leavitt et al.
patent: 5987554 (1999-11-01), Liu et al.
patent: 6182180 (2001-01-01), Liu et al.
patent: 6279125 (2001-08-01), Klein
patent: 6286125 (2001-09-01), Leshay et al.
Arndt Richard Louis
Neal Danny Marvin
Thurber Steven Mark
Le Dieu-Minh
McBurney Mark E.
Wilder Robert V.