Computer system error recovery and fault isolation

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S043000

active

06523140

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to information processing systems and more particularly to a methodology and implementation for processing detected fault conditions in transactions from adapter devices.
BACKGROUND OF THE INVENTION
In most computer systems, devices connected within the system are able to communicate and initiate data transfer transactions with other devices in the system as well as with the system memory, system processors and other central system components. These transactions take the form of one or more lines of information being passed from one device in the system to another. In a specific example, current PCI (peripheral component interconnect) computer systems are able to have many PCI bridge circuits connected between a main system bus and a plurality of PCI busses. Each PCI bus, in turn, may have several adapter devices connected thereto. For large systems, this tree-like configuration can become quite complex and extensive.
In transferring information between system components such as system memory to or from any of the adapter devices, or between any two adapter devices in the computer system, segments or lines of information are placed on system busses between the devices participating in the transaction in a predetermined sequence. The transfer of information from one device to another generally occurs in discrete steps with stops along the way. The information being transferred may, for example, move from one adapter device on one PCI bus to system memory. In an extensive computer system, that journey may pass through several bridge circuits along the way, and the information may be temporarily stored in transit buffers at each of the bridge circuits. Among other things, this step-by-step transaction process allows for a prioritization and/or ordering system in which certain transactions are able to bypass other transactions.
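The hop-by-hop movement through bridge transit buffers described above can be sketched in a few lines. This is only an illustrative model: the `Transaction`, `Bridge` and `deliver` names and the single `may_bypass` flag are assumptions introduced here, and actual PCI ordering rules are considerably more involved than moving a flagged transaction to the front of a queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    may_bypass: bool = False  # hypothetical flag standing in for real ordering rules


class Bridge:
    """Hypothetical bridge circuit holding one transit buffer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer = deque()

    def accept(self, txn: Transaction) -> None:
        # A transaction allowed to bypass is queued ahead of those already buffered.
        if txn.may_bypass:
            self.buffer.appendleft(txn)
        else:
            self.buffer.append(txn)

    def drain(self) -> list:
        # Forward everything currently buffered, front of the queue first.
        out = []
        while self.buffer:
            out.append(self.buffer.popleft())
        return out


def deliver(path: list, txns: list) -> list:
    """Move transactions through each bridge in turn; return the arrival order."""
    pending = list(txns)
    for bridge in path:
        for txn in pending:
            bridge.accept(txn)   # stored temporarily in the transit buffer
        pending = bridge.drain() # then forwarded to the next hop
    return [t.name for t in pending]


order = deliver(
    [Bridge("bridge-1"), Bridge("bridge-2")],
    [Transaction("A"), Transaction("B", may_bypass=True), Transaction("C")],
)
print(order)
```

In this model, transaction B is buffered ahead of A at each bridge, so it arrives first even though it was issued second.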
If, however, an error occurs on one of the busses involved in a transaction, it may result in a system error report that terminates all system operations. For example, in a PCI environment, if a transaction is error-free on the primary bus of a bridge and an error occurs on the secondary bus, a PCI “SERR” signal is generated, which causes a system shutdown rather than risking the propagation of erroneous data caused by the detected error condition.
Thus, all devices in the system, as well as the system itself, may be shut down entirely because of an easily correctable error condition in only one of the adapter devices in the system.
There is therefore a need for an improved methodology and implementing system which enables the identification and isolation of the specific adapter devices found to have caused detected error conditions in a computer system.
SUMMARY OF THE INVENTION
A method and implementing computer system are provided in which specific device identification information is acquired when a fault condition is detected during an information transfer transaction, and the condition is reported for corrective action without initiating a system shutdown. In an exemplary PCI system, the PCI adapter sequence information, including tag number, requester bus number, requester device number and requester function number, is captured and used in reporting an error condition to the adapter's device driver in order to identify and isolate the adapter in a recovery operation.
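The reporting flow summarized above might be modelled roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: `RequesterInfo`, `FaultRouter`, `register_driver` and `report_fault` are hypothetical names, and returning `False` when no driver is registered (leaving escalation to the caller) is an assumption made here rather than something the summary specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

DeviceKey = Tuple[int, int, int]  # (bus, device, function)


@dataclass(frozen=True)
class RequesterInfo:
    """The four fields named in the summary; the class itself is hypothetical."""
    tag: int
    bus: int
    device: int
    function: int

    def key(self) -> DeviceKey:
        return (self.bus, self.device, self.function)


class FaultRouter:
    """Hypothetical logic that routes a fault to a single adapter's driver."""

    def __init__(self) -> None:
        self._drivers: Dict[DeviceKey, Callable[[RequesterInfo, str], None]] = {}
        self.isolated: Set[DeviceKey] = set()

    def register_driver(self, bus: int, device: int, function: int,
                        handler: Callable[[RequesterInfo, str], None]) -> None:
        self._drivers[(bus, device, function)] = handler

    def report_fault(self, info: RequesterInfo, reason: str) -> bool:
        """Return True if the fault was contained to one adapter.

        False means no driver is registered for the requester, so the
        caller must decide how to escalate (an assumption of this sketch).
        """
        handler = self._drivers.get(info.key())
        if handler is None:
            return False
        self.isolated.add(info.key())  # stop accepting traffic from this adapter
        handler(info, reason)          # let the driver attempt recovery
        return True


events = []
router = FaultRouter()
router.register_driver(2, 4, 0, lambda info, reason: events.append((info.tag, reason)))
contained = router.report_fault(RequesterInfo(tag=7, bus=2, device=4, function=0),
                                "parity error")
print(contained, events)
```

Keying the driver table on bus, device and function numbers means only the adapter that issued the faulty transaction is isolated, while other adapters keep running.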


REFERENCES:
patent: 5499346 (1996-03-01), Amini et al.
patent: 5790870 (1998-08-01), Hausauer et al.
patent: 5815649 (1998-09-01), Utter et al.
patent: 5838899 (1998-11-01), Leavitt et al.
patent: 5987554 (1999-11-01), Liu et al.
patent: 6182180 (2001-01-01), Liu et al.
patent: 6279125 (2001-08-01), Klein
patent: 6286125 (2001-09-01), Leshay et al.
