Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1997-03-31
2003-04-29
Wong, Peter (Department: 2181)
C714S043000, C714S006130
active
06557121
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to input/output operations in a computer system, and more particularly to fault isolation in a peripheral component interconnect (PCI) structure.
BACKGROUND OF THE INVENTION
In many computer systems, peripheral devices such as hard disk drives, speakers, and CD-ROM drives are supported through a standard I/O (input/output) device architecture called Peripheral Component Interconnect (PCI). The PCI architecture supports many complex features, including I/O expansion through PCI-to-PCI bridges; peer-to-peer (device-to-device) data transfers between controlling devices (masters) and responding devices (targets); multi-function devices; and both integrated and plug-in devices.
The PCI architecture also defines standards for the detection and capture of error conditions on a PCI bus and in the devices. While the standard facilities provide error capture capabilities, the number of failure scenarios that may occur is large given the wide range of features allowed by the PCI architecture. Thus, isolating faults to a specific failing component becomes very difficult.
For example, for each transaction that occurs on the PCI bus, there is a master device which controls the transaction, and a target device which responds to the master's request. Since data can flow in either direction (i.e., the master can request a read or write), it is important to know which device was the sender of bad data and which device was the receiver. Also, since errors can flow across PCI-to-PCI bridges, it is important to know whether the fault is located on the near or far side of the bridge.
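The sender/receiver distinction described above can be illustrated with a minimal sketch. It assumes only the general PCI convention that the target supplies data on a read and the master supplies data on a write. The `Transaction` type, the `data_roles` function, and the device names are hypothetical and used for illustration only; they are not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    READ = "read"    # master requests data from the target
    WRITE = "write"  # master sends data to the target


@dataclass(frozen=True)
class Transaction:
    master: str           # device controlling the transaction
    target: str           # device responding to the master's request
    direction: Direction


def data_roles(txn: Transaction) -> tuple[str, str]:
    """Return (sender, receiver) of the data for one transaction."""
    if txn.direction is Direction.READ:
        # On a read, the data flows from the target back to the master.
        return txn.target, txn.master
    # On a write, the data flows from the master to the target.
    return txn.master, txn.target


# Hypothetical example: an adapter writing data toward a host bridge.
sender, receiver = data_roles(
    Transaction(master="scsi_adapter", target="host_bridge",
                direction=Direction.WRITE)
)
```

If bad data is detected, the sender returned here is the device that drove the data and the receiver is the device that observed it, which is the distinction a servicer needs before suspecting either part.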
Accordingly, a need exists for a failure isolation technique that would operate successfully for the numerous options supported by the PCI architecture, while providing consistent diagnostic information to servicers across a wide variety of hardware platforms.
SUMMARY OF THE INVENTION
The present invention meets this need and provides method and system aspects for fault isolation on a PCI bus. In one method aspect, the computer system includes an input/output (I/O) subsystem formed by a plurality of I/O devices communicating via a bus, and a method for isolating a fault condition on that bus includes categorizing the I/O subsystem in a recursive manner and isolating a source of an error condition within the I/O subsystem. Further, the I/O subsystem communicates via a peripheral component interconnect (PCI) bus.
In a further method aspect, a method for fault isolation for bus errors includes the steps of (a) processing a device error on a PCI bus, and (b) performing ordered categorization of a plurality of input/output devices coupled to the PCI bus. The method further includes (c) determining whether the device error originates from a subordinate branch of the PCI bus, and (d) recursively performing steps (a)-(c) until the PCI bus is categorized.
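Steps (a) through (d) can be pictured as a recursive walk over a tree of buses. The following is a minimal sketch under stated assumptions, not the patented procedure: the `Device` and `Bus` types, the boolean `error` field (standing in for whatever error status a real device latches), and the `isolate` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    name: str
    error: bool = False                # assumption: stands in for latched error status
    secondary: Optional["Bus"] = None  # set only when the device is a PCI-to-PCI bridge


@dataclass
class Bus:
    number: int
    devices: List[Device] = field(default_factory=list)


def isolate(bus: Bus) -> List[str]:
    """Return names of devices reporting errors, deepest branch first."""
    suspects: List[str] = []
    for dev in bus.devices:                # (b) visit the devices in a fixed order
        if dev.secondary is not None:      # (c) a bridge may hide a fault on its far side
            suspects.extend(isolate(dev.secondary))  # (d) repeat on the subordinate bus
        if dev.error:                      # (a) record the error seen on this bus
            suspects.append(dev.name)
    return suspects


# Hypothetical topology: bus 0 holds a disk and a bridge; bus 1 sits behind the bridge.
bus1 = Bus(number=1, devices=[Device("nic", error=True)])
bus0 = Bus(number=0, devices=[Device("disk"),
                              Device("bridge", error=True, secondary=bus1)])
suspects = isolate(bus0)
```

In this example the device on the far side of the bridge is listed before the bridge itself, so a reader of the result can tell that the error was reported from the subordinate branch rather than only on the near side.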
In a system aspect, a computer system for isolating a fault condition on a bus includes a processing mechanism, and an input/output mechanism coupled to the processing mechanism. The input/output mechanism comprises a plurality of input/output devices and bridges coupled to a PCI bus and communicating according to a PCI standard. In addition, the system includes a fault isolation mechanism within the processing mechanism for identifying a source of an error condition in the input/output mechanism. Further, the fault isolation mechanism performs categorization of the input/output mechanism in a recursive manner.
With the present invention, a fault isolation technique successfully provides more specific identification of an error source in a PCI bus architecture. The fault isolation technique greatly reduces the ambiguity of error occurrence when the numerous options supported by the PCI architecture are utilized in a given system. Further, by relying on the standard features of the PCI architecture, the fault isolation technique is readily applicable to varying system arrangements to provide versatile application. These and other advantages of the aspects of the present invention will be more fully understood in conjunction with the following detailed description and accompanying drawings.
Kitamorn Alongkorn
McLaughlin Charles Andrew
International Business Machines Corporation
Phan Raymond N
Sawyer Law Group LLP
Wong Peter