Method and system for error isolation during PCI bus...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S006130, C711S148000

active

06574752

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to error analysis in information processing systems. More specifically, it relates to isolation of faulty peripheral component interconnect (PCI) adapters on a PCI bus during input/output subsystem initialization.
2. Description of the Related Art
When a failure occurs on a PCI bus after system start-up but before machine check handling has been enabled, it is desirable to determine automatically which adapter is responsible for the fault condition. This is difficult because, until machine check handling is enabled, the error condition will checkstop the system. Since there is no scan-out capability on the remote I/O drawers where the PCI devices are located, the error registers cannot be scanned out for interrogation. A conventional service procedure treats every bus adapter as suspect: the system is reduced to its minimum configuration, and each adapter card is then tried in turn until the failure recurs.
Such a scheme, which recreates the error condition in order to identify the faulty adapter, is problematic. Physically plugging and unplugging adapter cards often induces additional errors. Further, such a sequential procedure adds considerable time to any error repair scenario.
Checkpointing during system startup to determine faulty components is a procedure known in the art. Typically, in a checkpoint procedure, a periodic copy of a program or of the state of a computer system is made so that, if a failure occurs, execution can be restarted from the last saved checkpoint. This invention uses the concept of checkpoints to save the last PCI address on which an access was attempted during the PCI configuration cycle, so that the probable source of failure can be identified. In addition, progress codes are presented by the initial program load read only storage (IPLROS) firmware to indicate the progress of the boot sequence. The progress code will indicate that the PCI bus was being configured, and the checkpoint will be used to identify the probable source of the failure.
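The firmware side of this idea can be sketched as follows. This is a minimal illustration in Python, not firmware code and not the patented implementation: `Checkpoint`, `configure_bus`, `flaky_probe`, `BusFault`, the progress code value, and the sample addresses are all hypothetical names and values introduced here. The only point it shows is the ordering: the address is recorded before the access is attempted, so the record survives if the access never completes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical progress code meaning "PCI bus configuration in progress".
PROGRESS_PCI_CONFIG = 0x0C01


@dataclass
class Checkpoint:
    """Last progress code shown and last PCI address about to be accessed."""
    progress_code: Optional[int] = None
    last_address: Optional[int] = None


def configure_bus(addresses: Iterable[int],
                  probe: Callable[[int], None],
                  checkpoint: Checkpoint) -> None:
    """Record each address in the checkpoint before probing it."""
    checkpoint.progress_code = PROGRESS_PCI_CONFIG
    for address in addresses:
        checkpoint.last_address = address  # saved first, so it survives a failure
        probe(address)                     # on real hardware this may never return


class BusFault(Exception):
    """Stand-in for a checkstop; real hardware would simply halt."""


def flaky_probe(address: int) -> None:
    """Hypothetical probe that fails on one made-up adapter address."""
    if address == 0x1800:
        raise BusFault(hex(address))


cp = Checkpoint()
try:
    configure_bus([0x0800, 0x1000, 0x1800, 0x2000], flaky_probe, cp)
except BusFault:
    pass  # the checkpoint still names the address that was being accessed
```

After the simulated fault, `cp.progress_code` says that PCI configuration was under way and `cp.last_address` names the adapter address being accessed when the fault occurred.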
Commonly assigned co-pending application Ser. No. 08/829,088 entitled “A Method and System for Fault Isolation for PCI Bus Errors” teaches a mechanism for identifying a source of an error condition in the I/O mechanism.
U.S. Pat. No. 5,815,647 to Buckland et al. provides a system that allows a user to identify which of a plurality of feature cards has issued an error signal.
IBM Technical Disclosure Bulletin, Vol. 37, No. 08, page 619, discloses a recursive algorithm for initializing error handling logic for a PCI system.
None of these references provides for saving an address indicator prior to accessing that address.
Thus, it is desirable to have a speedy, certain technique for identifying faulty components which prevent a system from completing system start-up and entering its diagnostic routines.
It is further desirable to isolate and diagnose errors in a manner that eliminates the possible introduction of further error conditions.
BRIEF SUMMARY OF THE INVENTION
The present invention overcomes the shortcomings of the prior art by providing a shared mailbox space in memory for use by a service processor during the PCI bus and adapter initialization sequence. The address of an adapter is placed in the shared memory space before an attempt to access that adapter is made. If an error occurs during the access attempt, the service processor retrieves the address saved in the shared mailbox and immediately performs its error isolation procedure for determining the slot at fault. In this way the adapter card causing an I/O subsystem failure, rather than the entire I/O subsystem, may be analyzed.
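The mailbox handoff between the host firmware and the service processor might look roughly like the sketch below. It is an illustration under stated assumptions, not the patent's implementation: the mailbox layout (a valid flag followed by an address), the byte order, the `SLOT_BY_ADDRESS` table, and the function names `host_save_address` and `service_processor_isolate` are all hypothetical. In a real system the mailbox would be a region of memory visible to both processors; here a `bytearray` stands in for it.

```python
import struct
from typing import Dict, Optional

# Hypothetical mailbox layout: 4-byte valid flag followed by a 4-byte address.
MAILBOX_FORMAT = "<II"  # little-endian, chosen only for this sketch
MAILBOX_SIZE = struct.calcsize(MAILBOX_FORMAT)

# Hypothetical table mapping an adapter address to a physical slot label.
SLOT_BY_ADDRESS: Dict[int, str] = {
    0x0800: "slot 1",
    0x1000: "slot 2",
    0x1800: "slot 3",
}


def host_save_address(mailbox: bytearray, address: int) -> None:
    """Host side: publish the adapter address before accessing the adapter."""
    struct.pack_into(MAILBOX_FORMAT, mailbox, 0, 1, address)


def service_processor_isolate(mailbox: bytearray) -> Optional[str]:
    """Service processor side: after an error, read back the saved address."""
    valid, address = struct.unpack_from(MAILBOX_FORMAT, mailbox, 0)
    if not valid:
        return None  # nothing was saved, so there is no suspect adapter
    return SLOT_BY_ADDRESS.get(address, "unknown slot")


mailbox = bytearray(MAILBOX_SIZE)
host_save_address(mailbox, 0x0800)  # access to the first adapter succeeds
host_save_address(mailbox, 0x1000)  # access to this adapter is assumed to fail
suspect = service_processor_isolate(mailbox)
```

Because each address overwrites the previous one, the mailbox always holds the adapter being accessed at the moment of failure, which is what lets the service processor start its isolation procedure at a single slot.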


REFERENCES:
patent: 5603033 (1997-02-01), Joannin
patent: 5689726 (1997-11-01), Lin
patent: 5692219 (1997-11-01), Chan et al.
patent: 5701488 (1997-12-01), Mulchandani et al.
patent: 5712967 (1998-01-01), Grossman et al.
patent: 5768622 (1998-06-01), Lory et al.
patent: 5793987 (1998-08-01), Quackenbush et al.
patent: 5809260 (1998-09-01), Bredin
patent: 5815647 (1998-09-01), Buckland et al.
patent: 5815734 (1998-09-01), Lee et al.
patent: 5819053 (1998-10-01), Goodrum et al.
patent: 5838899 (1998-11-01), Leavitt et al.
patent: 5838932 (1998-11-01), Alzien
patent: 5850562 (1998-12-01), Crump et al.
patent: 5864653 (1999-01-01), Tavallaei et al.
patent: 5996034 (1999-11-01), Carter
patent: 0820021 (1997-07-01), None
patent: 006813 (1994-06-01), None
patent: 7123134 (1995-05-01), None
patent: 0954750 (1997-02-01), None
patent: 1030083 (1997-04-01), None
patent: 9844417 (1998-10-01), None
IBM Technical Disclosure Bulletin, vol. 39, No. 3, Mar. 1996, “Technique for Gaining Indefinite Access to Peripheral Component Interconnect* Bus Resource,” pp. 361-362.
IBM Technical Disclosure Bulletin, vol. 38, No. 8, Aug. 1995, “Manufacturing Test Mode for the Peripheral Component Interconnect Bus,” pp. 57-59.
D.R. Crandall, et al., “Self-Initiating Diagnostic Program Loader from Failed Initial Program Load I/O Device,” Research Disclosure, Jun. 1991, No. 326, Kenneth Mason Publications Ltd., England, 1 page.
IBM Technical Disclosure Bulletin, vol. 37, No. 8, Aug. 1994, “Method to Initialize the Error Handling Logic of a Peripheral Component Interconnect System,” pp. 619-621.
Lauesen, S., “Debugging Techniques,” Software—Practice and Experience, vol. 9, Issue 1, Jan. 1979, pp. 51-63.
Kanopoulos, M., “Design of a bus-monitor for real-time applications,” Microprocessing & Microprogramming, vol. 24, No. 1-5, pp. 717-721, Aug. 1988.
“Early mode padding for Multifunction Hard Core Macro—using synthesis tools for solving early mode problems in implementation of hard core macro such as interfacing PCI bus,” IBM 40788, Feb. 20, 1998, 1 page.
English Language Abstract downloaded and printed from WPAT database for patent No. SU1083194 dated Dec. 17, 1982.
Siewiorek, D. et al., “C.vmp: the Architecture and Implementation of a Fault Tolerant Multiprocessor,” International Symposium on Fault-tolerant Computing, 7th, Los Angeles, Jun. 28-30, 1977, Proceedings, pp. 37-43.
