Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-09-29
2001-10-16
Beausoleil, Robert (Department: 2184)
C710S005000, C711S004000
active
06304984
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to injecting device errors and in particular to injecting device errors during selected load and store operations. Still more particularly, the present invention relates to preventing a selected load or store operation from reaching a device by detecting which device is the target of the operation and injecting specific errors to that particular device, in order to test the device driver and operating system error recovery code paths for those errors.
2. Description of the Related Art
Many data processing or computer systems support a standard input/output (I/O) system conforming to the peripheral component interconnect (PCI) Local Bus architecture, an architecture supporting many complex features including I/O expansion through PCI-to-PCI bridges, peer-to-peer (device-to-device) data transfers, multi-function devices, and both integrated and plug-in devices. In setting up I/O operations to I/O devices on a PCI bus, the device driver must perform a series of load and/or store operations to the I/O device. If any of these operations gets a parity error on the I/O bus, it is necessary to get this information back to the device driver so that the device driver can stop before the operation is initiated.
As an example, a first store operation may be employed to set up an address in the I/O device, followed by a second store operation signalling the I/O device to begin the data transfer. If the first store operation gets an error and the second store operation is then received, the I/O device might start the operation to the incorrect location. The PCI architecture includes no provision for designing adapters to prevent load and/or store operations from continuing after an error. Most contemporary systems allow device driver execution to continue after a store operation rather than wait for a “successful” response to the store operation to determine if it completes correctly. This is preferable since the processor stall required to wait for a response to store operations would vastly degrade system performance. Currently, I/O adapters have the capability to detect parity errors on the I/O bus and recover from them.
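The two-store hazard described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the ToyAdapter class, its "ADDR" and "GO" register names, and the driver_setup helper are hypothetical and do not model any real PCI adapter.

```python
class ToyAdapter:
    """Hypothetical I/O adapter with an address register and a 'go' register."""

    def __init__(self, initial_address):
        self.address = initial_address  # stale value left by an earlier operation
        self.transfers = []             # addresses at which transfers were started

    def store(self, register, value, parity_ok=True):
        # Assumption: a store that arrives with bad parity is simply dropped.
        if not parity_ok:
            return False
        if register == "ADDR":
            self.address = value
        elif register == "GO":
            self.transfers.append(self.address)
        return True


def driver_setup(adapter, target_address, first_store_parity_ok):
    # The driver does not wait for a response to either store,
    # so it cannot see that the first one failed.
    adapter.store("ADDR", target_address, parity_ok=first_store_parity_ok)
    adapter.store("GO", 1)


good = ToyAdapter(initial_address=0x1000)
driver_setup(good, 0x2000, first_store_parity_ok=True)

bad = ToyAdapter(initial_address=0x1000)
driver_setup(bad, 0x2000, first_store_parity_ok=False)

print([hex(a) for a in good.transfers])
print([hex(a) for a in bad.transfers])
```

In the failing case the transfer starts at the stale address rather than the intended one, which is the situation the second store should have been prevented from creating.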
One technique allowing the device driver to prevent subsequent load and/or store operations from completing after an error, without waiting for the response to every load or store operation, is to have the device select lines from each I/O device be brought into a PCI host bridge individually so that the device number of a failing device may be logged in an error register when an error is seen on the PCI bus. Until the error register is reset, subsequent load and store operations are delayed until the device number of the subject device can be checked against the error register. If the subject device is a previously failing device, the load/store operation to that device is prevented from completing, either by forcing bad parity or by zeroing all byte enables. With bad parity or zero byte enables forced, the I/O device will respond to the load or store request by activating its device select line, but will not accept store data. Operations to devices which are not logged in the error register are permitted to proceed normally, as are all load and store operations when the error register is clear. However, it is one thing to generate the device driver code to recover from errors and quite another thing to test and debug the code paths that handle the errors.
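The error-register gating described above might be modeled roughly as follows. This is a minimal sketch, not the hardware design: the HostBridge class and its method names are hypothetical, the error register is represented as a set of device numbers, and forcing bad parity or zero byte enables is reduced to the operation not completing.

```python
class HostBridge:
    """Hypothetical host bridge that gates operations on an error register."""

    def __init__(self):
        self.error_register = set()  # device numbers logged as failing
        self.completed = []          # (operation, device) pairs that went through

    def bus_error(self, device):
        # An error seen on the bus logs the failing device number.
        self.error_register.add(device)

    def reset_error_register(self):
        self.error_register.clear()

    def issue(self, op, device):
        """Return True if the load or store is allowed to complete."""
        if device in self.error_register:
            # Stands in for forcing bad parity or zeroing all byte enables:
            # the device sees the request but accepts no data.
            return False
        self.completed.append((op, device))
        return True


bridge = HostBridge()
first = bridge.issue("store", 3)         # register clear: proceeds
bridge.bus_error(3)                      # device 3 is now logged as failing
blocked = bridge.issue("store", 3)       # prevented from completing
other = bridge.issue("load", 5)          # unlogged device: proceeds
bridge.reset_error_register()
after_reset = bridge.issue("store", 3)   # proceeds again after reset
```

The point of the design is that the device driver does not have to wait on each store: the bridge holds the state that keeps later operations away from a device that has already failed.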
In the past, special test I/O adapters have been developed to inject errors onto a bus in order to attempt to test device driver error paths in a development environment. However, these special test adapters have the drawback that they are not shipped with the computer system, and therefore are not available to all device driver writers. Additionally, in order to inject an error, these adapters usually compare on the address of the operation and inject an error after the address has been detected. This error injection technique has the disadvantages that randomization of errors is not possible and that the I/O adapter has to be set up with an address corresponding to an address of the device into which the error is to be injected. Lastly, if multiple devices are to be checked out at the same time, a separate special I/O adapter for each bus in the system is required.
It would be desirable, therefore, to provide a method and system for injecting errors during bus operations in a computer system to a device which does not require a specific address to be set up to correspond to an address of the device which is to have the error injected. It would also be advantageous for the mechanism to provide randomization of errors to be injected while simultaneously not requiring a separate I/O adapter for each bus in a computer system when testing multiple devices on different buses.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide a method and system for injecting errors during load and store operations in a computer system to a selected device.
It is another object of the present invention to provide a method and system which does not require a specific address to be set up to correspond to an address of a selected device to have the error injected upon.
It is yet another object of the present invention to provide a method and system that does not require a separate adapter for each bus in a computer system, therefore testing multiple devices on different buses more easily and with less expense.
The foregoing objects are achieved as is now described. Device select lines from each device in a computer system are brought into a host bridge individually for determining if an error is to be injected to a selected device. The host bridge includes a plurality of pre-defined registers used for injecting errors to a selected device so that other devices are not affected during normal system operations. First, a register or a bit in a register in the host bridge is matched against an incoming bus operation to determine the type of bus operation, a load or a store, on which to inject the error. Next, a register having an initial or random value within the host bridge indicates on which occurrence of the operation to inject the error. If the value of the register indicates that an error is to be injected, the load or store operation is delayed by forcing zero byte enables until the device identifier of the selected device can be checked against a device register within the host bridge. If the device register indicates the selected device, a type of error indicated by an error register within the host bridge is injected to the selected device and the operation is restarted. Operations to devices that are not logged in the device register are permitted to proceed normally, as are bus operations for devices logged in the device register, based on the status of the register indicating the occurrence.
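The register sequence summarized above can be sketched in software as follows. This is an illustrative model only: the InjectingBridge class and its field names are hypothetical, and two behaviors the summary leaves open are assumptions here, namely that the occurrence register counts every operation of the matching type, and that the bridge stays armed if the operation it holds turns out to target a different device.

```python
from dataclasses import dataclass


@dataclass
class InjectingBridge:
    """Hypothetical model of the host bridge injection registers."""
    op_type: str         # operation type register: "load" or "store"
    remaining: int       # occurrence register: 1 means the next matching operation
    target_device: int   # device register
    error_type: str      # error register, e.g. "parity"
    fired: bool = False  # assumption: only one error is injected per setup

    def issue(self, op, device):
        """Return "normal" or the injected error type for one bus operation."""
        if self.fired or op != self.op_type:
            return "normal"
        if self.remaining > 1:
            # Not yet the selected occurrence: count it and let it through.
            self.remaining -= 1
            return "normal"
        # Selected occurrence: the operation would be held here (zero byte
        # enables) while the device identifier is checked.
        if device != self.target_device:
            return "normal"  # assumption: stay armed for the target device
        self.fired = True
        return self.error_type


bridge = InjectingBridge(op_type="store", remaining=2,
                         target_device=4, error_type="parity")
results = [
    bridge.issue("load", 4),   # wrong operation type
    bridge.issue("store", 4),  # first matching occurrence, counted
    bridge.issue("store", 7),  # selected occurrence, but another device
    bridge.issue("store", 4),  # selected occurrence on the target device
    bridge.issue("store", 4),  # after injection, back to normal
]
print(results)
```

For randomized testing, the occurrence value could be drawn with something like random.randint(1, n) before each run; no injection address has to be configured, because the target is selected by device identifier.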
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.
REFERENCES:
patent: 5001712 (1991-03-01), Splett et al.
patent: 5790870 (1998-08-01), Hausauer et al.
patent: 5850558 (1998-12-01), Qureshi et al.
patent: 5878237 (1999-03-01), Olarig
patent: 5892964 (1999-04-01), Horan et al.
Neal Danny Marvin
Thurber Steven Mark
Beausoleil Robert
Bracewell & Patterson L.L.P.
Emile Volel
International Business Machines Corporation
Ziemer Rita