Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-10-05
2004-03-02
Beausoliel, Robert (Department: 2184)
C714S038110, C717S124000, C717S127000, C717S131000
active
06701460
ABSTRACT:
BACKGROUND
1. Field of the Invention
The present invention relates to mechanisms for testing computer systems. More specifically, the present invention relates to a method and an apparatus for testing a computing system by injecting faults into the computer system while the computer system is running.
2. Related Art
The need for reliable computing systems has led to the development of “highly available” computer systems that continue to function when one or more of the subsystems and/or components of a computing system fail.
In order to ensure that highly available computer systems operate properly, it is necessary to perform rigorous testing. This testing is complicated by the fact that highly available computer systems typically include a large number of components and subsystems that are subject to failure. Furthermore, an operating system for a highly available computer system contains a large number of pathways to handle error conditions that must also be tested.
Some types of testing can be performed manually, for example by unplugging a computer system component, disconnecting a cable, or by pulling out a computer system board while the computer system is running. However, an outcome of this type of manual testing is typically not repeatable and is imprecise because the manual event can happen at random points in the execution path of a program and/or operating system that is executing on the highly available computer system.
What is needed is a method and an apparatus that facilitates testing a computer system by injecting faults at precise locations in the execution path of an operating system and/or program that is executing on a computer system.
SUMMARY
One embodiment of the present invention provides a system for testing a computer system by using software to inject faults into the computer system while the computer system is operating. This system operates by allowing a programmer to insert a fault point into source code for a program. This fault point causes a fault to occur if a trigger associated with the fault point is set and if an execution path of the program passes through the fault point. The system allows this source code to be compiled into executable code. Next, the system allows the computer system to be tested. This testing involves setting the trigger for the fault point, and then executing the executable code, so that the fault occurs if the execution path passes through the fault point. This testing also involves examining the result of the execution.
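The test cycle described above (set the trigger, execute, examine the result) might be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the names TRIGGERS, InjectedFault, fault_point, copy_block and run_test are all hypothetical, and an interpreted script stands in for the compiled executable code.

```python
# Hypothetical names throughout; none of them come from the patent text.
TRIGGERS = set()          # names of fault points whose triggers are set


class InjectedFault(Exception):
    """Raised when execution reaches a fault point whose trigger is set."""


def fault_point(name):
    # The fault occurs only if the trigger is set AND execution gets here.
    if name in TRIGGERS:
        raise InjectedFault(name)


def copy_block(data, use_fast_path):
    if use_fast_path:
        return list(data)             # this path contains no fault point
    fault_point("copy.slow_path")     # fault point placed by the programmer
    return [item for item in data]


def run_test(trigger, use_fast_path):
    """Set the trigger, execute the code, and report what happened."""
    TRIGGERS.add(trigger)
    try:
        copy_block([1, 2, 3], use_fast_path)
        return "completed"
    except InjectedFault as fault:
        return f"fault at {fault}"
    finally:
        TRIGGERS.discard(trigger)     # leave no trigger set for later tests


slow_result = run_test("copy.slow_path", use_fast_path=False)
fast_result = run_test("copy.slow_path", use_fast_path=True)
```

In this sketch the same trigger produces a fault only on the run whose execution path passes through the fault point; the fast-path run completes normally even though the trigger is set.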
In one embodiment of the present invention, if the fault point is encountered while executing the executable code, the system executes the fault point by: looking up a trigger associated with the fault point; determining whether the trigger has been set; and executing code associated with the fault point if the trigger has been set.
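The three steps listed above (look up the trigger, check whether it is set, run the associated code) could look roughly like this. The Trigger record, the TRIGGER_TABLE dictionary and the "disk.write" name are assumptions made for illustration; the source does not specify how triggers are represented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Trigger:
    """Hypothetical trigger record: a flag plus the code to run when set."""
    is_set: bool
    action: Callable[[], None]


TRIGGER_TABLE: Dict[str, Trigger] = {}
log = []


def fault_point(name: str) -> bool:
    """Return True if the fault code ran, False otherwise."""
    trigger: Optional[Trigger] = TRIGGER_TABLE.get(name)  # step 1: look up
    if trigger is None or not trigger.is_set:             # step 2: is it set?
        return False
    trigger.action()                                      # step 3: run the code
    return True


TRIGGER_TABLE["disk.write"] = Trigger(
    is_set=False, action=lambda: log.append("disk.write fault")
)
not_fired = fault_point("disk.write")      # trigger exists but is not set
TRIGGER_TABLE["disk.write"].is_set = True
fired = fault_point("disk.write")          # trigger set: associated code runs
unknown = fault_point("no.such.point")     # no trigger registered at all
```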
In one embodiment of the present invention, the fault point calls a fault function that causes the fault to occur.
In one embodiment of the present invention, the fault point includes code that causes the fault to occur.
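The two variations above, a fault point that calls a fault function and a fault point that contains the fault-causing code itself, are contrasted in the sketch below. ARMED, reset_connection, open_channel and send_packet are hypothetical names chosen for the example.

```python
ARMED = {"chan.reset", "net.error"}   # hypothetical set of armed triggers


def reset_connection():
    """A shared fault function that any fault point may call."""
    raise ConnectionResetError("injected connection reset")


def open_channel(host):
    # Variation 1: the fault point delegates to a fault function.
    if "chan.reset" in ARMED:
        reset_connection()
    return f"channel to {host}"


def send_packet(payload):
    # Variation 2: the fault point carries its own fault-causing code inline.
    if "net.error" in ARMED:
        return -1                     # injected error code
    return len(payload)
```

A shared fault function keeps each fault point to a single call, while inline code lets a fault point do something specific to its location, such as returning that function's own error value.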
In one embodiment of the present invention, the trigger has global scope and is stored in a kernel address space of an operating system within the computer system.
In one embodiment of the present invention, the trigger is stored in an environment variable associated with a method invocation.
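One way to picture a trigger that travels with an invocation is an environment object passed along with each call, so that the trigger affects only that call chain. The sketch below assumes a plain dictionary as the environment and a "fault_triggers" key; both are illustrative choices, not details from the source.

```python
def read_config(env):
    # Fault point: consult the triggers carried by this invocation's environment.
    if "read_config.fail" in env.get("fault_triggers", ()):
        raise OSError("injected read failure")
    return {"mode": "normal"}


def start_service(env):
    # The environment travels with the call, so nested calls see the same triggers.
    return read_config(env)


clean_env = {}                                          # no triggers set
faulty_env = {"fault_triggers": {"read_config.fail"}}   # trigger set for this call only
```

Calling start_service(clean_env) returns the configuration normally, while start_service(faulty_env) raises the injected error; other callers that use their own environments are unaffected.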
In one embodiment of the present invention, the trigger is stored within an object reference. In a variation on this embodiment, the trigger causes the fault to be generated when the referenced object is invoked.
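A trigger held inside an object reference can be pictured as a small proxy that checks its own flag before forwarding an invocation. FaultingReference and Counter below are hypothetical classes written for this sketch.

```python
class FaultingReference:
    """Hypothetical object reference that can carry a fault trigger."""

    def __init__(self, target, trigger_set=False):
        self._target = target
        self.trigger_set = trigger_set

    def invoke(self, method, *args):
        # The fault is generated when the referenced object is invoked.
        if self.trigger_set:
            raise RuntimeError(f"injected fault invoking {method}")
        return getattr(self._target, method)(*args)


class Counter:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


counter = Counter()
good_ref = FaultingReference(counter)                    # trigger not set
bad_ref = FaultingReference(counter, trigger_set=True)   # trigger set
```

Both references point at the same object, but only invocations made through bad_ref fail, which lets a test target one client of a shared object.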
In one embodiment of the present invention, the fault can include: a computer system reboot operation, a computer system panic operation, a return of an error code, a forced change in control flow, a resource allocation failure, a response delay, and a deadlock.
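Three of the listed fault types (a returned error code, a resource allocation failure, and a response delay) can be simulated safely in user-level code, as sketched below. FaultKind and inject are hypothetical names; reboot, panic and deadlock faults are deliberately left out because they cannot be demonstrated harmlessly in a short script.

```python
import enum
import errno
import time


class FaultKind(enum.Enum):
    ERROR_CODE = "return of an error code"
    ALLOC_FAILURE = "resource allocation failure"
    DELAY = "response delay"


def inject(kind, delay_seconds=0.01):
    """Carry out one fault kind; returns an error code or None."""
    if kind is FaultKind.ERROR_CODE:
        return -errno.EIO                 # negative errno, a common C convention
    if kind is FaultKind.ALLOC_FAILURE:
        raise MemoryError("injected allocation failure")
    if kind is FaultKind.DELAY:
        time.sleep(delay_seconds)         # stall the caller before responding
        return None
    raise ValueError(f"unsupported fault kind: {kind}")
```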
REFERENCES:
patent: 5265254 (1993-11-01), Blasciak et al.
patent: 5450586 (1995-09-01), Kuzara et al.
patent: 5812828 (1998-09-01), Kaufer et al.
patent: 6139198 (2000-10-01), Danforth et al.
patent: 6282701 (2001-08-01), Wygodny et al.
patent: 6484276 (2002-11-01), Singh et al.
patent: 6490721 (2002-12-01), Gorshkov et al.
Microsoft Computer Dictionary, Microsoft Press, 1997, 3rd Edition, p. 251.*
Fault Injection Mechanism, Oct. 1998, Research Disclosure, Vol. 41, Issue 414.*
Publication, entitled “Software Fault Injection and its Application in Distributed Systems,” by Harold A. Rosenberg, et al., IEEE, Jun. 1993, pp. 208-217.
Publication, entitled “FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior under Faults,” by Wei-lun Kao, et al., IEEE Transactions on Software Engineering, Nov. 1993, No. 11, New York, US.
Publication, entitled “DEFINE: A Distributed Fault Injection and Monitoring Environment,” by Wei-lun Kao, et al., IEEE 1995, pp. 252-259.
Publication, entitled “Fault-Injection-Based Testing of Fault-Tolerant Algorithms in Message-Passing Parallel Computers,” by Douglas M. Bough, et al., IEEE, Jun. 24, 1997, pp. 258-267.
Suwandi Jongki A. L.
Talluri Madhusudhan
Beausoliel Robert
Park Vaughan & Fleming LLP
Sun Microsystems Inc.