Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Patent
1998-06-19
2000-12-12
Beausoliel, Jr., Robert W.
714/11, 714/35, 714/34, 714/797, G06F 11/00
active
061611964
ABSTRACT:
Fault tolerance is provided in a computing system using a technique referred to as indirect instrumentation. In one embodiment, a number of different copies of a given target program are executed on different machines in the system. Each of the machines includes a controller for controlling the execution of the copy of the target program on that machine. The controllers communicate with a user interface of an instrumentation tool on another machine. A user specifies variables to be monitored, breakpoints, voting and recovery parameters and other information using the user interface of the instrumentation tool, and the tool communicates corresponding commands to each of the controllers for use in executing the copies. A fault is detected in one of the copies by comparing values of a user-specified variable generated by the different copies at the designated breakpoints. Upon detection of a fault in a given one of the copies, a checkpoint is taken of another one of the copies that has been determined to be operating properly, and a new copy is restarted from the checkpoint. The use of the controllers allows faults to be detected and appropriate recovery actions to be taken without modification of target program code.
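The detection and recovery steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Replica` class, the `vote` and `recover` functions, and the strict-majority rule are assumptions made for illustration only. In the system described above, the copies are separate processes on different machines, each driven by a controller, and the voting and recovery parameters are specified by the user. Here each copy is reduced to an in-memory dictionary of monitored variables, and the monitored values are assumed to be hashable.

```python
from collections import Counter
from copy import deepcopy


class Replica:
    """Hypothetical stand-in for one controlled copy of the target program."""

    def __init__(self, name, state):
        self.name = name
        self.state = dict(state)

    def checkpoint(self):
        # Snapshot the full state so a new copy can be started from it.
        return deepcopy(self.state)


def vote(replicas, variable):
    """Compare one monitored variable across all copies at a breakpoint.

    Returns (majority_value, healthy_replicas, faulty_replicas).
    Raises ValueError when no strict majority agrees on a value.
    """
    counts = Counter(r.state[variable] for r in replicas)
    value, votes = counts.most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority; the fault cannot be attributed")
    healthy = [r for r in replicas if r.state[variable] == value]
    faulty = [r for r in replicas if r.state[variable] != value]
    return value, healthy, faulty


def recover(replicas, variable):
    """Replace each faulty copy with a new copy started from a healthy checkpoint."""
    _, healthy, faulty = vote(replicas, variable)
    if not faulty:
        return replicas
    snapshot = healthy[0].checkpoint()
    # Each replacement receives its own copy of the snapshot (Replica copies it).
    return [Replica(r.name, snapshot) if r in faulty else r for r in replicas]
```

With three copies whose monitored variable `total` holds 42, 42 and 41, `vote` flags the third copy as faulty, and `recover` returns a list in which that copy has been replaced by one restarted from the first copy's checkpoint. The target program itself is never modified; only the surrounding control logic inspects and replaces copies, which is the point the abstract makes about using controllers.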
REFERENCES:
patent: 5297274 (1994-03-01), Jackson
patent: 5452441 (1995-09-01), Esposito
patent: 5974565 (1999-10-01), Okuhara
Tsai, Timothy, "Fault tolerance via N-Modular Software Redundancy," IEEE, 1998, pp. 201-206.
Fuhrman, Christopher, "Hardware/Software Fault Tolerance with Multiple Task Modular Redundancy," IEEE, 1995, pp. 171-177.
Bondavalli, A., "State Restoration in a COTS-based N-Modular Architecture," IEEE, 1998, pp. 174-183.
J. Long, W. K. Fuchs, and J. A. Abraham, "Forward recovery using checkpointing in parallel systems," Proc. IEEE International Conference on Parallel Processing, pp. 272-275, 1990.
D. K. Pradhan and N. H. Vaidya, "Roll-forward and rollback recovery: Performance-reliability trade-off," Proc. 24th Fault-Tolerant Computing Symposium, pp. 186-195, 1994.
D. K. Pradhan and N. H. Vaidya, "Roll-forward checkpointing scheme: A novel fault-tolerant architecture," IEEE Transactions on Computers, 43(10):1163-1174, Oct. 1994.
Algirdas A. Avizienis, "The Methodology of N-Version Programming," in Michael R. Lyu, editor, Software Fault Tolerance, pp. 23-46, John Wiley & Sons Ltd., 1995.
Yi-Min Wang, et al., "Checkpointing and its applications," in Proc. 25th Fault Tolerant Computing Symposium, pp. 22-31, 1995.
R. A. DeMillo, et al., "An Extended Overview of the Mothra Software Testing Environment," IEEE, pp. 142-151, 1988.
J. Long, et al., "Implementing Forward Recovery Using Checkpoints in Distributed Systems," in J. F. Meyer and R. D. Schlichting, editors, Dependable Computing for Critical Applications 2, pp. 27-46, Springer-Verlag, 1991.
G. A. Kanawati, et al., "FERRARI: A flexible software-based fault and error injection system," IEEE Transactions on Computers, 44(2):248-260, Feb. 1995.
K. H. Huang and J. A. Abraham, "Algorithm-based fault tolerance for matrix operations," IEEE Transactions on Computers, C-33(6):518-528, Jun. 1984.
V. Balasubramanian and P. Banerjee, "Compiler assisted synthesis of algorithm-based checking in multiprocessors," IEEE Transactions on Computers, 39(4):436-447, Apr. 1990.
T. K. Tsai, et al., "An approach towards benchmarking of fault-tolerant commercial systems," in Proceedings of the 26th International Symposium on Fault-Tolerant Computing, pp. 314-323, Sendai, Japan, Jun. 1996.
Y. Huang and C. Kintala, "Software fault tolerance in the application layer," in Michael Lyu, editor, Software Fault Tolerance, chapter 10, Wiley, 1995.
G. S. Fowler, et al., "A user-level replicated file system," in Usenix Conference Proceedings, pp. 279-290, Cincinnati, OH, Summer 1993.
J. H. Barton, et al., "Fault injection experiments using Fiat," IEEE Transactions on Computers, 39(4):575-582, Apr. 1990.
B. J. Choi, et al., "The Mothra tool set," in Proceedings of the 22nd Hawaii International Conference on Systems and Software, pp. 275-284, Kona, Hawaii, Jan. 1989.
B.-H. Suh, et al., "FAUST--Fault Injection Based Automated Software Testing," Proceedings of the 1991 Systems Design Synthesis Technology Workshop, Silver Spring, Maryland, pp. 1-11, Jun. 1991.
Beausoliel, Jr., Robert W.
Bonzo, Bryce P.
Lucent Technologies, Inc.
Fault tolerance via N-modular software redundancy using indirect