Method for software error recovery using consistent global check

Patent

Details

3951831, 39518215, G06F 1116

active

056300476

ABSTRACT:
Disclosed is a method for error recovery in a multiprocessing computer system of the type in which each of the processes periodically takes checkpoints. In the event of a failure, a process can be rolled back to a prior checkpoint, and execution can continue from the checkpointed state. A monitor process monitors the execution of the processes. Upon the occurrence of a failure, a target set of checkpoints is identified, and the maximum consistent global checkpoint, which includes the target set of checkpoints, is computed. Each of the processes is rolled back to an associated checkpoint in the consistent global checkpoint. Upon a subsequent occurrence of the same failure, a second set of checkpoints is identified, and the minimum consistent global checkpoint, which includes the target set of checkpoints, is computed. Each of the processes is rolled back to an associated checkpoint in the consistent global checkpoint. Upon another occurrence of the same failure, the system is rolled back further to a coordinated checkpoint. Also disclosed are novel methods for calculating the minimum and maximum consistent global checkpoints. In accordance with one embodiment, the minimum and maximum consistent global checkpoints are calculated by a central process. In accordance with another embodiment, the minimum and maximum consistent global checkpoints are calculated in a distributed fashion by each of the individual processes.
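The notions of maximum and minimum consistent global checkpoint used in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: it uses plain rollback propagation over a message log, and every name in it (`Message`, `max_consistent`, `min_consistent`, the interval encoding) is a hypothetical chosen for illustration. It assumes that checkpoint 0 is each process's initial state and that an event "in interval k" happened after checkpoint k and before checkpoint k+1. A global checkpoint is treated as consistent when it contains no orphan message, that is, no message recorded as received but not recorded as sent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (sender, send_interval, receiver, recv_interval).
Message = Tuple[str, int, str, int]
Line = Dict[str, int]  # process name -> checkpoint index


def find_orphan(msgs: List[Message], line: Line) -> Optional[Message]:
    """Return a message received before line[receiver] but sent after line[sender]."""
    for msg in msgs:
        sender, send_iv, receiver, recv_iv = msg
        if send_iv >= line[sender] and recv_iv < line[receiver]:
            return msg
    return None


def max_consistent(latest: Line, msgs: List[Message], target: Line) -> Optional[Line]:
    """Latest consistent line containing the target checkpoints, or None."""
    # Start as late as possible, then roll receivers of orphan messages back.
    line = {p: target.get(p, latest[p]) for p in latest}
    msg = find_orphan(msgs, line)
    while msg is not None:
        _, _, receiver, recv_iv = msg
        if receiver in target:
            return None  # a target checkpoint would have to move
        line[receiver] = recv_iv  # undo the receive
        msg = find_orphan(msgs, line)
    return line


def min_consistent(latest: Line, msgs: List[Message], target: Line) -> Optional[Line]:
    """Earliest consistent line containing the target checkpoints, or None."""
    # Start as early as possible, then advance senders of orphan messages.
    line = {p: target.get(p, 0) for p in latest}
    msg = find_orphan(msgs, line)
    while msg is not None:
        sender, send_iv, _, _ = msg
        if sender in target or send_iv + 1 > latest[sender]:
            return None  # no later checkpoint records the send
        line[sender] = send_iv + 1  # first checkpoint taken after the send
        msg = find_orphan(msgs, line)
    return line


# Three processes, each with checkpoints 0..2, and three logged messages.
latest = {"A": 2, "B": 2, "C": 2}
msgs = [("A", 0, "B", 0), ("B", 1, "C", 1), ("C", 1, "A", 1)]
target = {"B": 1}

print(max_consistent(latest, msgs, target))  # {'A': 1, 'B': 1, 'C': 1}
print(min_consistent(latest, msgs, target))  # {'A': 1, 'B': 1, 'C': 0}
```

In this example the maximum line undoes as little work as possible while keeping process B at checkpoint 1, whereas the minimum line rolls process C all the way back to its initial state. That difference is what makes the escalation described in the abstract (maximum first, minimum on a repeated failure) progressively more aggressive.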

REFERENCES:
patent: 4665520 (1987-05-01), Strom et al.
patent: 4697266 (1987-09-01), Finley
patent: 4740969 (1988-04-01), Fremont
patent: 4852092 (1989-07-01), Makita
patent: 5204960 (1993-04-01), Smiths et al.
patent: 5235700 (1993-08-01), Alaiwan et al.
patent: 5293613 (1994-03-01), Hayden et al.
patent: 5440726 (1995-08-01), Fuchs et al.
patent: 5481694 (1996-01-01), Chad et al.
patent: 5528750 (1996-06-01), Lubart et al.
patent: 5530802 (1996-06-01), Fuchs et al.
"Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems--An Optimistic Approach", B. Bhargava and S. Lian, IEEE Symposium on Reliable Distributed Systems, 1988, pp. 3-12.
"Checkpointing and Rollback-Recovery for Distributed Systems", Richard Koo and Sam Toueg, IEEE Transactions on Software Engineering, vol. SE-13, No. 1, Jan. 1987, pp. 23-31.
"Optimistic Recovery in Distributed Systems", R.E. Strom and S. Yemini, ACM Transactions on Computer Systems, vol. 3, No. 3, Aug. 1985, pp. 204-226.
"Distributed Snapshots: Determining Global States of Distributed Systems", K. M. Chandy and L. Lamport, ACM Transactions on Computer Systems, vol. 3, No. 1, Feb. 1985, pp. 63-75.
"Global Checkpointing for Distributed Programs", Luis M. Silva and J. G. Silva, Proceedings IEEE Symp. Reliable Distributed Syst., 1992, pp. 155-162.
"Efficient Distributed Recovery Using Message Logging", A. P. Sistla and J. L. Welch, Proc. 8th ACM Symposium on Principles of Distributed Computing, 1989, pp. 223-238.
"Checkpointing and Its Applications", Y. Wang, Y. Huang, K. Vo, P. Chung and C. Kintala, Proc. IEEE Fault Tolerant Computing Symp. (FTCS-25), Jun. 1995, pp. 22-31.
"Necessary and Sufficient Conditions for Consistent Global Snapshots", R. H. B. Netzer and J. Xu, IEEE Transactions on Parallel and Distributed Systems, vol. 6, No. 2, Feb. 1995, pp. 165-169.
"Progressive Retry for Software Error Recovery in Distributed Systems", Y. Wang, Y. Huang and W.K. Fuchs, Proc. IEEE Fault-Tolerant Computing Symp., Jun. 1993, pp. 138-144.
