Method and apparatus for checkpointing in computer system
Patent
1997-03-14
1999-07-13
Palys, Joseph E.
395569, G06F 11/00
Patent
active
059238321
ABSTRACT:
A computer system monitors inter-process communications and performs synchronous (global) checkpointing for the processes that belong to a checkpoint group. The system also performs local checkpointing within each process, at arbitrary times chosen independently by each process. When a fault occurs, the validity of each checkpoint is examined using the monitoring results for the inter-process communications related to the faulty process. If the most recent checkpoint of the faulty process is valid, only the faulty process is rolled back. If it is invalid, either all processes of the checkpoint group are rolled back to the global checkpoint, or each process is rolled back to its own optimum (valid) checkpoint.
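The fault-time decision in the abstract can be illustrated with a small sketch. The abstract does not state how validity is judged, so the rule below is an assumption: a checkpoint counts as valid for rolling back the faulty process alone only if that process has neither sent nor received a message since it. The "optimum" branch is implemented here with standard orphan-message rollback propagation, which may differ from the patented method. Messages still in transit at restore time are assumed to be replayed from a log and are not modeled. The names `Message`, `checkpoint_is_valid`, `latest_checkpoint_before`, and `recovery_plan` are hypothetical. Each process's checkpoint list is assumed to be sorted, to begin with an initial checkpoint at time 0, and to include the global checkpoint time.

```python
"""Hypothetical sketch of the fault-time rollback decision described in the abstract."""
import math
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # One monitored inter-process message (hypothetical record format).
    sender: str
    send_time: float
    receiver: str
    recv_time: float


def checkpoint_is_valid(proc, ckpt_time, messages):
    # Assumed validity rule: the checkpoint can be used to roll back `proc`
    # alone only if `proc` has neither sent nor received a message since it.
    return not any(
        (m.sender == proc and m.send_time > ckpt_time)
        or (m.receiver == proc and m.recv_time > ckpt_time)
        for m in messages
    )


def latest_checkpoint_before(ckpts, t):
    # ckpts is sorted and starts with an initial checkpoint at time 0,
    # so a checkpoint strictly earlier than any t > 0 always exists.
    return ckpts[bisect_left(ckpts, t) - 1]


def recovery_plan(faulty, checkpoints, messages, global_ckpt, strategy="optimum"):
    """Return {process: checkpoint time to restore} for every process rolled back."""
    latest = checkpoints[faulty][-1]
    if checkpoint_is_valid(faulty, latest, messages):
        return {faulty: latest}  # only the faulty process is rolled back
    if strategy == "global":
        # Every member of the checkpoint group returns to the synchronous checkpoint.
        return {p: global_ckpt for p in checkpoints}
    # "optimum": start from the faulty process's latest checkpoint, then roll back
    # the receiver of any orphan message (its send is undone because it happened
    # after the sender's restore point, but its receipt is kept because it happened
    # before the receiver's restore point). Repeat until no orphan remains.
    restore = {p: math.inf for p in checkpoints}  # inf means "not rolled back"
    restore[faulty] = latest
    changed = True
    while changed:
        changed = False
        for m in messages:
            if m.send_time > restore[m.sender] and m.recv_time < restore[m.receiver]:
                # Each update strictly lowers the receiver's restore point, so the loop ends.
                restore[m.receiver] = latest_checkpoint_before(
                    checkpoints[m.receiver], m.recv_time)
                changed = True
    return {p: t for p, t in restore.items() if t != math.inf}
```

For example, suppose process A sends to B at time 16 (received at 17), B sends to C at 13 (received at 14), D never communicates, and the global checkpoint is at time 10. A fault in D rolls back only D. A fault in A, whose latest checkpoint (15) precedes its send, cascades to B and C under the "optimum" strategy, while the "global" strategy returns every process to time 10. The optimum strategy discards less work, at the cost of examining the communication history at fault time.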
REFERENCES:
patent: 4665520 (1987-05-01), Strom et al.
patent: 5271013 (1993-12-01), Gleeson
patent: 5293613 (1994-03-01), Hayden et al.
patent: 5301309 (1994-04-01), Sugano
patent: 5440726 (1995-08-01), Fuchs et al.
patent: 5590277 (1996-12-01), Fuchs et al.
patent: 5630047 (1997-05-01), Wang
patent: 5664088 (1997-09-01), Romanovsky et al.
patent: 5734817 (1998-03-01), Roffe et al.
patent: 5802267 (1998-09-01), Shirakihara
Cristian et al., "A Timestamp Based Checkpointing Protocol for Long Lived Distributed Computations", Reliable Distributed Systems, 10th Symposium, IEEE, pp. 12-20, 1991.
Silva et al., "Global Checkpointing for Distributed Programs", Reliable Distributed Systems, 11th Symposium, IEEE, pp. 155-162, 1992.
Silva et al., "On The Optimum Recovery of Distributed Programs", Euromicro, System Architecture and Design, 20th Conference, IEEE, pp. 704-711, 1994.
Netzer et al., "Necessary and Sufficient Conditions for Consistent Global Snapshots", IEEE Trans. on Parallel and Distributed Systems, vol. 6, No. 2, pp. 165-169, 1995.
Chandy, K. Mani et al., "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computer Systems, vol. 3, No. 1, Feb. 1985, pp. 63-75.
Plank, James S. et al., "ickp: A Consistent Checkpointer for Multicomputers," IEEE Parallel & Distributed Technology, pp. 62-67.
Strom, Robert E. et al., "Optimistic Recovery in Distributed Systems," ACM Transactions on Computer Systems, vol. 3, No. 3, Aug. 1985, pp. 204-226.
Kanai Tatsunori
Kizu Toshiki
Shirakihara Toshio
Kabushiki Kaisha Toshiba
Palys Joseph E.