Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Patent
1998-03-18
2000-12-12
Beausoliel, Jr., Robert W.
G06F 1100
active
06161193
ABSTRACT:
A distributed computing system includes a number of computers, workstations or other computing machines interconnected by a network. A non-interactive process arriving in a host machine of the system is migrated for execution to at least two remote machines. For example, first and second executions of the process may be started on respective first and second remote machines. One of the first and second executions of the process is then used to provide an on-demand checkpoint for the other execution of the process in the event the other execution is terminated, such that an additional execution of the process can be started from the on-demand checkpoint. This on-demand checkpointing is augmented with periodic checkpointing performed on at least one of the multiple executions of the process. The period of the periodic checkpointing for a given execution of the process may be fixed without regard to the status of the on-demand checkpointing for that execution, or alternatively may be reset each time an on-demand checkpoint is taken for that execution.
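The two periodic-checkpoint policies described in the abstract (a fixed period versus a period that is reset whenever an on-demand checkpoint is taken) can be illustrated with a small timing sketch. This is not the patent's implementation. The function name `periodic_checkpoint_times`, its parameters, and the use of abstract time units are all assumptions made for illustration only.

```python
from typing import List


def periodic_checkpoint_times(
    period: float,
    horizon: float,
    on_demand_times: List[float],
    reset_on_demand: bool,
) -> List[float]:
    """Return the times in (0, horizon] at which periodic checkpoints fire.

    Hypothetical model: one execution of a process runs from time 0 to
    `horizon`. On-demand checkpoints are taken at `on_demand_times`.
    If `reset_on_demand` is False, the periodic timer ignores them
    (fixed period). If True, each on-demand checkpoint restarts the
    periodic timer from the moment it is taken.
    """
    pending = sorted(t for t in on_demand_times if 0 <= t <= horizon)
    fired: List[float] = []
    next_due = period
    i = 0
    while next_due <= horizon:
        if reset_on_demand and i < len(pending) and pending[i] < next_due:
            # An on-demand checkpoint happened before the timer expired:
            # restart the period from that checkpoint.
            next_due = pending[i] + period
            i += 1
            continue
        fired.append(next_due)
        next_due += period
    return fired


# One on-demand checkpoint at t=15, period of 10, run length of 40.
fixed = periodic_checkpoint_times(10, 40, [15], reset_on_demand=False)
reset = periodic_checkpoint_times(10, 40, [15], reset_on_demand=True)
print(fixed)  # [10, 20, 30, 40]
print(reset)  # [10, 25, 35]
```

In this model the reset policy skips the periodic checkpoint at t=20 because the on-demand checkpoint at t=15 already captured recent state, so fewer periodic checkpoints are taken over the same run.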
REFERENCES:
patent: 5473771 (1995-12-01), Burd et al.
patent: 5675807 (1997-10-01), Iswandhi et al.
patent: 5751932 (1998-05-01), Horst et al.
patent: 5987432 (1999-11-01), Zusman et al.
Roll-forward and rollback recovery, Pradhan et al., pp. 186-195, IEEE 1994.
D.K. Pradhan and N.H. Vaidya, "Roll-Forward and Rollback Recovery: Performance-Reliability Trade-Off," Proc. Fault-Tolerant Computing Symposium, pp. 186-195, 1994.
D.K. Pradhan and N.H. Vaidya, "Roll-Forward Checkpointing Scheme: A Novel Fault-Tolerant Architecture," IEEE Transactions on Computers, vol. 43, No. 10, pp. 1163-1174, Oct. 1994.
A. Duda, "The Effects of Checkpointing on Program Execution Time," Information Processing Letters, 16 (1983), pp. 221-229, Jun. 1983.
R. Shridhar, "Hydra: A Novel Mechanism for Process Migration in Distributed Systems Using Process Replication," Thesis, Graduate School of Engineering, Northeastern University, Boston, MA, 1996.
Garg Sachin
Huang Yennun
Rangarajan Sampath
Beausoliel, Jr. Robert W.
Elisca Pierre E
Lucent Technologies - Inc.