Method of checkpointing parallel processes in execution...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C712S228000

active

07437606

ABSTRACT:
An embodiment of a method of checkpointing parallel processes in execution within a plurality of process domains begins with a step of setting communication rules to stop communication between the process domains. Each process domain comprises an execution environment at a user level for at least one of the parallel processes. The method continues with a step of checkpointing each process domain and any in-transit messages. The method concludes with a step of resetting the communication rules to allow the communication between the process domains.
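
The three steps in the abstract can be illustrated with a minimal single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `ProcessDomain`, `Network`, and `checkpoint_all` are hypothetical, the "communication rules" are reduced to one boolean flag, and in-transit messages are modeled as an in-memory queue.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ProcessDomain:
    """Hypothetical stand-in for a user-level execution environment."""
    name: str
    state: dict = field(default_factory=dict)


class Network:
    """Toy message fabric whose 'communication rules' are a single flag (assumption)."""

    def __init__(self):
        self.allow = True       # True: messages may be delivered
        self.in_transit = []    # messages sent but not yet delivered

    def send(self, src, dst, payload):
        # Queue the message; delivery happens separately in deliver_all().
        self.in_transit.append((src, dst, payload))

    def deliver_all(self, domains):
        # While communication is stopped, nothing leaves the queue.
        if not self.allow:
            return 0
        delivered = 0
        while self.in_transit:
            _src, dst, payload = self.in_transit.pop(0)
            domains[dst].state.setdefault("inbox", []).append(payload)
            delivered += 1
        return delivered


def checkpoint_all(domains, network):
    """Stop communication, snapshot domains and queued messages, then resume."""
    network.allow = False                    # step 1: set rules to stop communication
    try:
        snapshot = {                         # step 2: checkpoint domains and in-transit messages
            "domains": {n: copy.deepcopy(d.state) for n, d in domains.items()},
            "in_transit": copy.deepcopy(network.in_transit),
        }
    finally:
        network.allow = True                 # step 3: reset rules to allow communication
    return snapshot


domains = {"a": ProcessDomain("a", {"step": 3}), "b": ProcessDomain("b", {"step": 5})}
net = Network()
net.send("a", "b", "partial-sum")            # still queued when the checkpoint is taken
snap = checkpoint_all(domains, net)
```

Because delivery is blocked while the snapshot is taken, the queued message is recorded with the checkpoint rather than being lost or counted twice, and it can still be delivered once the rules are reset.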

REFERENCES:
patent: 6338147 (2002-01-01), Meth et al.
patent: 6393583 (2002-05-01), Meth et al.
patent: 7275183 (2007-09-01), Santos et al.
patent: 2002/0087916 (2002-07-01), Meth
Bouteiller, A., et al., Coordinated checkpoint versus message log for fault tolerant MPI, Dec. 2003.
Duell, J., The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart, 2003.
Litzkow, M., et al., Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System, 1997.
Osman, S., et al., The Design and Implementation of Zap: A System for Migrating Computing Environments, Proc. OSDI 2002, Dec. 2002.
Plank, J.S., et al., Libckpt: Transparent Checkpointing under Unix, <http://www.cs.utk.edu/plank/plank/papers/USENIX-95W.html>, 1995.
Plank, J.S., An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance, Tech. Report UT-CS-97-372, Univ. of Tenn. Knoxville, Tenn., Jul. 1997.
Stellner, G., CoCheck: Checkpointing and Process Migration for MPI, 1996.
Youhui, Z., et al., Checkpointing and Migration of parallel processes based on Message Passing Interface, Oct. 2002.
Zhong, H., et al., CRAK: Linux Checkpoint/Restart As a Kernel Module, Technical Report CUCS-014-01, <http://www.ncl.cs.columbia/research/migrate/crak.html>, Nov. 2001.
E. N. (Mootaz) Elnozahy et al., A Survey of Rollback-Recovery Protocols in Message-Passing Systems, Sep. 2002, ACM Computing Surveys, 34(3):375-408, and bibliography 1-10, ACM Press, New York, NY.
