Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2004-08-23
2008-10-14
Wilson, Yolanda L (Department: 2113)
C712S228000
active
07437606
ABSTRACT:
An embodiment of a method of checkpointing parallel processes in execution within a plurality of process domains begins with a step of setting communication rules to stop communication between the process domains. Each process domain comprises an execution environment at a user level for at least one of the parallel processes. The method continues with a step of checkpointing each process domain and any in-transit messages. The method concludes with a step of resetting the communication rules to allow the communication between the process domains.
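The three steps in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `ProcessDomain`, `Network`, and `coordinated_checkpoint` are hypothetical, the "communication rules" are reduced to a single allow flag, and a deep copy stands in for a real checkpoint of a user-level execution environment.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ProcessDomain:
    """Hypothetical stand-in for a user-level execution environment."""
    name: str
    state: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)


class Network:
    """Toy message fabric; its 'communication rules' are one allow flag (assumption)."""

    def __init__(self):
        self.allow = True
        self.in_transit = []  # (destination name, payload) pairs not yet delivered

    def send(self, dest, payload):
        # Messages are queued rather than delivered at once, so some may
        # still be in transit when a checkpoint begins.
        self.in_transit.append((dest, payload))

    def deliver(self, domains):
        # While communication is stopped, nothing moves between domains.
        if not self.allow:
            return 0
        delivered = len(self.in_transit)
        for dest, payload in self.in_transit:
            domains[dest].inbox.append(payload)
        self.in_transit.clear()
        return delivered


def coordinated_checkpoint(domains, network):
    """Stop communication, snapshot domains and in-transit messages, resume."""
    network.allow = False  # step 1: set rules to stop communication
    try:
        # Step 2: checkpoint each process domain and any in-transit messages.
        snapshot = {
            "domains": {n: copy.deepcopy(d) for n, d in domains.items()},
            "in_transit": copy.deepcopy(network.in_transit),
        }
    finally:
        network.allow = True  # step 3: reset rules to allow communication
    return snapshot


if __name__ == "__main__":
    domains = {"a": ProcessDomain("a", {"step": 1}),
               "b": ProcessDomain("b", {"step": 2})}
    net = Network()
    net.send("b", "hello")
    snap = coordinated_checkpoint(domains, net)
    print(snap["in_transit"])
    print(net.deliver(domains))
```

Because delivery is blocked while the snapshot is taken, the saved domain states and the saved in-transit messages describe the same instant, which is the point of stopping communication first. The `finally` clause is a design choice of this sketch so that communication resumes even if the snapshot step fails.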
REFERENCES:
patent: 6338147 (2002-01-01), Meth et al.
patent: 6393583 (2002-05-01), Meth et al.
patent: 7275183 (2007-09-01), Santos et al.
patent: 2002/0087916 (2002-07-01), Meth
Bouteiller, A., et al., Coordinated checkpoint versus message log for fault tolerant MPI, Dec. 2003.
Duell, J., The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart, 2003.
Litzkow, M., et al., Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System, 1997.
Osman, S., et al., The Design and Implementation of Zap: A System for Migrating Computing Environments, Proc. OSDI 2002, Dec. 2002.
Plank, J.S., et al., Libckpt: Transparent Checkpointing under Unix, <http://www.cs.utk.edu/plank/plank/papers/USENIX-95W.html>, 1995.
Plank, J.S., An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance, Tech. Report UT-CS-97-372, Univ. of Tenn. Knoxville, Tenn., Jul. 1997.
Stellner, G., CoCheck: Checkpointing and Process Migration for MPI, 1996.
Youhui, Z., et al., Checkpointing and Migration of parallel processes based on Message Passing Interface, Oct. 2002.
Zhong, H., et al., CRAK: Linux Checkpoint/Restart As a Kernel Module, Technical Report CUCS-014-01, <http://www.ncl.cs.columbia/research/migrate/crak.html>, Nov. 2001.
Elnozahy, E. N. (Mootaz), et al., A Survey of Rollback-Recovery Protocols in Message-Passing Systems, ACM Computing Surveys, 34(3):375-408, and bibliography 1-10, ACM Press, New York, NY, Sep. 2002.
Janakiraman Gopalakrishnan
Santos Jose Renato
Subhraveti Dinesh Kumar
Turner Yoshio Frank
Hewlett-Packard Development Company, L.P.
Wilson Yolanda L