Restoring checkpointed processes without restoring...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


Details

C712S228000, C707S793000

Reexamination Certificate

active

06256751

ABSTRACT:

TECHNICAL FIELD
This invention relates, in general, to restoring checkpointed processes and, in particular, to leaving attributes of external data referenced by the checkpointed processes unrestored when the checkpointed processes are restored.
BACKGROUND ART
A requirement of any robust computing environment is to be able to recover from errors, such as device hardware errors (e.g., mechanical or electrical errors) or recording media errors. In order to recover from some device or media errors, it is necessary to restart a process, either from the beginning or from some other point within the process.
To facilitate recovery of a process, especially a long-running process, the intermediate state of the process is saved at particular intervals. This is referred to as checkpointing the process. Checkpointing enables the process to be restarted from the last checkpoint, rather than from the beginning of the process.
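As a rough illustration of checkpoint and restart, the Python sketch below models a long-running computation whose state is a small dictionary copied every `checkpoint_interval` steps. The names `run` and `SimulatedFailure` and the state layout are hypothetical; a real checkpointing facility saves the process image (registers, stack, data), not an application-level dictionary.

```python
import copy


class SimulatedFailure(Exception):
    """Hypothetical failure that carries the most recent checkpoint."""

    def __init__(self, checkpoint):
        super().__init__("simulated failure")
        self.checkpoint = checkpoint


def run(total_steps, checkpoint_interval, checkpoint=None, fail_at=None):
    """Sum 0..total_steps-1, taking a checkpoint every checkpoint_interval steps."""
    # Start from the checkpoint if one is given; otherwise from the beginning.
    if checkpoint is not None:
        state = copy.deepcopy(checkpoint)
    else:
        state = {"step": 0, "acc": 0}
    last_checkpoint = copy.deepcopy(state)
    while state["step"] < total_steps:
        if state["step"] == fail_at:
            # Work done since last_checkpoint is lost; only the checkpoint survives.
            raise SimulatedFailure(last_checkpoint)
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_interval == 0:
            last_checkpoint = copy.deepcopy(state)  # take a checkpoint
    return state["acc"], last_checkpoint


if __name__ == "__main__":
    try:
        run(100, 10, fail_at=57)
    except SimulatedFailure as failure:
        # Restart from step 50 (the last checkpoint) instead of step 0.
        result, _ = run(100, 10, checkpoint=failure.checkpoint)
        print(result)  # 4950
```

Only steps 50 through 56 are repeated after the simulated failure, which is the saving that checkpointing provides over restarting from the beginning.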
When a process is restarted, it is restored to the state it was in when the checkpoint was taken. Thus, any and all changes subsequent to the last checkpoint are undone. This includes any changes that have been made to attributes of external data, such as external functions and/or external variables, referenced by the process. Once the process is restored to its former state, it continues to execute from that point.
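Based on the foregoing, a need exists for a restore capability that does not require that all information be restored to the point at which the checkpoint was taken. Restoring everything is not always desirable; for example, if the process is restarted on a computing unit other than the one used to take the checkpoint, the restored attributes of external data describe the original computing unit rather than the one on which the process is now running. Thus, a need exists for a capability that allows selected information to remain unrestored. In particular, a need exists for a restore capability that enables attributes of external data to remain unrestored, even though other components of the process are being restored.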
SUMMARY OF THE INVENTION
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method of restoring checkpointed processes that have references to external data. The method includes, for instance, restarting a process on a computing unit from a checkpoint taken of the process, wherein the process includes a reference to an external datum. Further, the method includes restoring the process using information obtained from the checkpoint. The restoring leaves one or more attributes of the external datum unrestored.
In one example, the external datum includes one of an external function and an external variable.
In one example, the process is restarted on a computing unit that is different from the computing unit used to take the checkpoint. In another example, the process is restarted on the same computing unit used to take the checkpoint.
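The selective restore can be modeled in a few lines. In this hypothetical sketch, the restarted process is first loaded afresh on the current computing unit, so its `external` table already holds attributes (here, addresses) that are valid on that unit; `restore_into` then copies registers and ordinary data from the checkpoint and deliberately leaves `external` untouched. `ProcessImage`, its fields, and the addresses are illustrative assumptions, not structures defined by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessImage:
    # Hypothetical, simplified process image.
    registers: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)
    # External datum name -> attributes (e.g. its address on this unit).
    external: dict = field(default_factory=dict)


def restore_into(fresh: ProcessImage, checkpoint: ProcessImage) -> None:
    """Restore `fresh` from `checkpoint`, leaving external attributes unrestored."""
    fresh.registers = dict(checkpoint.registers)
    fresh.data = dict(checkpoint.data)
    # fresh.external is intentionally not overwritten: it keeps the values
    # set up on the unit where the process is being restarted.


if __name__ == "__main__":
    # Checkpoint taken on one unit, where external function "f" was at 0x1000.
    saved = ProcessImage({"pc": 42}, {"counter": 7}, {"f": {"address": 0x1000}})
    # Process reloaded on another unit, where "f" is at 0x2000.
    fresh = ProcessImage(external={"f": {"address": 0x2000}})
    restore_into(fresh, saved)
    print(fresh.registers, fresh.data, hex(fresh.external["f"]["address"]))
```

After the call, the program counter and data match the checkpoint, while the address of `f` still reflects the unit on which the process was restarted.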
In yet another embodiment of the present invention, the one or more attributes are located in a data section of the process, and the restoring includes restoring the data section except for the one or more attributes stored therein.
In one example, the restoring of the data section includes obtaining a beginning address of a table of contents of the data section, wherein the table of contents includes the one or more attributes; obtaining an ending address of the table of contents; restoring information located in the data section before the beginning address of the table of contents, if any; and restoring information located in the data section after the ending address of the table of contents, if any.
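The data-section restore described above amounts to two copies around a skipped byte range. The Python sketch below assumes the data section is available as a byte buffer, that the table-of-contents addresses are given as offsets from the start of that section, and that `toc_end` is one past the last table-of-contents byte (the text does not say whether the ending address is inclusive). `restore_data_section` is a hypothetical name.

```python
def restore_data_section(live: bytearray, saved: bytes, toc_begin: int, toc_end: int) -> None:
    """Copy `saved` into `live`, skipping the TOC byte range [toc_begin, toc_end).

    Offsets are relative to the start of the data section. The TOC bytes in
    `live` (the attributes of external data) are left unrestored.
    """
    if len(live) != len(saved):
        raise ValueError("data section size mismatch")
    if not 0 <= toc_begin <= toc_end <= len(live):
        raise ValueError("TOC range lies outside the data section")
    # Restore information located before the beginning of the TOC, if any.
    live[:toc_begin] = saved[:toc_begin]
    # Restore information located after the end of the TOC, if any.
    live[toc_end:] = saved[toc_end:]
```

For example, with a checkpointed section `b"AAAATTTTBBBB"` and a live section `bytearray(b"xxxxttttyyyy")` whose table of contents occupies offsets 4 through 7, the call leaves `live` equal to `b"AAAAttttBBBB"`: the information around the table of contents comes from the checkpoint, while the table of contents keeps its current entries.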
In another aspect of the present invention, a system of restoring checkpointed processes that have references to external data is provided. The system includes, for instance, means for restarting a process on a computing unit from a checkpoint taken of the process, wherein the process includes a reference to an external datum; and means for restoring the process using information obtained from the checkpoint, wherein the means for restoring leaves one or more attributes of the external datum unrestored.
In yet a further aspect of the present invention, a system of restoring checkpointed processes that have references to external data is provided. The system includes a computing unit adapted to restart a process from a checkpoint taken of the process, wherein the process includes a reference to an external datum. The computing unit is further adapted to restore the process using information obtained from the checkpoint, wherein one or more attributes of the external datum are left unrestored.
In a further aspect of the present invention, an article of manufacture, including at least one computer usable medium having computer readable program code means embodied therein for causing the restoring of checkpointed processes that have references to external data, is provided. The computer readable program code means in the article of manufacture includes, for instance, computer readable program code means for causing a computer to restart a process on a computing unit from a checkpoint taken of the process, wherein the process includes a reference to an external datum; and computer readable program code means for causing a computer to restore the process using information obtained from the checkpoint, wherein the computer readable program code means for causing a computer to restore leaves one or more attributes of the external datum unrestored.
The capabilities of the present invention advantageously enable attributes of external data to remain unrestored, even though a process is being restarted from a checkpoint. This allows a process to be restored and to still reflect some aspects of the current operating environment. Thus, greater flexibility is provided for the restarted process. For example, a process can be restarted on a different computing unit than the one used to take the checkpoint, and the new computing unit is reflected in the process, rather than the old computing unit.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.


