Checkpoint and restoration systems for execution control

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Patent


Details

G06F 11/14

Patent

active

060444754

DESCRIPTION:

BRIEF SUMMARY
TECHNICAL FIELD

The present invention relates to a system for checkpointing and restoring the state of a process, and more particularly, to systems for checkpointing and restoring the process state, including lazy checkpoints of the persistent process state, or any specified portion thereof.


BACKGROUND ART

Increasingly, the users of software applications are demanding that the software be resistant, or at least tolerant, to software faults. Users of telecommunication switching systems, for example, demand that the switching systems be continuously available. In addition, where transmissions involve financial transactions, such as for bank automated teller machines, or other sensitive data, customers also demand the highest degree of data consistency.
Thus, a number of software testing and debugging tools have been developed for detecting many programming errors which may cause a fault in a user application process. For example, the Purify™ software testing tool, commercially available from Pure Software, Inc., of Sunnyvale, Calif., and described in U.S. Pat. No. 5,193,180, provides a system for detecting memory access errors and memory leaks. The Purify™ system monitors the allocation and initialization status for each byte of memory. In addition, for each software instruction that accesses memory, the Purify™ system performs a test to ensure that the program is not writing to unallocated memory, and is not reading from uninitialized or unallocated memory.
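The per-byte status tracking described above can be illustrated with a small sketch. It is not the Purify™ implementation; the names `ByteState` and `ShadowMemory` are hypothetical, and the sketch only models the three states and the two access checks the paragraph mentions.

```python
from enum import Enum


class ByteState(Enum):
    UNALLOCATED = 0
    ALLOCATED_UNINITIALIZED = 1
    INITIALIZED = 2


class ShadowMemory:
    """Hypothetical per-byte status table; an illustration only."""

    def __init__(self, size):
        # One status entry per byte of the monitored address range.
        self.state = [ByteState.UNALLOCATED] * size
        self.errors = []

    def allocate(self, start, length):
        # Newly allocated bytes are usable but hold no defined value yet.
        for addr in range(start, start + length):
            self.state[addr] = ByteState.ALLOCATED_UNINITIALIZED

    def free(self, start, length):
        for addr in range(start, start + length):
            self.state[addr] = ByteState.UNALLOCATED

    def write(self, addr):
        # A write is legal only to allocated memory; it initializes the byte.
        if self.state[addr] is ByteState.UNALLOCATED:
            self.errors.append(("write to unallocated", addr))
            return False
        self.state[addr] = ByteState.INITIALIZED
        return True

    def read(self, addr):
        # A read is legal only from memory that is allocated and initialized.
        if self.state[addr] is ByteState.UNALLOCATED:
            self.errors.append(("read of unallocated", addr))
            return False
        if self.state[addr] is ByteState.ALLOCATED_UNINITIALIZED:
            self.errors.append(("read of uninitialized", addr))
            return False
        return True
```

In this sketch every access is checked against the table before it is allowed, which mirrors the test performed for each memory-accessing instruction.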
While software testing and debugging tools, such as the Purify™ system, provide an effective basis for detecting many programming errors which may lead to a fault in the user application process, no amount of verification, validation or testing during the software debugging process will detect and eliminate all software faults and give complete confidence in a user application program. Accordingly, residual faults due to untested boundary conditions, unanticipated exceptions and unexpected execution environments have been observed to escape the testing and debugging process and, when triggered during program execution, will manifest themselves and cause the application process to crash or hang, thereby causing service interruption.
It is therefore desirable to provide mechanisms that allow a user application process to recover from a fault with a minimal amount of lost information. Thus, in order to minimize the amount of lost information, a number of checkpointing and restoration techniques have been proposed to recover more efficiently from hardware and software failures. For a general discussion of checkpointing and rollback recovery techniques, see R. Koo and S. Toueg, "Checkpointing and Rollback-Recovery for Distributed Systems," IEEE Trans. Software Eng., Vol. SE-13, No. 1, pp. 23-31 (January 1987). Generally, checkpoint and restoration techniques periodically save the process state during normal execution, and thereafter restore the saved state following a failure. In this manner, the amount of lost work is limited to the progress made by the user application process since the restored checkpoint was taken.
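As a minimal sketch of the general technique, assuming a process whose entire volatile state fits in a dictionary, periodic checkpointing and rollback might look as follows. The class `CheckpointedCounter` and its `interval` parameter are hypothetical and are not taken from the patent.

```python
import copy


class CheckpointedCounter:
    """Hypothetical process: its volatile state is a small dictionary."""

    def __init__(self, interval):
        self.state = {"step": 0, "total": 0}
        self.interval = interval
        # The initial state serves as the first checkpoint.
        self._checkpoint = copy.deepcopy(self.state)

    def step(self, value):
        # Normal execution: advance the state, then checkpoint periodically.
        self.state["step"] += 1
        self.state["total"] += value
        if self.state["step"] % self.interval == 0:
            self._checkpoint = copy.deepcopy(self.state)

    def recover(self):
        # After a failure, roll back to the last saved state and report
        # how many steps of work were lost.
        lost = self.state["step"] - self._checkpoint["step"]
        self.state = copy.deepcopy(self._checkpoint)
        return lost
```

With an interval of 3, a failure after the fifth step rolls the process back to the state saved at step 3, so only the two steps taken after that checkpoint have to be repeated.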
It is noted that the state of a process includes the volatile state as well as the persistent state. The volatile state includes any process information that would normally be lost upon a failure. The persistent state includes all user files that are related to the current execution of the user application process. Although the persistent state is generally not lost upon a failure, it is necessary to restore the persistent state to the same point as the restored volatile state, in order to maintain data consistency.
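The consistency requirement can be made concrete with a small sketch. It assumes a hypothetical process whose volatile state is an in-memory record count and whose persistent state is a single log file, and it saves the persistent state by simply copying the whole file; the function name `run_demo` and all helper names are illustrative only.

```python
import os
import tempfile


def run_demo():
    # The temporary file stands in for a user file (persistent state).
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        volatile = {"records_written": 0}  # in-memory (volatile) state

        def append_record(text):
            with open(path, "a") as f:
                f.write(text + "\n")
            volatile["records_written"] += 1

        def count_lines():
            with open(path) as f:
                return sum(1 for _ in f)

        append_record("a")
        append_record("b")

        # Checkpoint: save the volatile state and a full copy of the file.
        saved_volatile = dict(volatile)
        with open(path, "rb") as f:
            saved_file = f.read()

        append_record("c")  # work performed after the checkpoint

        # Failure: restoring only the volatile state leaves the file ahead
        # of the in-memory count.
        volatile.clear()
        volatile.update(saved_volatile)
        inconsistent = (volatile["records_written"], count_lines())

        # Restoring the file to the same point makes the two agree again.
        with open(path, "wb") as f:
            f.write(saved_file)
        consistent = (volatile["records_written"], count_lines())
        return inconsistent, consistent
    finally:
        os.remove(path)
```

Each returned pair holds the in-memory record count followed by the number of lines in the file. The first pair disagrees because the file survived the failure unchanged, while the second agrees once the file has been restored to the checkpointed contents.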
While existing checkpointing and recovery techniques have adequately addressed checkpointing of the volatile state, these techniques have failed to adequately address checkpointing of the persistent state. According to one approach, all of the persistent state, in other words, all of the user files, are checkpointed with each checkpoint of the volatile state. Clearly, the overhead associated with this technique is prohibitively

REFERENCES:
patent: 4697266 (1987-09-01), Finley
patent: 4814971 (1989-03-01), Thatte
patent: 4819156 (1989-04-01), DeLorme et al.
patent: 4868744 (1989-09-01), Reinsch et al.
patent: 5201044 (1993-04-01), Frey, Jr. et al.
patent: 5235700 (1993-08-01), Alaiwan et al.
patent: 5333303 (1994-07-01), Mohan
patent: 5369757 (1994-11-01), Spiro et al.
patent: 5440726 (1995-08-01), Fuchs et al.
patent: 5530802 (1996-06-01), Fuchs et al.
patent: 5590277 (1996-12-01), Fuchs et al.
Saleh, Kassem et al. "Efficient and Fault-Tolerant Checkpointing Procedures for Distributed Systems," Computers and Communications, 1993 International Phoenix Conference.
