Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-06-24
2001-09-11
Hua, Ly V. (Department: 2785)
C714S013000
active
06289474
ABSTRACT:
BACKGROUND
A common problem in computer systems, particularly transaction-based computer systems operating on a database, is providing some form of tolerance or resilience to failures that may occur during processing. Such tolerance typically is provided by checkpointing and redundancy. Checkpointing typically involves periodically saving the processing state of a machine and, after detection of a failure, restoring the state of the computer to a previously saved internally consistent processing state. Computer systems that provide checkpointing and redundancy typically use specially designed hardware and/or operating systems, or require an application programmer to create appropriate checkpoints.
The complexities of providing a checkpointing facility are increased in dataflow and parallel computer systems, particularly dataflow systems used on parallel databases, and the Orchestrate application environment from Torrent Systems, Inc., and other similar products. Some of these problems are explored, in part, in “Loading Databases Using Dataflow Parallelism,” SIGMOD Record, Volume 23, Number 4, pages 72-87, December 1994.
SUMMARY
Checkpointing of operations on data may be provided by partitioning the data into temporal segments. Operations may be performed on the temporal segments and checkpoints may be established by storing a persistent indication of the segment being processed. The entire processing state need not be saved. If a failure occurs, processing can be restarted using the saved indication of the segment to be processed. Such data partitioning and checkpointing may be applied to relational databases, databases with dataflow operation and/or parallelism and other database types with or without parallel operation.
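As a rough illustration of the idea that only an indication of the current segment needs to be persisted, the sketch below splits records into temporal segments and saves and reloads a segment index. The function names (`partition`, `save_indication`, `load_indication`), the fixed segment size, and the JSON file format are assumptions made for illustration; they are not taken from the patent.

```python
import json
import os
import tempfile


def partition(records, segment_size):
    """Split records, in arrival order, into consecutive temporal segments."""
    return [records[i:i + segment_size]
            for i in range(0, len(records), segment_size)]


def save_indication(path, segment_index):
    """Persist the index of the segment about to be processed.

    The index is written to a temporary file that is then renamed over
    the checkpoint file, so a failure during the save does not leave a
    half-written indication behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"segment": segment_index}, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, path)


def load_indication(path):
    """Return the saved segment index, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)["segment"]
    except FileNotFoundError:
        return 0
```

On restart, a caller would read the index with `load_indication` and resume from that segment rather than restoring a full processing state.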
Accordingly, in one aspect, checkpointing operations on data by a processing element in a computer system involves partitioning the data into temporal segments for processing by the processing element. One of the temporal segments is selected. A persistent indication of the selected temporal segment is saved. The selected temporal segment is processed by the processing element. When a failure of the processing element is detected, any outputs generated by the processing element for the selected temporal segment are discarded and the selected temporal segment corresponding to the saved persistent indication is reprocessed. When processing by the processing element completes without failure, the outputs produced by the processing element may be saved. The next temporal segment to be processed by the processing element is then selected. When the data is retrieved from a relational database using a query, the retrieved data may be stored in persistent storage. In such a case, the data partitioned into temporal segments is the data stored in persistent storage.
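The sequence of steps in this aspect (select a segment, save its indication, process it, discard outputs on failure and reprocess, save outputs on success, move to the next segment) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: a plain dict stands in for persistent storage, a raised exception stands in for a detected failure, and `run_with_checkpoints`, `flaky_double`, and `max_retries` are hypothetical names that do not appear in the patent.

```python
def run_with_checkpoints(segments, process, checkpoint, committed, max_retries=3):
    """Process temporal segments in order, reprocessing a segment on failure.

    segments:   list of temporal segments (each a list of records)
    process:    function mapping one segment to a list of outputs; may raise
    checkpoint: dict standing in for persistent storage (an assumption)
    committed:  list that receives outputs only from completed segments
    """
    index = checkpoint.get("segment", 0)   # resume from the saved indication
    while index < len(segments):
        checkpoint["segment"] = index      # save the indication before processing
        for _attempt in range(max_retries):
            try:
                outputs = process(segments[index])
            except Exception:
                # Failure: nothing from this attempt is committed; reprocess.
                continue
            committed.extend(outputs)      # success: save this segment's outputs
            break
        else:
            raise RuntimeError(f"segment {index} failed {max_retries} times")
        index += 1                         # select the next temporal segment
    checkpoint["segment"] = index          # all segments completed


# Usage: a hypothetical operator that fails once on the second segment.
failures = {"left": 1}

def flaky_double(segment):
    if segment[0] == 3 and failures["left"] > 0:
        failures["left"] -= 1
        raise IOError("simulated failure")
    return [2 * x for x in segment]

checkpoint, committed = {}, []
run_with_checkpoints([[1, 2], [3, 4], [5]], flaky_double, checkpoint, committed)
```

Because outputs are appended to `committed` only after a segment completes, the failed attempt on the second segment leaves no partial results behind, and a single retry of that segment is enough to continue.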
REFERENCES:
patent: 5363503 (1994-11-01), Gleeson
patent: 5819021 (1998-10-01), Stanfill et al.
patent: 5872907 (1999-02-01), Griess et al.
patent: 5909581 (1999-06-01), Park
patent: 6032267 (2000-02-01), Fishler et al.
Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques (Morgan Kaufmann Publishers, 1993), pp. 210-217.
Tom Barclay, Robert Barnes, Jim Gray, and Prakash Sundaresan, “Loading Databases Using Dataflow Parallelism,” in SIGMOD Record, vol. 23, no. 4, Dec. 1994.
R. Koo and S. Toueg, “Checkpointing and rollback-recovery for distributed systems,” IEEE Trans. Software Eng., vol. SE-13, pp. 23-31, Jan. 1987.
Y. Tamir and C. H. Sequin, “Error recovery in multicomputers using global checkpoints,” Proc. 13th Int. Conf. Parallel Processing, pp. 32-41, Aug. 1984.
Y. M. Wang and W. K. Fuchs, “Lazy checkpoint coordination for bounding rollback propagation,” in Proc. 12th Symp. on Reliable Distributed Sys., pp. 78-85, Oct. 1993.
K. Li, J. F. Naughton, and J. S. Plank, “Checkpointing multicomputer applications,” in Proc. 10th IEEE Symp. Reliable Distributed Sys., pp. 2-11, Oct. 1991.
Hua Ly V.
Torrent Systems, Inc.
Wolf Greenfield & Sacks P.C.