Continuous flow checkpointing data processing

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C714S019000

Reexamination Certificate

active

06584581

ABSTRACT:

TECHNICAL FIELD
This invention relates to data processing, and more particularly to a system, method, and computer program for continuous flow checkpointing data processing.
BACKGROUND
With the huge popularity of the Internet for data access and electronic commerce comes a need for high-performance, fault tolerant “back-office” processing capabilities that allow large volumes of data to be processed essentially continuously and in near real-time (i.e., responding to a user's input within a few seconds to a few minutes). Such processing capabilities should be robust (i.e., fault tolerant) to allow processing to continue where it left off after a failure. While such capabilities are useful for large-scale Internet-based data processing, they are often also applicable to conventional types of data processing over private networks and communication systems (e.g., airline reservation systems, internal corporate “intranets”, etc.).
Achieving high performance for a particular volume of data often means using a parallel processing system to process the data within a reasonable response time. Numerous examples of parallel processing systems are known. For example,
FIG. 1
is a block diagram of a typical prior art multi-process data processing system
100
. Data from an ultimate source
101
(e.g., a web server) is communicated to at least one data queue
102
. Data is read, or “consumed”, from time to time by an initial process
104
, which outputs processed data to one or more data queues
106
,
106
′. The process
104
typically is a single process that uses a two-phase commit protocol to coordinate consumption of input data and propagation of output data, in known fashion. Subsequent processes
108
,
108
′ may be linked (shown as being in parallel) to provide additional processing and output to subsequent data queues
110
,
110
′. The data is finally output to an ultimate consumer
112
, such as a relational database management system (RDBMS). In practice, such a system may have many processes, and more parallelism than is shown. Further, each process may consume data from multiple data queues, and output data to multiple data queues.
To obtain fault tolerance, such systems have used “checkpointing” techniques that allow a computational system to be “rolled back” to a known, good set of data and machine state. In particular, checkpointing allows the application to be continued from a checkpoint that captures an intermediate state of the computation, rather than re-running the entire application from the beginning. Examples of checkpointing systems are described in U.S. Pat. No. 5,819,021, entitled “Overpositioning System and Method for Increasing Checkpoints in Component-Based Parallel Applications”, and U.S. Pat. No. 5,712,971, entitled “Methods and Systems for Reconstructing the State of a Computation”, both assigned to the assignee of the present invention.
A problem with using traditional checkpointing techniques with data where essentially continuous data processing is desired (e.g., Internet data processing) is that checkpoints may only be created when the system is quiescent, i.e., when no processes are executing. Thus, every process would have to suspend execution for the time required by the process that requires the longest time to save its state. Such suspension may adversely impact continuous processing of data.
Accordingly, the inventors have determined that there is a need to provide for a data processing system and method that provides checkpointing and permits a continuous flow of data processing by allowing each process to return to operation after checkpointing, independently of the time required by other processes to checkpoint their state. The present invention provides a method, system, and computer program that provides these and other benefits.
SUMMARY
The invention includes a data processing system and method that provides checkpointing and permits a continuous flow of data processing by allowing each process to return to operation after checkpointing, independently of the time required by other processes to checkpoint their state.
In particular, checkpointing in accordance with the invention makes use of a command message from a checkpoint processor that sequentially propagates through a process stage from data sources through processes to data sinks, triggering each process to checkpoint its state and then pass on a checkpointing message to connected “downstream” processes. This approach provides checkpointing and permits a continuous flow of data processing by allowing each process to return to normal operation after checkpointing, independently of the time required by other processes to checkpoint their state. This approach reduces “end-to-end latency” for each process stage (i.e., the total processing time for data from a data source to a data sink in a process stage), which in turn reduces end-to-end latency for the entire data processing system.
More particularly, the invention includes a method, system, and computer program for continuous flow checkpointing in a data processing system having at least one process stage comprising a data flow and at least two processes linked by the data flow, including propagating at least one command message through the process stage as part of the data flow, and checkpointing each process within the process stage in response to receipt by each process of at least one command message.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.


REFERENCES:
patent: 5495590 (1996-02-01), Comfort et al.
patent: 5630047 (1997-05-01), Wang
patent: 5692168 (1997-11-01), McMahan
patent: 5712971 (1998-01-01), Stanfill et al.
patent: 5802267 (1998-09-01), Shirakihara et al.
patent: 5923832 (1999-07-01), Shirakihara et al.
patent: 6401216 (2002-06-01), Meth et al.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Continuous flow checkpointing data processing does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Continuous flow checkpointing data processing, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Continuous flow checkpointing data processing will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3159433

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.