Fault-tolerant processing method

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Patent


Details

714/13, G06F 11/16

Patent

active

060354151

DESCRIPTION:

BRIEF SUMMARY
FIELD OF THE INVENTION

The present invention relates to a fault-tolerant processing method for receiving and processing input messages to produce output messages. More particularly, the present invention relates to a method of operating a software fault tolerant recovery unit where the processing of input messages is done by replicate primary and secondary application processes.
It should be noted that the term "process" is used herein in the general sense of processing functionality provided by code executing on a processor, however this code is organised (that is, whether the code is an instance of only part of a program, of a whole program, or is spread across multiple programs). Furthermore, reference to a process as an "application" process is to be understood broadly, in the sense of a process providing some desired functionality for which purpose input messages are sent to it.
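The replicated arrangement in the field statement can be illustrated with a minimal sketch, assuming a deterministic application whose only state is a running total. Both replicas process every input message and only one copy of each output message is released. When the primary fails, the secondary is already in the same state and simply continues. The classes `Replica` and `ReplicatedUnit` are hypothetical and this is not the patent's own protocol: a real scheme would also have to address input ordering and duplicate-output suppression, which the sketch ignores.

```python
# Hypothetical sketch: primary and secondary replicas of one deterministic
# application process. Names and the "running total" application are assumptions.


class Replica:
    """One copy of a deterministic application process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0      # application state
        self.alive = True

    def process(self, msg: int) -> str:
        # Deterministic: equal input sequences give equal state and output.
        self.total += msg
        return f"total={self.total}"


class ReplicatedUnit:
    def __init__(self) -> None:
        self.primary = Replica("primary")
        self.secondary = Replica("secondary")
        self.outputs: list[str] = []

    def handle(self, msg: int) -> None:
        # Every live replica processes every input message, primary first.
        live = [r for r in (self.primary, self.secondary) if r.alive]
        if not live:
            raise RuntimeError("no live replica left")
        results = [r.process(msg) for r in live]
        # Replicas are deterministic, so all live copies agree on the output.
        assert len(set(results)) == 1
        # Only one copy of each output message is released: the primary's while
        # it is alive, otherwise the secondary's.
        self.outputs.append(results[0])
```

For example, after `handle(5)` and `handle(3)` the primary can be marked dead, and `handle(2)` still yields `"total=10"` from the secondary with no state transfer, because the secondary has already processed the same messages.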


BACKGROUND OF THE INVENTION

Software-based fault-tolerant systems may be considered as organised into one or more recovery units, each of which constitutes a unit of failure and recovery. A recovery unit may be considered as made up of a live process, an arrangement for logging recovery information relevant to that live process, and recovery means which, in the event of failure of the live process, causes a replacement process to take over.
Of course, if failure of the live process due to failure of the processor running it is to be covered, then both the storage of recovery information and the recovery means itself must be separate from the processor running the live process.
Where a system comprises multiple recovery units, these will typically overlap in terms of processor utilisation; for example, the processor targeted to run the replacement process for a first recovery unit may also be the processor running the live process of a second recovery unit. In fact, the recovery units may also share resources for their logging and recovery means.
An illustrative prior-art fault-tolerant computer system is shown in FIG. 1 of the accompanying drawings. This system comprises three processors I, II, III and a disc unit 10, all interconnected by a LAN 11. The system is organised as two recovery units A and B, each of which has an associated live process A/L, B/L. Live process A/L runs on processor I and live process B/L runs on processor II. Recovery unit A is arranged such that, upon failure of its live process A/L, a replacement process A/R takes over on processor II; similarly, recovery unit B is arranged such that, should live process B/L fail, a replacement process B/R takes over on processor III.
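The FIG. 1 configuration, and the requirement that recovery information and replacement be kept off the live process's processor, can be modelled as plain data. This is a minimal sketch under stated assumptions: the `RecoveryUnit` fields and the `placement_after_failure` helper are hypothetical, the replacement processor stands in for where recovery happens, and the location of the recovery means itself is not modelled.

```python
# Sketch of the FIG. 1 configuration. The RecoveryUnit fields and the
# placement_after_failure helper are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryUnit:
    name: str
    live_on: str                     # processor running the live process
    replacement_on: str              # processor targeted for the replacement
    log_store: str = "disc unit 10"  # stable store for recovery information

    def survives_processor_failure(self) -> bool:
        # Processor failure is covered only if the recovery information and the
        # replacement are held somewhere other than the live process's processor.
        return self.log_store != self.live_on and self.replacement_on != self.live_on


UNITS = [
    RecoveryUnit("A", live_on="I", replacement_on="II"),
    RecoveryUnit("B", live_on="II", replacement_on="III"),
]


def placement_after_failure(units: list[RecoveryUnit], failed: str) -> dict[str, str]:
    """Return, per recovery unit, the processor running its process after `failed` fails."""
    return {
        u.name: (u.replacement_on if u.live_on == failed else u.live_on)
        for u in units
    }
```

After a failure of processor I, `placement_after_failure(UNITS, "I")` places both A (now A/R) and B (still B/L) on processor II, which is the overlap in processor utilisation described above.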
A live process will progress through a succession of internal states depending on its deterministic behaviour and on non-deterministic events such as external inputs (including messages received from other live processes, where present) and non-deterministic internal events.
When a replacement process takes over from a failed live process, the replacement process must be placed in a state that the failed process achieved (though not necessarily its most recent pre-failure state). To do this, it is necessary to know state information about the live process for at least one point prior to failure; furthermore, if the non-deterministic events experienced by the failed process are also known, it is possible to run the replacement process forward from that known state to some later state achieved by the failed process.
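The run-forward step relies on determinism: given a known state and the same sequence of non-deterministic events, a replacement reaches exactly the state the failed process reached. A minimal sketch, assuming a hypothetical two-field state and a made-up transition function `step`:

```python
# Sketch of recovery by deterministic replay. The state shape, step function
# and event values are hypothetical.
from typing import Iterable, Tuple

State = Tuple[int, int]  # (events seen, accumulator): illustrative process state


def step(state: State, event: int) -> State:
    """Deterministic transition: a given state and event always give the same next state."""
    seen, acc = state
    return seen + 1, acc * 31 + event


def run(state: State, events: Iterable[int]) -> State:
    """Run a process forward from `state` by applying each event in order."""
    for event in events:
        state = step(state, event)
    return state


# Live process: initial state, then non-deterministic events (e.g. messages).
initial: State = (0, 0)
events = [4, 7, 1, 9]
known_state = run(initial, events[:2])   # state information captured earlier
failed_final = run(initial, events)      # state the live process reached before failing

# Replacement: start from the known state and replay the logged events.
replayed = run(known_state, events[2:])
assert replayed == failed_final
```

If only some of the later events had been logged, say `events[2:3]`, replay would stop at an intermediate state the failed process passed through, which still satisfies the requirement that the replacement reach a state the failed process achieved.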
Where speed of recovery is not critical, an approach may be used where state information on the live process (process A/L in FIG. 1) is periodically checkpointed by the logging means of the recovery unit from the volatile memory of the processor running the process to stable store (disc unit 10). Upon failure of the live process A/L, the recovery means of the recovery unit can bring up a replacement process A/R in a state corresponding to the last-checkpointed state of the failed live process. Of course, unless check-point
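Checkpoint-only recovery can be sketched as follows, assuming a hypothetical `CHECKPOINT_INTERVAL`, a JSON file standing in for the stable store, and illustrative function names. The write goes to a temporary file that is then atomically renamed, so a crash during checkpointing leaves the previous checkpoint intact; that detail is a design choice of the sketch, not something the text specifies.

```python
# Sketch of checkpoint-only recovery: the live process's state is copied from
# volatile memory to stable store every CHECKPOINT_INTERVAL inputs.
# CHECKPOINT_INTERVAL, the JSON file format, and the function names are assumptions.
import json
import os
import tempfile

CHECKPOINT_INTERVAL = 3  # hypothetical: checkpoint after every 3 inputs


def checkpoint(state: dict, path: str) -> None:
    # Write to a temporary file, then atomically replace, so a crash mid-write
    # never leaves a half-written checkpoint on stable store.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def restore(path: str) -> dict:
    # The replacement process starts from the last-checkpointed state.
    with open(path) as f:
        return json.load(f)


def live_process(inputs, path: str) -> dict:
    state = {"processed": 0, "total": 0}
    for i, msg in enumerate(inputs, start=1):
        state["processed"] += 1
        state["total"] += msg
        if i % CHECKPOINT_INTERVAL == 0:
            checkpoint(state, path)
    return state
```

With seven inputs, the live process ends having processed all seven, but the last checkpoint was taken after the sixth, so a replacement restored from stable store resumes with `processed == 6`: the work done since the last checkpoint is lost unless it is recovered some other way.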

