Remote checkpoint memory system and protocol for fault-tolerant

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Patent

Details

395/182.04, 711/135, G06F 11/00

active

057375143

ABSTRACT:
A mechanism is provided for maintaining a consistent, periodically updated state in main memory without constraining normal computer operation, thereby enabling a computer system to recover from faults without loss of data or processing continuity. In this invention, a first computer includes a processor and input/output elements connected to a main memory subsystem that includes a primary memory element. A second computer has a remote checkpoint memory element, connected to the main memory subsystem of the first computer, which may include one or more buffer memories and a shadow memory. During normal processing, an image of the data written to the primary memory element is captured by the remote checkpoint memory element. When a new checkpoint is desired (thereby establishing a consistent state in main memory to which all executing applications can safely return following a fault), the previously captured data is used to establish a new checkpointed state in the second computer. If the first computer fails, the second computer can be restarted to operate from the last checkpoint established for the first computer. This structure and protocol can guarantee a consistent state in main memory, thus enabling fault-tolerant operation.
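The protocol in the abstract can be illustrated with a minimal software sketch. This is only a model under stated assumptions: the patent describes memory hardware, while the code below uses Python dictionaries, and every name in it (RemoteCheckpointMemory, PrimaryComputer, capture, commit_checkpoint, recover, demo) is hypothetical and does not come from the patent. It shows writes being captured in a buffer, applied to a shadow copy only when a checkpoint is taken, and discarded if the first computer fails before the next checkpoint.

```python
class RemoteCheckpointMemory:
    """Hypothetical model of the second computer's checkpoint memory."""

    def __init__(self):
        self.buffer = []   # writes captured since the last checkpoint
        self.shadow = {}   # last consistent, checkpointed state

    def capture(self, address, value):
        # Record an image of a write made to the primary memory element.
        self.buffer.append((address, value))

    def commit_checkpoint(self):
        # Apply captured writes in order, then start a new capture interval.
        for address, value in self.buffer:
            self.shadow[address] = value
        self.buffer.clear()

    def recover(self):
        # Drop writes that never reached a checkpoint and return the
        # last consistent state for the second computer to restart from.
        self.buffer.clear()
        return dict(self.shadow)


class PrimaryComputer:
    """Hypothetical model of the first computer's main memory subsystem."""

    def __init__(self, remote):
        self.memory = {}
        self.remote = remote

    def write(self, address, value):
        # Normal processing: update primary memory and mirror the write.
        self.memory[address] = value
        self.remote.capture(address, value)

    def checkpoint(self):
        # Establish a new consistent state in the remote shadow memory.
        self.remote.commit_checkpoint()


def demo():
    remote = RemoteCheckpointMemory()
    primary = PrimaryComputer(remote)
    primary.write(0x10, "a")
    primary.write(0x20, "b")
    primary.checkpoint()
    primary.write(0x10, "c")   # captured but not yet checkpointed
    # Assume the first computer fails here.
    return remote.recover()


if __name__ == "__main__":
    print(demo())
```

In this sketch the write of "c" is lost on recovery because it was made after the last checkpoint, which is what keeps the recovered state consistent.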

REFERENCES:
patent: 3588829 (1971-06-01), Boland
patent: 3736566 (1973-05-01), Anderson et al.
patent: 3761881 (1973-09-01), Anderson et al.
patent: 3803560 (1974-04-01), DeVoy et al.
patent: 3889237 (1975-06-01), Alferness et al.
patent: 3979726 (1976-09-01), Lange et al.
patent: 4020466 (1977-04-01), Cordi et al.
patent: 4044337 (1977-08-01), Hicks et al.
patent: 4164017 (1979-08-01), Randell et al.
patent: 4228496 (1980-10-01), Katzman et al.
patent: 4373179 (1983-02-01), Katsumata
patent: 4393500 (1983-07-01), Imazeki et al.
patent: 4403284 (1983-09-01), Sacarisen et al.
patent: 4413327 (1983-11-01), Sabo et al.
patent: 4426682 (1984-01-01), Riffe et al.
patent: 4459658 (1984-07-01), Gabbe et al.
patent: 4484273 (1984-11-01), Stiffler et al.
patent: 4566106 (1986-01-01), Check, Jr.
patent: 4654819 (1987-03-01), Stiffler et al.
patent: 4734855 (1988-03-01), Banatre et al.
patent: 4740969 (1988-04-01), Fremont
patent: 4751639 (1988-06-01), Corcoran et al.
patent: 4817091 (1989-03-01), Katzman et al.
patent: 4819154 (1989-04-01), Stiffler et al.
patent: 4819232 (1989-04-01), Krings
patent: 4905196 (1990-02-01), Kirmann
patent: 4924466 (1990-05-01), Gregor et al.
patent: 4941087 (1990-07-01), Kap
patent: 4958273 (1990-09-01), Anderson et al.
patent: 4964126 (1990-10-01), Musicus et al.
patent: 4965719 (1990-10-01), Shoens et al.
patent: 5157663 (1992-10-01), Major et al.
patent: 5214652 (1993-05-01), Sutton
patent: 5235700 (1993-08-01), Alaiwan et al.
patent: 5239637 (1993-08-01), Davis
patent: 5247618 (1993-09-01), Davis
patent: 5269017 (1993-12-01), Hayden et al.
patent: 5271013 (1993-12-01), Gleeson
patent: 5276848 (1994-01-01), Gallagher et al.
patent: 5313647 (1994-05-01), Kaufman et al.
patent: 5325517 (1994-06-01), Baker et al.
patent: 5325519 (1994-06-01), Long et al.
patent: 5327532 (1994-07-01), Ainsworth et al.
patent: 5408649 (1995-04-01), Besheaus et al.
patent: 5488716 (1996-01-01), Schneider et al.
patent: 5488719 (1996-01-01), Schultz
patent: 5504861 (1996-04-01), Crockett et al.
N. Bowen and D. Pradhan, "Processor- and Memory-Based Checkpoint and Rollback Recovery," 1993 IEEE Transactions on Computers, pp. 22-30.
Y. Lee and K. Shin, "Rollback Propagation Detection and Performance Evaluation of FTMR²M - A Fault Tolerant Multiprocessor," 1982 IEEE Transactions on Computers, pp. 171-180.
C. Kubiak et al., "Penelope: A Recovery Mechanism for Transient Hardware Failures and Software Errors," 1982 IEEE Transactions on Computers, pp. 127-133.
A. Feridun and K. Shin, "A Fault-Tolerant Multiprocessor System with Rollback Recovery Capabilities," 1981 IEEE Transactions on Computers, pp. 283-298.
P. Lee et al., "A Recovery Cache for the PDP-11," IEEE Transactions on Computers, vol. C-29, No. 6, Jun. 1980, pp. 546-549.
M. Banatre, A. Gefflaut, C. Morin, "Scalable Shared Memory Multi-Processors: Some Ideas to Make Them Reliable", in Hardware and Software Architectures for Fault Tolerance, Springer-Verlag, 1994 Lecture Notes in Computer Science, presented after Jun. 10, 1993.
