Distributed, fault-tolerant and highly available computing...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S038110

active

07937618

ABSTRACT:
A method and system for achieving highly available, fault-tolerant execution of components in a distributed computing system without requiring the writer of those components to write explicit code (such as entity beans or database transactions) to make component state persistent. This is achieved by converting the intrinsically non-deterministic behavior of the distributed system into deterministic behavior, which allows state to be recovered with efficient checkpoint-replay techniques. The method comprises: adapting the execution environment to enable message communication among the components; and, during program execution, automatically associating a deterministic timestamp with each message sent from a sender component to a receiver component, the timestamp representing the estimated time of arrival of the message at the receiver component. Each component's state is tracked during program execution and periodically checkpointed to a local storage device. Upon failure of a component, its state is restored by recovering a recent stored checkpoint and re-executing the events that occurred since that checkpoint. Re-execution is deterministic because the receiving component processes messages in the order of their associated timestamps, which is the same order it followed before the failure.
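The checkpoint-replay idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the classes `Sender`, `Receiver`, and `Message`, the fixed-latency arrival estimate, the tie-break on sender name and sequence number, and the `safe_time` bound are all hypothetical. The sketch assumes that the checkpoint and the log of messages processed since that checkpoint survive a failure (for example, in local storage), and that the caller only advances `safe_time` once no message with an earlier timestamp can still arrive.

```python
import copy
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Ordering key (timestamp, sender, seq): a deterministic total order,
    # even when two messages carry the same timestamp (tie-break is assumed).
    timestamp: int
    sender: str
    seq: int
    payload: int = field(compare=False)


class Sender:
    """Hypothetical sender that stamps each message with a deterministic
    estimated arrival time: its own virtual time plus a fixed latency."""

    def __init__(self, name, latency=5):
        self.name = name
        self.latency = latency  # assumed deterministic delay estimate
        self.vtime = 0
        self.seq = 0

    def send(self, payload, work=1):
        self.vtime += work  # virtual time advances by a deterministic cost
        self.seq += 1
        return Message(self.vtime + self.latency, self.name, self.seq, payload)


class Receiver:
    """Hypothetical receiver: processes messages in timestamp order,
    checkpoints periodically, and recovers by replaying logged messages."""

    def __init__(self, checkpoint_every=2):
        self.state = {"total": 0, "history": []}
        self.pending = []  # min-heap of delivered but unprocessed messages
        self.log = []      # messages processed since the last checkpoint (assumed durable)
        self.checkpoint = copy.deepcopy(self.state)  # assumed durable
        self.checkpoint_every = checkpoint_every
        self.processed = 0

    def deliver(self, msg):
        # Network arrival order is arbitrary; the heap restores timestamp order.
        heapq.heappush(self.pending, msg)

    def _apply(self, msg):
        # Deterministic, deliberately order-sensitive handler.
        self.state["total"] = self.state["total"] * 2 + msg.payload
        self.state["history"].append(msg.timestamp)

    def process_until(self, safe_time):
        # Assumption: no message with timestamp <= safe_time can still arrive.
        while self.pending and self.pending[0].timestamp <= safe_time:
            msg = heapq.heappop(self.pending)
            self._apply(msg)
            self.log.append(msg)
            self.processed += 1
            if self.processed % self.checkpoint_every == 0:
                self.checkpoint = copy.deepcopy(self.state)
                self.log.clear()

    def crash(self):
        # Simulate losing volatile state; checkpoint and log survive.
        self.state = None

    def recover(self):
        # Restore the last checkpoint, then re-execute logged messages
        # in timestamp order, reproducing the pre-failure execution.
        self.state = copy.deepcopy(self.checkpoint)
        for msg in sorted(self.log):
            self._apply(msg)
```

For example, two receivers given the same four messages in different delivery orders end in the same state, and a receiver that crashes after processing them recovers exactly that state from its checkpoint plus the logged tail. The handler (`total * 2 + payload`) is chosen so that any deviation from timestamp order would produce a different result, which makes the determinism visible.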

