Method and apparatus for providing fault-tolerance in...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S013000

Reexamination Certificate

active

07571347

ABSTRACT:
A system that provides fault tolerance in a parallel processing system. During operation, the system executes a parallel computing application in parallel across a subset of computing nodes within the parallel processing system. During this process, the system monitors telemetry signals within the parallel processing system. The system analyzes the monitored telemetry signals to determine if the probability that the parallel processing system will fail is increasing. If so, the system increases the frequency at which the parallel computing application is checkpointed, wherein a checkpoint includes the state of the parallel computing application at each computing node within the parallel processing system.
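
The control loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `AdaptiveCheckpointScheduler`, the risk scores passed to `observe`, the half-window trend test, and the parameters `base_interval_s`, `min_interval_s`, `shrink_factor`, and `window` are all hypothetical. The abstract does not say how the failure probability is derived from telemetry or by how much the checkpoint frequency is raised, so the sketch assumes a risk score in [0, 1] is supplied by some external estimator and halves the interval whenever the score trends upward.

```python
from collections import deque
from statistics import mean


class AdaptiveCheckpointScheduler:
    """Shortens the checkpoint interval while the estimated failure risk rises.

    All names and defaults here are illustrative assumptions.
    """

    def __init__(self, base_interval_s=3600.0, min_interval_s=300.0,
                 shrink_factor=0.5, window=4):
        self.interval_s = base_interval_s     # current time between checkpoints
        self.min_interval_s = min_interval_s  # floor, so checkpoint overhead stays bounded
        self.shrink_factor = shrink_factor    # multiplier applied when risk is rising
        self._risk = deque(maxlen=window)     # most recent risk scores

    def risk_is_increasing(self):
        # Assumed trend test: the newer half of the window has a higher
        # mean risk than the older half. Needs a full window to decide.
        if len(self._risk) < self._risk.maxlen:
            return False
        samples = list(self._risk)
        half = len(samples) // 2
        return mean(samples[half:]) > mean(samples[:half])

    def observe(self, risk):
        """Record one telemetry-derived risk score; return the interval to use."""
        self._risk.append(risk)
        if self.risk_is_increasing():
            # A shorter interval means a higher checkpoint frequency.
            self.interval_s = max(self.min_interval_s,
                                  self.interval_s * self.shrink_factor)
        return self.interval_s


if __name__ == "__main__":
    scheduler = AdaptiveCheckpointScheduler()
    for score in [0.1, 0.1, 0.1, 0.1, 0.3, 0.5]:
        print(score, scheduler.observe(score))
```

In this sketch the interval is left unchanged when the risk is flat or falling, because the abstract only describes raising the checkpoint frequency. A real system would also have to coordinate the checkpoint across every computing node so that the saved state is consistent.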

REFERENCES:
patent: 5664090 (1997-09-01), Seki et al.
patent: 5712971 (1998-01-01), Stanfill et al.
patent: 2005/0114739 (2005-05-01), Gupta et al.
patent: 2006/0168473 (2006-07-01), Sahoo et al.
patent: 2007/0168715 (2007-07-01), Herz et al.
Plank, James S., “ickp: A Consistent Checkpointer for Multicomputers”, 1994, IEEE.
Cao et al., “Design and Analysis of An Efficient Algorithm for Coordinated Checkpointing in Distributed Systems”, 1997, IEEE.
Li et al., “Low-Latency, Concurrent Checkpointing for Parallel Programs”, 1994, IEEE.
Kim et al., “An Efficient Protocol for Checkpointing Recovery in Distributed Systems”, 1993, IEEE.
