Reexamination Certificate
2007-05-22
2007-05-22
Beausoliel, Robert (Department: 2113)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S048000, C714S004110
active
09954711
ABSTRACT:
A hierarchical, distributed Availability Management (AM) process for recovering from component failures in a data processing system. The hierarchy of AM elements tracks a failure modality hierarchy of the data processing system components. For example, the system hierarchy may include system cards, processors, and processes, in which case the associated AM elements may be implemented at a card manager (CM) level, a system manager (SM) level, and a process manager (PM) level. The AM hierarchy is designed to achieve a failure granularity so that failures in the lower levels of the hierarchy have less of an impact on the entire system. Each AM element is responsible for receiving failure notifications from processing system components associated with a next lower level of the hierarchy. Upon such a notification, the AM element determines whether the failed component may be restarted. If it may, the AM element then determines whether the component can be hot, warm, or cold restarted, and restarts it without further notification and without implications for the system availability of other components. A hot restart requires complete integrity of state information, a warm restart recovers the last known good state information, and a cold restart re-initializes state information. If the component cannot be restarted, notification is provided to the next higher level of the hierarchy and the AM element itself terminates. One of the AM processes may execute as an identity management protocol. The identity protocol sets a temporary master state; waits a predetermined amount of time; and then sets a final master state only if no other system card has asserted a temporary master state. The waiting time period is selected to be greater than the longest expected initialization process for peer components in the system.
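The restart-or-escalate behavior described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the class names (`AMElement`, `Failure`, `Restart`), the boolean flags used to pick a restart type, and the choice to present an escalated failure to the parent as a cold-restart candidate are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Restart(Enum):
    HOT = "hot"    # state information is fully intact and is reused
    WARM = "warm"  # recover the last known good state information
    COLD = "cold"  # re-initialize state information


@dataclass
class Failure:
    component: str
    state_intact: bool         # hypothetical flag: current state passed integrity checks
    has_last_good_state: bool  # hypothetical flag: a last known good state is available
    restartable: bool = True   # hypothetical flag: a restart is permitted at all


@dataclass
class AMElement:
    name: str
    parent: Optional["AMElement"] = None
    log: list = field(default_factory=list)
    terminated: bool = False

    def choose_restart(self, failure: Failure) -> Optional[Restart]:
        """Pick the least disruptive restart type, or None if no restart is possible."""
        if not failure.restartable:
            return None
        if failure.state_intact:
            return Restart.HOT
        if failure.has_last_good_state:
            return Restart.WARM
        return Restart.COLD

    def on_failure(self, failure: Failure) -> Optional[Restart]:
        """Handle a failure notification from the next lower level."""
        mode = self.choose_restart(failure)
        if mode is not None:
            # Handled locally: no notification reaches higher levels.
            self.log.append((failure.component, mode))
            return mode
        # Cannot restart: this AM element terminates and notifies its parent.
        self.terminated = True
        if self.parent is None:
            return None
        # Assumption: the parent sees this level as a failed component
        # with no usable state, so it ends up choosing a cold restart.
        return self.parent.on_failure(Failure(self.name, False, False))
```

For example, with `upper = AMElement("upper")` and `lower = AMElement("lower", parent=upper)`, a restartable failure reported to `lower` is handled there and never appears in `upper.log`, while a non-restartable one marks `lower` as terminated and is passed up to `upper`.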
REFERENCES:
patent: 4965743 (1990-10-01), Malin et al.
patent: 5487131 (1996-01-01), Kassatly et al.
patent: 5740357 (1998-04-01), Gardiner et al.
patent: 5796990 (1998-08-01), Erle et al.
patent: 5828867 (1998-10-01), Pennell
patent: 5917731 (1999-06-01), Ferenczi et al.
patent: 6058387 (2000-05-01), Campbell et al.
patent: 6178445 (2001-01-01), Dawkins et al.
patent: 6249755 (2001-06-01), Yemini et al.
patent: 6675242 (2004-01-01), Benson et al.
patent: 6718481 (2004-04-01), Fair
patent: 6718486 (2004-04-01), Roselli et al.
patent: 6854069 (2005-02-01), Kampe et al.
patent: 6883170 (2005-04-01), Garcia
patent: 2003/0196141 (2003-10-01), Shaw
patent: 0416732 (1991-03-01), None
patent: 0953911 (1999-11-01), None
patent: WO9707638 (1997-02-01), None
Ciavaglia Stephen J.
Zaifman Arthur L.
Beausoliel Robert
Caseiro Chris A.
Duncan Marc
Enterasys Networks Inc.
Verrill & Dana, LLP