Reexamination Certificate
2001-02-08
2004-12-14
Baderman, Scott (Department: 2113)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S023000
active
06832329
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to data processing and, in particular, to error detection and correction. Still more particularly, the present invention provides a method, apparatus, and program for predicting array bit line or driver failures.
2. Description of Related Art
A system may include an event scan that is invoked periodically for each processor in the system. The system may also include error correction circuitry (ECC) to resolve correctable single-bit errors (CE). Some of these correctable errors may be detected in processor caches. A CPU Guard function can be used to dynamically de-allocate a processor and cache that have an error. A Repeat Guard function can be used to de-allocate the resource during the boot process, so that the faulty cache does not cause further errors before a customer engineer is able to fix the fault. The system may include field replaceable units (FRU), each of which includes a processor and cache. The customer engineer may fix a fault by replacing the FRU.
A cache may have array bit line or driver failures that cause correctable errors to be detected. Even though these errors are corrected and do not impact continued operation, it is desirable to detect when CEs are repeatedly caused by these types of cache faults. Prior art algorithms attempt to detect array bit line or driver failures by simply counting faults until some threshold is reached within a specified time period. Under those algorithms, however, intermittent single-bit errors caused by random noise or other cosmic conditions may result in many false reports of bit line or driver failures.
In addition, some prior algorithms rely on the system rebooting periodically to reset error threshold counters. However, in today's business-critical computing environments, computer systems are not rebooted very often. This may cause random errors to accumulate and trigger false reports.
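The two prior-art weaknesses described above can be illustrated with a minimal Python sketch. It is not taken from any cited algorithm. The class name `NaiveErrorCounter`, the threshold of 3, and the timestamps are hypothetical values chosen only for illustration. The counter ignores where each error occurred, and when no time window is set it is cleared only by a reboot.

```python
from collections import deque


class NaiveErrorCounter:
    """Hypothetical prior-art style counter: counts correctable errors only.

    It does not look at the syndrome or address of an error, so unrelated
    random errors are indistinguishable from a real bit line or driver fault.
    """

    def __init__(self, threshold, window_seconds=None):
        self.threshold = threshold      # assumed value; not given in the source
        self.window = window_seconds    # None models "reset only on reboot"
        self.timestamps = deque()

    def record(self, timestamp):
        """Record one correctable error; return True if a failure is reported."""
        self.timestamps.append(timestamp)
        if self.window is not None:
            # Drop errors that fell out of the time window.
            while timestamp - self.timestamps[0] > self.window:
                self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold

    def reboot(self):
        """A reboot is the only event that clears a counter with no window."""
        self.timestamps.clear()


DAY = 86_400
counter = NaiveErrorCounter(threshold=3)   # no window, never rebooted
reports = [counter.record(t) for t in (0, 30 * DAY, 90 * DAY)]
print(reports)
```

In this sketch three unrelated errors spread over ninety days of uptime are enough to produce a report, which is the kind of false report the background section describes.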
Thus, it would be advantageous to provide a cache thresholding method and apparatus for predictive reporting of array bit line or driver failures that does not generate false error reports because of random errors.
SUMMARY OF THE INVENTION
The present invention provides a mechanism for predicting cache array bit line or driver failures, which is faster and more efficient than counting all of the errors associated with a failure. This mechanism checks for five consecutive errors at different addresses within the same syndrome on invocation of periodic polling to characterize the failure. Once the failure is characterized, it is reported to the system for corrective maintenance including dynamic processor deconfiguration or preventive processor replacement.
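The check described in the summary can be sketched roughly as follows. This is one reading of the text, not the patented implementation. The class name `CacheErrorTracker`, the `(syndrome, address)` tuple shape, and two rules are assumptions: that a poll with no error breaks the consecutive run, and that a repeated address restarts the run. Only the count of five comes from the source.

```python
from typing import List, Optional, Tuple

REQUIRED_CONSECUTIVE = 5  # from the summary: five consecutive errors


class CacheErrorTracker:
    """Tracks a run of correctable errors seen on successive polls."""

    def __init__(self) -> None:
        self.syndrome: Optional[int] = None
        self.addresses: List[int] = []

    def poll(self, error: Optional[Tuple[int, int]]) -> bool:
        """Process one periodic poll.

        `error` is None when the poll found no correctable error, otherwise
        a (syndrome, address) pair. Returns True once the run characterizes
        a bit line or driver failure.
        """
        if error is None:
            # Assumption: an error-free poll breaks the consecutive run.
            self.syndrome = None
            self.addresses = []
            return False

        syndrome, address = error
        if syndrome != self.syndrome or address in self.addresses:
            # Different syndrome, or (assumption) a repeated address:
            # start a new run beginning with this error.
            self.syndrome = syndrome
            self.addresses = [address]
            return False

        self.addresses.append(address)
        return len(self.addresses) >= REQUIRED_CONSECUTIVE


tracker = CacheErrorTracker()
polls = [(0x1A, a) for a in (0x100, 0x140, 0x180, 0x1C0, 0x200)]
print([tracker.poll(p) for p in polls])
```

Because the run restarts whenever the pattern is broken, isolated random errors do not accumulate over long uptimes, and no reboot is needed to clear the state. A caller that receives True would then report the failure so the processor can be deconfigured or replaced.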
REFERENCES:
patent: 5463768 (1995-10-01), Cuddihy et al.
patent: 5761411 (1998-06-01), Teague et al.
patent: 5892898 (1999-04-01), Fujii et al.
patent: 6345322 (2002-02-01), Humphrey
patent: 6438716 (2002-08-01), Snover
patent: 6493656 (2002-12-01), Houston et al.
patent: 6647517 (2003-11-01), Dickey et al.
Ahrens George Henry
Kitamorn Alongkorn
McLaughlin Charles Andrew
Vaden Michael Thomas
Baderman Scott
Glanzman Gerald H.
International Business Machines Corporation
McBurney Mark E.
McCarthy Christopher S.