Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1997-12-29
2002-08-20
Wright, Norman M. (Department: 2131)
C714S022000, C714S024000, C714S038110, C714S047300
active
06438709
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention pertains to the field of computer systems. More particularly, this invention pertains to the field of recovering from computer system malfunctions.
2. Background of the Related Art
For many years, computer system manufacturers, computer component manufacturers, and computer users have been concerned with detecting and recovering from computer system malfunctions. There are many reasons why a computer system might malfunction, including memory data corruption, data corruption related to fixed disks or removable media, operating system errors, component errors, components overheating, applications or operating systems performing illegal instructions with respect to the processor, incompatibility between various hardware and software system components, etc.
Some of these types of malfunctions have been effectively dealt with by prior systems. For example, memory data corruption can be handled by parity detection and/or error correcting code (ECC). Illegal instructions can be trapped by the processor and in many cases handled either within the processor or by the operating system. Other malfunctions may result in system “hangs.” A system is “hung” when it is no longer able to respond to user inputs and/or to system events such as incoming network traffic. Malfunctions that can result in system hangs include operating systems or hardware components entering unknown or indeterminate states, causing the operating system or hardware component to cease normal operation. In these cases, the computer user must restart the computer. Restarting the computer after a system hang can cause problems such as data loss and corruption.
Some prior computer systems have included timers known as “watchdog” timers. In a typical watchdog timer implementation, a processor periodically resets a timer, so that under normal operation the timer never reaches a certain value. If the timer ever reaches that value, the computer system is reset. This solution makes no attempt to cure the malfunction other than the drastic action of resetting the computer system. Resetting the computer system may result in the same problems mentioned above with regard to a user restarting a computer, including data loss and corruption.
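The conventional watchdog behavior described above can be illustrated with a small simulation. This is only a sketch of the prior-art scheme, not of the invention; the class name `WatchdogTimer`, the `kick`/`tick` methods, and the numeric limit are hypothetical names and values chosen for illustration.

```python
class WatchdogTimer:
    """Toy model of a conventional watchdog timer (prior art, illustrative only)."""

    def __init__(self, limit):
        self.limit = limit   # count at which the system is reset (assumed value)
        self.count = 0       # current timer value
        self.resets = 0      # number of system resets triggered so far

    def kick(self):
        """Called periodically by a healthy processor to restart the count."""
        self.count = 0

    def tick(self):
        """Advance the timer one unit; trigger a reset if the limit is reached."""
        self.count += 1
        if self.count >= self.limit:
            # The only response available is the drastic one: reset the system.
            self.resets += 1
            self.count = 0


def simulate(limit, hang_at, total_ticks):
    """Run total_ticks timer ticks; the processor stops kicking at tick hang_at.

    The simulation does not model the system coming back up after a reset;
    it only counts how many resets the watchdog triggers.
    """
    wd = WatchdogTimer(limit)
    for t in range(total_ticks):
        if t < hang_at:
            wd.kick()        # normal operation: the timer never reaches the limit
        wd.tick()
    return wd.resets


if __name__ == "__main__":
    print(simulate(limit=3, hang_at=5, total_ticks=5))  # healthy for the whole run
    print(simulate(limit=3, hang_at=5, total_ticks=8))  # hangs at tick 5
```

In this model nothing happens between a missed kick and the reset, which is the shortcoming the background section points out: no intermediate attempt is made to cure the malfunction.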
Separate error checking processors have been included in computer systems in order to detect and attempt to recover from system hangs. This solution has the disadvantage of being costly. The computer user benefits from less costly computer systems. Therefore, a lower cost method and apparatus for detecting and recovering from computer system malfunctions is desirable.
SUMMARY OF THE INVENTION
A method for recovering from a computer system lockup condition is disclosed. In one embodiment of the method, an interrupt is generated to the computer system's operating system notifying the operating system of the lockup condition. An operating system interrupt handler is then executed. The interrupt handler performs at least one step to attempt to cure the lockup condition. If the interrupt handler fails to cure the lockup condition, the interrupt is regenerated to the operating system notifying the operating system of the lockup condition. The interrupt handler is then re-executed in response to the regeneration of the interrupt, with the interrupt handler performing a further step in attempting to cure the lockup condition.
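The escalating behavior summarized above might be sketched as follows. This is a minimal simulation under stated assumptions, not the patented implementation: the summary does not say what the recovery steps are, so `kill_hung_task` and `reset_device`, the `system` dictionary, and the `LockupInterruptHandler` class are all hypothetical stand-ins.

```python
class LockupInterruptHandler:
    """Simulated OS interrupt handler that tries one further step per invocation."""

    def __init__(self, steps):
        self.steps = steps      # ordered recovery steps (hypothetical)
        self.next_step = 0      # index of the step to try on the next interrupt

    def handle(self, system):
        """Run the next untried recovery step; return False if none remain."""
        if self.next_step >= len(self.steps):
            return False
        step = self.steps[self.next_step]
        self.next_step += 1
        step(system)
        return True


def kill_hung_task(system):
    # Hypothetical first step: cures the lockup only if a task caused it.
    if system["cause"] == "task":
        system["locked_up"] = False


def reset_device(system):
    # Hypothetical further step: cures the lockup only if a device caused it.
    if system["cause"] == "device":
        system["locked_up"] = False


def run_recovery(system, handler):
    """Regenerate the lockup interrupt until the lockup is cured or steps run out.

    Returns the number of times the handler was executed.
    """
    interrupts = 0
    while system["locked_up"] and handler.handle(system):
        interrupts += 1         # each pass models one (re)generated interrupt
    return interrupts


if __name__ == "__main__":
    system = {"locked_up": True, "cause": "device"}
    handler = LockupInterruptHandler([kill_hung_task, reset_device])
    print(run_recovery(system, handler), system["locked_up"])
```

Keeping the step index inside the handler is one simple way to make each re-execution perform a further step rather than repeating the previous one; a real handler would need to preserve that state across interrupts by some comparable means.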
Wells Calvin E.
Wright Norman M.