Method and apparatus for monitoring computer system objects...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S038110, C714S048000, C714S025000

active

06594774

ABSTRACT:

TECHNICAL FIELD
This invention relates to reliable computer systems, and more particularly to monitoring computer system objects to improve system reliability.
BACKGROUND OF THE INVENTION
Computer technology is continually advancing, continually providing new and expanded uses for computers. As such uses continue to grow and expand, the importance of computers and people's reliance on their continued operation similarly grows. Currently, typical computer systems are “mostly reliable”. That is, most of the time computer systems operate as they are intended to. However, occasionally a computer system will “crash”—an application terminates abnormally, the entire computer system “freezes up” and will not respond to user input, etc. Such system crashes are typically resolved by the user either restarting the application that terminated abnormally, or alternatively by rebooting the entire system. While such system crashes can be annoying, the fact that the system is operating correctly most of the time is usually adequate for most computer systems, such as desktop computer systems.
However, in some settings or situations users expect a higher degree of system reliability, such that “mostly reliable” is insufficient. An example of such a system is a “vehicle computer”, which provides more conventional “desktop computer” functionality to vehicle operators and occupants. Vehicle operators typically expect the same level of reliability from vehicle computers as they do from the other electronic systems in their vehicles (e.g., audio systems), which is virtually 100% reliability. However, typical computer systems are not able to provide such higher levels of reliability.
An additional problem that computer systems can face is that of diagnostics. In some settings (e.g., in vehicles) it is very difficult to diagnose system problems at the time the problem occurs because there are no diagnostic or debugging connections to the system. Without the ability to diagnose problems with the system when the problems occur, it is more difficult (e.g., for designers and service technicians) to determine what caused the problems and how to avoid them in the future.
The invention described below addresses these disadvantages, providing an improved way to monitor computer system objects to improve system reliability.
SUMMARY OF THE INVENTION
The invention concerns a computer system executing multiple objects (e.g., processes, threads, DLLs, etc.). The invention provides a way to improve the overall reliability of the computer system by carrying out various monitoring functions and taking various actions when problems are detected.
According to one aspect of the invention, objects can register with a critical process monitor for various types of monitoring. As part of the registration process the object provides the type of monitoring it would like the monitor to perform in order to detect a failure of the object. The object also provides a recovery action that should be taken in the event the monitor detects a failure of the object. Additionally, a callback function can be provided that is used by the monitor to inform the object that recovery is about to occur and give the object a chance to decline the recovery action. One such type of monitoring is a “notification” type, in which the object continues to send notification messages to the monitor within a specified time interval. If the monitor does not receive a notification message within the specified time interval, then it determines that the object has failed. Another type of monitoring is a “watch” type, in which the monitor repeatedly checks whether the object is still executing. If the monitor detects that the object is no longer executing, then it determines that the object has failed.
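The registration flow described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `CriticalProcessMonitor`, `Registration`, `MonitorType`, `notify`, and `check` are hypothetical, and the injectable `clock` parameter is an assumption made so the timing logic can be exercised without waiting in real time.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class MonitorType(Enum):
    NOTIFICATION = "notification"  # object must keep sending messages
    WATCH = "watch"                # monitor polls whether the object still runs


@dataclass
class Registration:
    name: str
    monitor_type: MonitorType
    recovery_action: Callable[[], None]
    interval: float = 0.0                             # seconds, NOTIFICATION only
    is_alive: Optional[Callable[[], bool]] = None     # WATCH only
    on_recovery: Optional[Callable[[], bool]] = None  # return False to decline
    last_seen: float = 0.0


class CriticalProcessMonitor:
    """Hypothetical monitor that tracks registered objects."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._objects: Dict[str, Registration] = {}

    def register(self, reg: Registration) -> None:
        reg.last_seen = self._clock()
        self._objects[reg.name] = reg

    def notify(self, name: str) -> None:
        # Notification message from a registered object.
        self._objects[name].last_seen = self._clock()

    def _has_failed(self, reg: Registration) -> bool:
        if reg.monitor_type is MonitorType.NOTIFICATION:
            return self._clock() - reg.last_seen > reg.interval
        return reg.is_alive is not None and not reg.is_alive()

    def check(self) -> List[str]:
        """Run one monitoring pass; return names of objects that were recovered."""
        recovered = []
        for reg in list(self._objects.values()):
            if not self._has_failed(reg):
                continue
            # Give the object a chance to decline the recovery action.
            if reg.on_recovery is not None and not reg.on_recovery():
                continue
            reg.recovery_action()
            reg.last_seen = self._clock()  # restart timing after recovery
            recovered.append(reg.name)
        return recovered
```

In this sketch a single `check()` pass handles both monitoring types, and the callback's boolean return value stands in for the object declining recovery; how the actual monitor schedules its passes is not specified here.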
According to another aspect of the invention, the monitor uses a “test” thread to help verify that an object has failed. If the monitor determines that the object has failed because it is not receiving notification messages within the specified time interval, the monitor checks how frequently a test thread of the monitor is being scheduled. If the test thread is not being scheduled, then the monitor assumes that the object has not failed, but rather that another process or thread is consuming a significant amount of processor time and is preventing other objects from being scheduled.
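One way to picture the test-thread check is the sketch below. It is an assumption-laden illustration: `SchedulingProbe`, `classify_missed_notification`, and the `min_ratio` threshold are hypothetical names and values, since the text above does not say how "not being scheduled" is quantified.

```python
import threading
import time


class SchedulingProbe:
    """Hypothetical test thread: bumps a counter each time it gets to run."""

    def __init__(self, period: float = 0.01):
        self.period = period
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.period):
            self.ticks += 1


def classify_missed_notification(observed_ticks: int, expected_ticks: int,
                                 min_ratio: float = 0.5) -> str:
    """Decide what a missed notification means.

    If the probe ran far less often than expected (assumed threshold:
    min_ratio), the system is treated as starved of processor time and the
    object is not declared failed. Otherwise the object is considered failed.
    """
    if observed_ticks < expected_ticks * min_ratio:
        return "starved"
    return "failed"
```

A monitor built this way would read `probe.ticks` at the start and end of the notification interval and pass the difference as `observed_ticks`.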
According to another aspect of the invention, a watchdog logic is included in the computer system. The watchdog logic is programmed to reboot the computer if it is not accessed regularly. The critical process monitor refreshes the watchdog logic regularly to avoid having the computer system rebooted. However, if a system problem prevents the critical process monitor from running, then the watchdog logic reboots the computer system.
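The refresh-or-reboot behavior can be modeled in software as in the sketch below. This is a simulation only: watchdog logic of this kind normally counts down independently of the software it guards, so the `poll()` method here stands in for that countdown, and `WatchdogLogic`, `refresh`, `poll`, and the `reboot` callback are hypothetical names.

```python
import time
from typing import Callable


class WatchdogLogic:
    """Software model of a watchdog that reboots unless refreshed in time."""

    def __init__(self, timeout: float, reboot: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout
        self._reboot = reboot
        self._clock = clock
        self._deadline = clock() + timeout
        self.fired = False

    def refresh(self) -> None:
        # Called regularly by the critical process monitor while it is healthy.
        self._deadline = self._clock() + self._timeout

    def poll(self) -> bool:
        # Stand-in for the countdown: trigger the reboot once, when the
        # deadline passes without a refresh.
        if not self.fired and self._clock() >= self._deadline:
            self.fired = True
            self._reboot()
        return self.fired
```

As long as the monitor keeps calling `refresh()`, the deadline keeps moving forward; if the monitor itself stops running, nothing moves the deadline and the reboot callback fires.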
According to another aspect of the invention, memory heap size for each process is monitored by the critical process monitor. If the heap of a process grows beyond a threshold size, then the monitor logs the event for subsequent diagnostic use.
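A minimal sketch of the heap check follows. How heap sizes are actually measured is platform-specific and not covered here, so the sketch assumes a caller-supplied `sample` function that returns a mapping from process name to heap size in bytes; `check_heap_sizes` and the logger name `cpm.heap` are hypothetical.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("cpm.heap")


def check_heap_sizes(sample: Callable[[], Dict[str, int]],
                     threshold_bytes: int) -> List[str]:
    """Log every process whose heap has grown beyond the threshold.

    Returns the names of the oversized processes, in sorted order, so a
    caller can keep them for later diagnostics.
    """
    oversized = []
    for process, size in sorted(sample().items()):
        if size > threshold_bytes:
            logger.warning("heap of %s is %d bytes, above threshold %d",
                           process, size, threshold_bytes)
            oversized.append(process)
    return oversized
```

Note that the sketch only records the event, matching the description above: it takes no recovery action when a heap grows too large.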
According to another aspect of the invention, an Application Programming Interface (API) provides the interface between the monitor and the objects in the computer system, allowing the objects to access the various features of the monitor.


