Method for monitoring fault of operating system and...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

active

06697972

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a fault monitoring method for a computer, and in particular to a fault monitoring method for an operating system (OS) and an application program (AP).
In general, an AP can encounter a fault and stop for several reasons. The main causes are defects in the AP itself and faults of the OS on which the AP runs. For an AP whose operation must continue, the following method has been adopted: faults are monitored, and if a fault is detected, the AP is resumed from the state it was in before the fault occurred, thereby attempting recovery of the AP from the fault.
One fault monitoring method for APs and OSs is called a watchdog. “Fault Tolerant Computer” by Takashi NANYA, published by OHM-SHA, says: “The watchdog timer is a different process which is independent of a monitored process. The monitored process is so designed that a timer is reset at intervals of a fixed time (for example, in the range of several microseconds to several minutes) during the execution of the monitored process. If the timer is not reset until the time is up, some fault is considered to have occurred in the monitored process.”
When a fault of an AP is monitored, an AP fault monitor having a watchdog receives a periodic alive message from the AP. If the alive message stops for a predetermined time, the AP fault monitor judges that an AP fault has occurred and restarts the AP. When a fault of an OS is monitored, an OS fault monitor having a watchdog receives a periodic alive message from the OS. If the alive message stops for a predetermined time, the OS fault monitor judges that an OS fault has occurred and restarts the OS.
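The watchdog behaviour described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method of any particular implementation: the class name Watchdog, the 3-second timeout, and the explicit now argument (used instead of a real clock so the example is deterministic) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watchdog:
    """Declares a fault when no alive message arrives within `timeout` seconds."""
    timeout: float                  # the "predetermined time" (hypothetical value)
    on_fault: Callable[[], None]    # recovery action, e.g. restart the AP or the OS
    last_alive: float = 0.0
    fault_count: int = 0

    def alive(self, now: float) -> None:
        """Called when an alive message arrives; resets the timer."""
        self.last_alive = now

    def poll(self, now: float) -> bool:
        """Check the timer; return True if a fault was declared."""
        if now - self.last_alive > self.timeout:
            self.fault_count += 1
            self.on_fault()
            self.last_alive = now   # restart the timer once recovery is triggered
            return True
        return False


restarts = []
wd = Watchdog(timeout=3.0, on_fault=lambda: restarts.append("restart AP"))
wd.alive(now=0.0)
wd.alive(now=2.0)
print(wd.poll(now=4.0))   # 2.0 s since the last alive message: no fault
print(wd.poll(now=5.5))   # 3.5 s since the last alive message: fault declared
print(restarts)
```

In this sketch the monitor only records that a restart was requested; a real monitor would restart the AP or the OS at that point.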
Furthermore, a technique called check-point is used for recovery of an AP from a fault. In the check-point technique, execution information of the monitored AP is acquired periodically and saved. When a fault has occurred, the saved execution information is retrieved, and processing of the AP is resumed from the check-point.
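As a rough illustration of the check-point technique, the following Python sketch saves the execution state of a toy computation every few steps and resumes from the last saved state after a simulated fault. The names CheckpointStore and run, the summation workload, and the check-point interval are assumptions made for this example only.

```python
import copy


class CheckpointStore:
    """Holds the most recent check-point (kept in memory for this sketch)."""

    def __init__(self):
        self._saved = None

    def save(self, state: dict) -> None:
        self._saved = copy.deepcopy(state)

    def load(self):
        return copy.deepcopy(self._saved)


def run(store, steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..steps-1, resuming from the last check-point if one exists."""
    state = store.load() or {"next": 0, "total": 0}
    while state["next"] < steps:
        if fail_at is not None and state["next"] == fail_at:
            raise RuntimeError("simulated AP fault")
        state["total"] += state["next"]
        state["next"] += 1
        if state["next"] % checkpoint_every == 0:
            store.save(state)       # periodic check-point
    return state["total"]


store = CheckpointStore()
try:
    run(store, steps=10, checkpoint_every=3, fail_at=7)
except RuntimeError:
    pass  # a fault monitor would detect this fault and restart the AP

resumed_from = store.load()["next"]   # step at which the restarted AP resumes
result = run(store, steps=10, checkpoint_every=3)
print(resumed_from, result)
```

The restarted run repeats only the work done after the last check-point, which is why a shorter check-point period reduces the amount of work lost to a fault.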
SUMMARY OF THE INVENTION
A software AP fault monitor operates on the same OS as the monitored AP. In some cases, therefore, it cannot cope with an AP fault caused by the OS. There is also a known OS fault monitoring method in which a watchdog is formed of dedicated hardware and monitors a periodic alive message supplied from the OS. However, this method has the problem that dedicated hardware must be provided.
As for the check-point, if check-point information is preserved in a low speed memory, preserving it takes considerable time, and consequently the check-point repetition period becomes long. As a result, recovery from a fault must start from a state considerably earlier than the time of the fault occurrence. It is also possible to provide a high speed non-volatile memory, such as a static RAM, separate from the volatile memory, such as a dynamic RAM, managed by the OS, and to preserve check-point information in the high speed non-volatile memory. However, this approach also has the problem that dedicated hardware must be provided.
An object of the present invention is to provide a method for monitoring a fault of the OS by using software, without adding dedicated hardware.
Another object of the present invention is to provide a method for monitoring a fault of an AP and preserving check-point information of the AP at high speed without adding dedicated hardware.
The present invention solves the above-described problems. In accordance with the present invention, an operating system fault monitoring method is provided for a computer that includes a first OS, a second OS different from the first OS, a multi-OS controller for managing computer resources and having inter-OS communication means between the first OS and the second OS, and a fault monitor operating on the second OS. The method includes the steps of transmitting an alive message from the first OS to the fault monitor via the inter-OS communication means, and determining whether the alive message has been received by the fault monitor within a predetermined time.
In accordance with the present invention, an application program fault monitoring method is also provided for a computer that includes an AP fault monitor operating on the first OS and a high rank fault monitor operating on the second OS, the high rank fault monitor monitoring not only a fault of the first OS but also a fault of the AP fault monitor via the inter-OS communication means. The method includes the step of monitoring a fault of the AP fault monitor operating on the first OS by using the high rank fault monitor. An AP monitored by the AP fault monitor preserves check-point information in a shared memory region in main memory. Information in the shared memory region is preserved even across a fault and restart of the first OS.
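The arrangement described above can be sketched in Python as a toy model. It is not the patented implementation: the classes SharedRegion, FirstOS and HighRankMonitor, the 2-second timeout, and the explicit now timestamps are hypothetical, and inter-OS communication is reduced to plain method calls. The sketch shows two points from the text: the high rank fault monitor watches both the first OS and the AP fault monitor with a separate recovery action for each, and check-point information in the shared region survives a restart of the first OS.

```python
class SharedRegion:
    """Stand-in for the shared memory region that outlives a first-OS restart."""

    def __init__(self):
        self.checkpoint = None


class FirstOS:
    """Toy first OS whose private state is lost on restart."""

    def __init__(self, shared):
        self.shared = shared
        self.private = {}
        self.boots = 1

    def restart(self):
        self.private = {}   # memory managed by the OS is reinitialized
        self.boots += 1     # the shared region is deliberately left untouched


class HighRankMonitor:
    """Runs on the second OS; tracks alive messages per monitored target."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_alive = {}
        self.recovery = {}

    def register(self, target, recovery, now):
        self.last_alive[target] = now
        self.recovery[target] = recovery

    def alive(self, target, now):
        self.last_alive[target] = now

    def poll(self, now):
        """Run the recovery action of every target whose alive message is overdue."""
        faulty = [t for t, seen in self.last_alive.items() if now - seen > self.timeout]
        for target in faulty:
            self.recovery[target]()
            self.last_alive[target] = now
        return faulty


shared = SharedRegion()
first_os = FirstOS(shared)
log = []
monitor = HighRankMonitor(timeout=2.0)
monitor.register("first_os", first_os.restart, now=0.0)
monitor.register("ap_fault_monitor", lambda: log.append("restart AP fault monitor"), now=0.0)

shared.checkpoint = {"step": 41}        # the AP writes a check-point to the shared region
first_os.private["cache"] = "scratch"

monitor.alive("first_os", now=1.5)      # only the AP fault monitor goes silent
first_poll = monitor.poll(now=2.5)

monitor.alive("ap_fault_monitor", now=4.0)   # now the first OS goes silent
second_poll = monitor.poll(now=5.0)

print(first_poll, second_poll)
print(first_os.boots, first_os.private, shared.checkpoint)
```

Because each target has its own recovery callback, a silent AP fault monitor is restarted on its own, while a silent first OS triggers an OS restart after which the check-point can still be read from the shared region.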
According to the present invention, as described above, fault monitoring of a monitored OS is conducted by utilizing a multi-OS environment and a high rank fault monitor operating on another OS. A fault of the OS can therefore be monitored without adding dedicated hardware. A fault of an AP fault monitor can also be monitored by using the high rank fault monitor. Furthermore, since the high rank fault monitor is implemented in software, the recovery method used when a fault of the OS or of the AP fault monitor has occurred can be configured in fine detail. In addition, a monitored AP can preserve check-point information at high speed without adding dedicated hardware.


