Method and system for recovering from a software failure

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S023000

active

06314532

ABSTRACT:

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
TECHNICAL FIELD
The invention disclosed herein relates generally to methods for recovering from software failures. More particularly, the invention relates to a method and system for replacing or correcting a program operating on a remotely located computerized device without the need to personally attend to the device.
BACKGROUND OF THE INVENTION
Due to bugs, operating environment problems, corruption of the stored software program, or other conditions, software programs frequently fail to perform their designated functions. When a program fails or crashes, it must frequently be replaced with an updated version or upgrade in which the bug is fixed, or the problem must otherwise be corrected in some fashion. This often requires a human operator to interact with the computer, either locally or remotely via a network or other communication link, to either correct the problem or replace the existing, faulty software program with the corrected upgrade.
However, if the portion of the program which is faulty affects the computer's basic input/output functions, the computer may be unable to accept commands from a human operator as needed to correct or replace the software. Further, if the faulty software affects the computer's ability to connect with a remotely located computer, the human operator at the remotely located computer may be unable to establish communications with the computer in order to correct or replace the faulty software. The operator would then be required to visit the computer to solve the problem.
This problem becomes pronounced in the context of a communication system spread out over a large geographic area. For example, as shown in FIG. 1, a cellular wireless telecommunications system may contain a switching center 10 connected to local and long distance telephone offices, a number of base stations 12 connected to the switching center 10 and having transmitters and antennas and dispersed throughout a geographic area serviced by the system, and a number of wireless terminals 14 in wireless communication with the base stations 12. An operational software program operates the basic functions of each of the base stations 12.
If the operational software on any base station 12 is or becomes faulty, and the base station 12 fails to establish communications with the switching center 10, the operator at the switching center 10 will be unable to take any action remotely in an effort to repair or replace the base station software. In that event, the operator must personally visit the base station 12 to repair or replace the defective software. If the operational software is so faulty that the operator cannot even locally interface with the base station 12 to perform basic input/output operations, more drastic measures, such as equipment or device replacement, may be needed to replace the faulty operational software in the base station 12.
The need to visit a failing base station 12 in the field places a substantial administrative burden on the system operator. This burden is exacerbated if a bug that prevents remotely upgrading the base station software to a new version is prevalent in the copies of the operational software installed and running on all the base stations 12 in the system. This event will require the operator to visit each and every station individually to repair or replace the software.
There is thus a need for a system to be able to recover from all failures of a software program from a distance, without the need to personally visit the failing computer.
SUMMARY OF THE INVENTION
It is an object of the invention to resolve the problems described above relating to repairing software failures in remotely located computerized devices.
It is another object of the invention to improve the reliability of operational software residing on a large number of computerized stations in a geographically widespread telecommunications system.
It is another object of the invention to recover from basic failures in operational software without the need to first install additional operating software.
The problems described above are overcome, in accordance with one embodiment of the invention, by storing and running on each remotely located computer on a network a software program which is known to be reliable and substantially free of bugs. The “known good” software tests the integrity of operational software residing on the remotely located computer, and decides whether to execute the operational software after each reset of the computer. The known good software monitors the number of resets of the operational software which occur without it ever having achieved a set or desired operating point, which may be predefined, such as establishing communications with a host computer. It may do this by storing a variable representing the number of resets and incrementing it each time a reset is performed, while that same variable is set to zero whenever the predefined operating point is achieved. If the variable reaches a limit or threshold, which may be predetermined, the known good software does not load the operational software at the next reset, but rather initiates a repair or replacement of the faulty operational software.
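The reset-counting logic described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `PersistentState`, `on_reset`, `on_operating_point_reached` and the value of `RESET_LIMIT` are assumptions made for illustration, and the dataclass merely stands in for whatever non-volatile storage a real device would use.

```python
from dataclasses import dataclass

# Hypothetical threshold; the text only says the limit "may be predetermined".
RESET_LIMIT = 3


@dataclass
class PersistentState:
    """Stands in for non-volatile storage that survives resets (assumed)."""
    reset_count: int = 0


def on_reset(state: PersistentState) -> str:
    """Decision taken by the known good software after each reset.

    If the counter has already reached the limit, the operational software
    is not loaded and recovery is initiated. Otherwise this reset is counted
    and the operational software is allowed to run.
    """
    if state.reset_count >= RESET_LIMIT:
        return "recover"
    state.reset_count += 1
    return "run_operational"


def on_operating_point_reached(state: PersistentState) -> None:
    """Clear the counter once the set operating point is achieved,
    for example after communications with the host are established."""
    state.reset_count = 0
```

Under these assumptions, three consecutive resets without reaching the operating point leave the counter at the limit, so the fourth reset returns `"recover"` instead of loading the operational software.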
In a network or telecommunication system, the repair or replacement involves establishing communications with a host computer and transmitting a message of the error to the host. A human operator at the host computer can then attempt to remotely diagnose and repair the problem through the known good software. Alternatively, the host computer can automatically transmit an upgraded version of the operational software to replace the faulty version. The known good software receives the upgrade and overwrites the existing operational software.
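A rough sketch of this recovery step follows, again in Python and again hypothetical. `FakeHost` is an in-memory stand-in for the host computer, and `report_error`, `fetch_upgrade` and the `store` dictionary are invented names; the source does not specify a message format or transfer protocol.

```python
from typing import Optional


class FakeHost:
    """In-memory stand-in for the host computer (assumption for the sketch)."""

    def __init__(self, upgrade: Optional[bytes]) -> None:
        self.upgrade = upgrade
        self.messages = []

    def report_error(self, message: str) -> None:
        # A real system would transmit this over the communication link.
        self.messages.append(message)

    def fetch_upgrade(self) -> Optional[bytes]:
        # Returns None when no automatic upgrade is available.
        return self.upgrade


def recover(host: FakeHost, store: dict) -> bool:
    """Report the failure and, if the host supplies an upgrade,
    overwrite the stored operational software with it."""
    host.report_error("operational software failed to reach operating point")
    upgrade = host.fetch_upgrade()
    if upgrade is None:
        # Nothing to install; an operator may still diagnose remotely.
        return False
    store["operational"] = upgrade
    return True
```

When `fetch_upgrade` returns `None`, the stored image is left untouched, which corresponds to the case where a human operator diagnoses the problem through the known good software instead.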
In preferred embodiments, the known good software is a boot program which is loaded into memory every time the computer reboots. The known good software contains basic input/output functionality as well as routines for communicating with the host computer. Upon a reboot, the known good software tests the integrity of the operational software, such as by performing a cyclic redundancy check or checksum validation on the operational software file. If this validation test fails, the known good software establishes communications with the host computer to receive an upgrade. If the validation test passes, the operational software is executed and it is recorded whether or not a set operating point is reached.
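The integrity test at boot can be sketched as follows, assuming a CRC-32 computed with Python's standard `zlib.crc32` and an expected value recorded alongside the image. The function names and the returned action strings are hypothetical; the source names cyclic redundancy checks and checksums only as examples of a validation test.

```python
import zlib


def image_is_valid(image: bytes, expected_crc: int) -> bool:
    """Compare the CRC-32 of the stored image with the recorded value."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc


def boot_decision(image: bytes, expected_crc: int) -> str:
    """Choose the next boot action based on the validation result."""
    if not image_is_valid(image, expected_crc):
        # Validation failed: contact the host to receive an upgrade.
        return "request_upgrade"
    # Validation passed: execute the operational software.
    return "run_operational"


# Usage: a pristine image passes, a copy with one flipped bit does not.
image = b"operational software image"
recorded_crc = zlib.crc32(image) & 0xFFFFFFFF
corrupted = bytes([image[0] ^ 0x01]) + image[1:]
print(boot_decision(image, recorded_crc))
print(boot_decision(corrupted, recorded_crc))
```

A CRC catches accidental corruption of the stored file but says nothing about whether the program logic is correct, which is why the reset counter is still needed after the validation test passes.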


