Electrical computers and digital processing systems: processing – Processing control – Logic operation instruction processing
Reexamination Certificate
1998-07-09
2001-11-06
Kim, Kenneth S. (Department: 2183)
C712S227000, C714S004110
active
06314512
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of The Invention
The invention relates to the field of multi-process systems, e.g., computer networks, and in particular, to system failure detection and notification in an asynchronous processing environment.
2. Background Information
Multi-processing systems, such as computer networks, are known. Application programs may utilize resources which are distributed among the systems, e.g., a database located on a remote computer on the network may be accessed by an application program started through an end user interface on a personal computer (PC). Such an application program will be referred to herein generically as a network application.
System failures may result from any number of causes, for example, a computer or process abending (abnormal ending of a task, e.g., crashing), a loss of communications, or a reboot.
In network applications, it is important to detect any system failure in a timely fashion in order to provide feedback to a user at the end user interface. In particular, if a user at an end user interface has commanded an operation that is destined to fail because of such a system failure, it is important to update the end user interface with that information as soon as possible so as not to waste the time of the user.
It is further important to detect the failure in a timely fashion in order to take corrective action within the network application. If corrective action is possible, it should be taken without a long delay so as not to delay the processing of the application.
It is also important to detect the failure in a timely fashion in order to clean up resources on other systems in the network that are dependent upon the failed system. Failed operations continue to consume resources until the failure is detected and the resources are released. In a network application, these resources often exist on systems other than the failed system, namely those involved in processing the operation that has failed.
In a synchronous processing environment, system failure is typically detected when an operation is initiated on that system and the system fails to respond. Detection of the failure is thus delayed until such an operation is attempted.
However, in an asynchronous processing environment detection is not as simple as in the synchronous environment. An operation could result in system A and system B sending multiple messages to one another in an asynchronous fashion. At any point in time, it may be just as correct for one of the systems to send a message to the other as it is for one of the systems to never have to send a message to the other. The lack of messages flowing between the systems is therefore not necessarily a valid indicator of failure. The messages may be sporadic, or they may never have to occur. So in the asynchronous case, a long-running operation may continue to appear normal, even though a system has already failed.
While system failure could be detected when the next operation involving the failed system is initiated, that operation might not be initiated until minutes, hours or even days after the system failure has occurred.
A need therefore exists for system failure detection in the asynchronous processing environment which is virtually immediate, thus solving the problems related to not having notification of a system failure in a timely fashion.
SUMMARY OF THE INVENTION
It is, therefore, a principal object of this invention to provide a method and system for automatic detection and notification of system failure in a multi-system, e.g., network, application.
It is another object of the invention to provide a method and apparatus that solves the above-mentioned problems so that system failure is detected immediately upon occurrence and notification is given in a timely fashion.
These and other objects of the present invention are accomplished by the method and apparatus disclosed herein.
Advantageously, the present invention solves the problem of detecting in a timely fashion that a system involved in a network application has failed. According to an aspect of the invention, detection of the system failure is virtually immediate.
According to another aspect of the invention, the first time that a message needs to be sent between a network server and another system which performs distributed operations, a connection object is created by a respective connection manager on both the server and the other system to represent the communication connection.
According to another aspect of the invention, the respective connection manager controls and tracks the respective system's active connections.
According to another aspect of the invention, any subsequent messages between the respective systems will use the same connection object.
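The connection handling described in the preceding aspects (a connection object created on the first message, tracked by the connection manager, and reused afterwards) might be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the class names `Connection` and `ConnectionManager`, the method names, and the in-memory message list standing in for real network transport are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Connection:
    """Stand-in for a communication connection to one peer system."""
    peer: str
    sent: List[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        # A real connection would write to the network; here we only record.
        self.sent.append(message)


class ConnectionManager:
    """Tracks this system's active connections, one per peer."""

    def __init__(self) -> None:
        self._connections: Dict[str, Connection] = {}

    def send(self, peer: str, message: str) -> Connection:
        conn = self._connections.get(peer)
        if conn is None:
            # First message to this peer: create the connection object.
            conn = Connection(peer)
            self._connections[peer] = conn
        # Subsequent messages reuse the same connection object.
        conn.send(message)
        return conn

    def active_peers(self) -> List[str]:
        return sorted(self._connections)
```

In this sketch each system would hold its own `ConnectionManager` instance, so both ends of a connection are represented by a connection object, as the text describes.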
According to another aspect of the invention, a separate thread owned by the respective connection manager monitors the status of the connection.
According to another aspect of the invention, when there is a communication failure, the connection manager detects it immediately.
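One way a separate monitoring thread could notice a communication failure without waiting for the next outgoing message is to block on a read of the connection. The Python sketch below is an illustration only: `start_monitor` is a hypothetical name, the socket is assumed to be dedicated to monitoring (incoming data is simply discarded), and the patent does not state that this is the mechanism used.

```python
import socket
import threading


def start_monitor(sock, on_failure):
    """Watch a connection from a separate thread (hypothetical helper).

    Calls on_failure() once the peer closes the connection or the socket
    reports an error. An idle connection is not treated as a failure.
    """
    def watch():
        try:
            # recv() blocks while the connection is healthy but idle, and
            # returns b"" as soon as the peer end goes away.
            while sock.recv(4096):
                pass  # assumption: traffic on this socket is ignored
        except OSError:
            pass  # connection reset or socket closed locally
        on_failure()

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

Because the thread is blocked in the read rather than polling, a lack of messages alone never triggers `on_failure`, while loss of the connection triggers it promptly.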
According to another aspect of the invention, the connection manager then sends a message which causes an update message to be sent to all service objects that exist notifying them of the system failure. Operations in the network application utilize service objects and their corresponding proxies that exist on the systems involved in the operation. A “service object” is a bundle of data and function for performing a particular service, and a proxy is a stand-in on one system for a corresponding object on another system.
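The notification step can be pictured as a broadcast to every registered service object, each of which decides whether the failed system matters to it. The following Python sketch is illustrative only; `ServiceObject`, `ServiceRegistry`, `on_system_failure`, and `broadcast_failure` are hypothetical names, and the `active` flag is a placeholder for whatever user notification or clean-up a real service object would perform.

```python
from typing import List


class ServiceObject:
    """A bundle of data and function for one service (simplified)."""

    def __init__(self, name: str, remote_system: str) -> None:
        self.name = name
        self.remote_system = remote_system  # system this service relies on
        self.notices: List[str] = []
        self.active = True

    def on_system_failure(self, failed_system: str) -> None:
        # Every service object receives the update message.
        self.notices.append(failed_system)
        if failed_system == self.remote_system:
            # Placeholder for notifying the end user or starting clean-up.
            self.active = False


class ServiceRegistry:
    """Holds all service objects that currently exist on this system."""

    def __init__(self) -> None:
        self._services: List[ServiceObject] = []

    def register(self, service: ServiceObject) -> None:
        self._services.append(service)

    def broadcast_failure(self, failed_system: str) -> int:
        """Notify all service objects; return how many were notified."""
        for service in list(self._services):
            service.on_system_failure(failed_system)
        return len(self._services)
```

For example, if one service object depends on system "B" and another on system "C", broadcasting a failure of "B" reaches both, but only the first marks itself inactive.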
According to another aspect of the invention, the service objects may then notify the end-user that the system has partially or completely failed.
According to another aspect of the invention, clean-up operations may be started, or other corrective action may be taken.
According to another aspect of the invention, all messages from the time of the failure through the handling of the failure are sent asynchronously so that other application operations are not severely impacted.
According to another aspect of the invention, failure handling is advantageously initiated when the failure occurs, i.e., there is no waiting for a next message to be sent across the connection to determine that a failure has occurred.
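Sending the failure messages asynchronously, so that the detecting thread and other application operations are not held up by failure handling, could be sketched with a queue and a worker thread. This Python example is an assumption-laden illustration rather than the disclosed design: the class `AsyncNotifier` and its methods `post`, `drain`, and `close` are hypothetical.

```python
import queue
import threading


class AsyncNotifier:
    """Delivers failure messages on a worker thread (illustrative sketch)."""

    def __init__(self) -> None:
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def post(self, handler, failed_system) -> None:
        # Returns immediately; the handler runs later on the worker thread.
        self._queue.put((handler, failed_system))

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # shutdown marker
                self._queue.task_done()
                break
            handler, failed_system = item
            try:
                handler(failed_system)
            finally:
                self._queue.task_done()

    def drain(self) -> None:
        """Block until every posted message has been handled."""
        self._queue.join()

    def close(self) -> None:
        self._queue.put(None)
        self._worker.join()
```

A monitoring thread would call `post` the moment it detects the failure and then continue, which matches the idea that handling begins when the failure occurs rather than when the next message is attempted.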
REFERENCES:
patent: 6016500 (2000-01-01), Waldo et al.
patent: 6018805 (2000-01-01), Ma et al.
patent: 6021437 (2000-02-01), Chen et al.
patent: 6021507 (2000-02-01), Chen
Branson, et al., U.S. Patent Application Ser. No. 09/112,353, filed Jul. 9, 1998.
Branson Michael J.
Halverson Steven G.
Rackham Devaughn L.
Streit Andrew J.
Townsend Susette M.
Bussan Matthew J.
International Business Machines Corporation
Kim Kenneth S.
Lynt Christopher H.