Reexamination Certificate
1999-09-27
2004-05-11
Beausoliel, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S015000
active
06735716
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computerized diagnostic and failure recovery techniques, and more specifically, to such techniques wherein a first computer process is monitored by a second computer process for occurrence of a failure condition, and the second process takes appropriate action when and if the failure condition occurs. As will be appreciated by those skilled in the art, although the present invention will be described in connection with specific embodiments and methods of use wherein the first process is a component object model (COM) server process, the present invention finds utility in diagnostics and failure recovery of processes other than COM processes. Thus, the present invention should not be viewed as being limited to use in diagnostics and failure recovery of COM processes, but rather should be viewed broadly as being limited only as set forth in the hereinafter appended claims.
2. Brief Description of Related Prior Art
The component object model (COM) is a software component architecture standard promulgated by Microsoft Corporation of Redmond, Wash. that allows applications and systems to be constructed using software components supplied by different software suppliers. The COM architecture permits higher level software components to exchange data with each other according to a well-defined protocol.
The specifics of the COM architecture are well known to those skilled in the art, and provide a programming language-independent and computer platform-independent standard for software component interoperability. More specifically, the COM architecture provides respective standards for software components executed on respective computer platforms that define how the components initialize and use virtual function tables to call functions supported by the software components via the function pointers in the tables. This standardizes the way in which components interoperate with other components when they call such functions.
In the COM architecture, an executing process that provides or uses a service may be called an “object.” Objects interact with each other via “interfaces.” In essence, a COM object interface provides and defines one or more related operations or functions (“methods”) provided by the object, and behaviors and responsibilities associated with these methods. An object accesses an interface of another object by utilizing a function pointer to that interface. A “server” object makes available one or more of its methods to a “client” object. That is, a client object accesses an interface of a server object to utilize one or more methods provided by that interface. The client object and the server object may each be separate computer processes, or alternatively, may be comprised in the same process.
According to the COM architecture, each interface has a respective, unique interface identifier associated with it. These interface identifiers are globally unique identifiers (GUIDs). When a client object wishes to discover whether a particular interface is supported by a server object, the client object calls a special method, QueryInterface, that is supported by all components. If the server object being queried supports that interface, it returns the appropriate interface pointer and a success code to the client object; if it does not support that interface, it returns an error value to the client object.
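The interface discovery described above can be illustrated with a minimal Python sketch. It is not the COM API itself: the class `ServerObject`, the method name `query_interface`, and the two interface identifiers are hypothetical names invented for this illustration. Only the result codes `S_OK` and `E_NOINTERFACE` carry the values COM actually uses.

```python
import uuid

# Hypothetical interface identifiers (GUIDs), invented for this illustration.
IID_ILOGGER = uuid.UUID("11111111-1111-1111-1111-111111111111")
IID_ISTORAGE = uuid.UUID("22222222-2222-2222-2222-222222222222")

S_OK = 0                    # COM success code
E_NOINTERFACE = 0x80004002  # COM error code for an unsupported interface


class ServerObject:
    """Toy server object that supports exactly one interface."""

    def __init__(self):
        # Map each supported interface identifier to a table of methods.
        self._interfaces = {
            IID_ILOGGER: {"log": lambda msg: f"logged: {msg}"},
        }

    def query_interface(self, iid):
        """Return (success code, interface) or (error value, None)."""
        iface = self._interfaces.get(iid)
        if iface is None:
            return E_NOINTERFACE, None
        return S_OK, iface


server = ServerObject()
code, logger = server.query_interface(IID_ILOGGER)
if code == S_OK:
    print(logger["log"]("hello"))
```

A client that asks for `IID_ISTORAGE` in this sketch receives `E_NOINTERFACE` and no interface, which corresponds to the error value mentioned in the text.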
Further, in accordance with the COM architecture, when client and server objects are in different processes, proxy and stub intermediate objects are created which exchange data between the processes. More specifically, when the QueryInterface method returns a success code to the client object, and the client and server objects are in different processes, a proxy object is created in the client object and an associated stub object is created in the server object.
There are a number of ways in which a server object may fail (i.e., experience a failure condition in its operation/execution, such as becoming unresponsive to interface access requests from client objects). Examples of events that can cause a server object to fail include the server object experiencing an untrapped exception or becoming deadlocked in its execution. When such a failure condition occurs in operation of the server object, if a client object and server object are comprised in different respective processes, a call made by the client object to a server object interface will be ineffective to call methods of that interface, and will instead result in return of an error message from the operating system to the client object. Typically, the client object may be programmed to take corrective action to return the server object to a normal operating mode (e.g., by issuing appropriate requests to the operating system that the operating system terminate and restart the server object), if the client object receives a predetermined number of such error messages in a predetermined time period.
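The prior-art client behavior described above, in which corrective action is taken only after a predetermined number of error messages arrive within a predetermined time period, can be sketched as a sliding-window counter. The class name `FailureDetector` and its parameters are hypothetical, and timestamps are passed in explicitly so that the example is deterministic.

```python
from collections import deque


class FailureDetector:
    """Counts error messages inside a sliding time window."""

    def __init__(self, max_errors, window_seconds):
        self.max_errors = max_errors          # the predetermined number of errors
        self.window_seconds = window_seconds  # the predetermined time period
        self._error_times = deque()

    def record_error(self, now):
        """Record an error at time `now`; return True if corrective action is due."""
        self._error_times.append(now)
        # Discard errors that have fallen out of the time window.
        while self._error_times and now - self._error_times[0] > self.window_seconds:
            self._error_times.popleft()
        return len(self._error_times) >= self.max_errors


detector = FailureDetector(max_errors=3, window_seconds=10.0)
for t in (0.0, 4.0, 8.0):
    if detector.record_error(t):
        print(f"corrective action requested at t={t}")
```

Because each error message may itself take some time to arrive, a threshold of this kind delays corrective action by at least the time needed to accumulate the required number of errors.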
Typically, after the server object has failed, a time period of several seconds may occur between the issuing of an interface call and the return of an error message to the client object from the operating system. Thus, since the client object typically will not take corrective action to return the server object to a normal operating mode unless the client object has received in the predetermined time period multiple error messages from the operating system, there may be a significant time lapse between failure of the server object and the taking of such corrective action by the client object. Disadvantageously, the significant time lapse that may exist between failure of a server object and the taking of corrective action by the client object introduces significant inefficiencies into the interactions between the client and server objects, and may reduce the processing efficiency of the computer system. If multiple client objects are involved, these inefficiencies may be further exacerbated.
SUMMARY OF THE INVENTION
In accordance with the present invention, computerized diagnostic and failure recovery techniques are provided that are able to overcome the aforesaid and other disadvantages and drawbacks of the prior art. More specifically, in one aspect of the present invention, a diagnostic and failure recovery technique is provided in which a first computer process (e.g., a COM server object process) requests that a second computer process monitor the first process for occurrence of a failure condition in operation of the first process. The second process initiates, if the second process determines that the failure condition has occurred, corrective action to return the first process to a normal operating mode.
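The first aspect can be sketched as follows. The text does not state how the second process determines that the failure condition has occurred, so this sketch assumes a heartbeat timeout, which is an assumption of the example and not a mechanism taken from the source. The class `Monitor`, its methods, and the restart callback are hypothetical names, and both "processes" are modeled as ordinary in-process objects for brevity.

```python
class Monitor:
    """Stands in for the second process: watches registered processes."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._watched = {}  # name -> [time of last heartbeat, restart callback]

    def register(self, name, restart, now):
        """Called by the first process to request monitoring of itself."""
        self._watched[name] = [now, restart]

    def heartbeat(self, name, now):
        """Called periodically by a healthy first process (assumed mechanism)."""
        self._watched[name][0] = now

    def check(self, now):
        """Restart every watched process whose heartbeat is overdue."""
        restarted = []
        for name, entry in self._watched.items():
            last_seen, restart = entry
            if now - last_seen > self.timeout:
                restart()       # corrective action, e.g. terminate and restart
                entry[0] = now  # treat the restarted process as freshly seen
                restarted.append(name)
        return restarted


restarts = []
monitor = Monitor(timeout=5.0)
monitor.register("server", lambda: restarts.append("server"), now=0.0)
monitor.heartbeat("server", now=3.0)
print(monitor.check(now=6.0))  # heartbeat is recent, so nothing is restarted
print(monitor.check(now=9.0))  # heartbeat is overdue, so "server" is restarted
```

The point of the design is that the monitored process itself initiates the monitoring relationship, so detection does not depend on any client accumulating error messages.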
In a second aspect of the present invention, a technique is provided that may be practiced separately or in combination with the technique of the first aspect of the present invention. In the technique of the second aspect of the present invention, the first computer process requests that the second computer process monitor the first process for occurrence of a failure condition in operation of the first process. The second process provides to a third computer process (e.g., a COM client object process) an indication as to whether the failure condition has occurred. The second process may provide this indication to the third process in response to a request for such indication from a special proxy object in the third process. The proxy object may also detect and correct a pointer to the first process made invalid due to a failure and subsequent recovery of the first process.
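The second aspect can be sketched in the same simplified style. A proxy in the client asks the monitor whether the server it points to has failed and, if so, replaces its stale reference with the recovered instance before forwarding the call. All names here (`ServerInstance`, `StatusMonitor`, `RecoveringProxy`, and the generation counter) are hypothetical, and the three processes are modeled as in-process objects, so this illustrates only the control flow and not an actual cross-process implementation.

```python
class ServerInstance:
    """Stands in for the first process; `generation` counts restarts."""

    def __init__(self, generation):
        self.generation = generation
        self.alive = True

    def call(self, request):
        if not self.alive:
            raise RuntimeError("server instance has failed")
        return f"gen{self.generation}: {request}"


class StatusMonitor:
    """Stands in for the second process: tracks the current server instance."""

    def __init__(self):
        self.current = ServerInstance(generation=1)

    def has_failed(self, server):
        """Indicate whether the given instance has failed or been replaced."""
        return server is not self.current or not server.alive

    def recover(self):
        """Replace the failed instance with a fresh one."""
        self.current = ServerInstance(self.current.generation + 1)
        return self.current


class RecoveringProxy:
    """Stands in for the special proxy object in the third (client) process."""

    def __init__(self, monitor):
        self._monitor = monitor
        self._server = monitor.current

    def call(self, request):
        # Ask the monitor for a failure indication before using the pointer.
        if self._monitor.has_failed(self._server):
            self._server = self._monitor.current  # correct the stale pointer
        return self._server.call(request)


monitor = StatusMonitor()
proxy = RecoveringProxy(monitor)
print(proxy.call("ping"))      # served by the original instance
monitor.current.alive = False  # simulate a failure of the first process
monitor.recover()              # the monitor restores a working instance
print(proxy.call("ping"))      # the proxy transparently uses the new instance
```

In this sketch the client code never handles the stale reference itself; the proxy hides the failure and recovery behind the same `call` method.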
The corrective action that may be taken by the second process in the technique of the first aspect of the present invention may comprise terminating and restarting the first process (e.g., via issuance of appropriate requests to an operating system process). Additionally, the second process m
Beausoliel Robert
Cesari and McKenna LLP
Cisco Technology Inc.
Duncan Marc