Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-10-28
2002-02-26
Le, Dieu-Minh (Department: 2184)
C714S011000
active
06351829
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
The present invention is directed, in general, to computing and processing systems and, more specifically, to systems and methods for distinguishing a device failure from a failure to communicate with the device.
BACKGROUND OF THE INVENTION
Automated plant control systems include a comprehensive set of algorithms, or software-definable process control routines, to control and monitor various processes within, for instance, a manufacturing facility. The control systems can be tailored to satisfy a wide range of process requirements globally or within specified portions of the facility. Conventionally, the control systems include a variety of modules, each having its own processor and firmware, linked together by communication buses to result in a distributed process control system. The distributed nature of the system affords high performance with the capability to expand the system incrementally to satisfy growth or modifications in the facility.
In a real-time process control system, processing can be distributed so that two controllers are coupled together and parallel the same operation. Because the same operation or process is paralleled, these controllers are referred to as “dual redundant process controllers.” Dual redundant process controllers operate in such a manner that one of the controllers (designated the “primary controller”) is always in a lead state (meaning that it has actual control of all or part of the system). The other process controller (the “secondary controller”) mirrors the primary controller's processes but is not in actual control of the system. In effect, the secondary controller parallels the lead controller in all aspects of operation and data storage and remains ready to take over from the primary controller should the primary controller fail. If such a failure occurs in the primary controller, actual control (the “lead state”) of that part or all of the system should be assumed by the secondary controller. When the secondary controller asserts the lead state, the primary controller can no longer operate in the lead state, and the secondary controller then becomes the primary controller for that part or all of the real-time process system.
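The role hand-off described above can be sketched as a small state model. This is an illustrative sketch only, not the patented method: the names `Role`, `Controller`, `RedundantPair`, and `fail_over` are hypothetical, and demoting the failed controller to the secondary role is an assumption, since the text says only that it can no longer hold the lead state.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"      # holds the lead state (actual control)
    SECONDARY = "secondary"  # mirrors the primary, not in control


@dataclass
class Controller:
    name: str
    role: Role
    failed: bool = False


class RedundantPair:
    """Hypothetical model of two dual redundant process controllers."""

    def __init__(self, a: Controller, b: Controller) -> None:
        self.controllers = [a, b]

    def lead(self) -> Controller:
        # Exactly one controller may hold the lead state at any time.
        leads = [c for c in self.controllers if c.role is Role.PRIMARY]
        if len(leads) != 1:
            raise RuntimeError("lead-state singularity violated")
        return leads[0]

    def fail_over(self) -> Controller:
        # The secondary takes the lead only if the primary has failed.
        old = self.lead()
        new = next(c for c in self.controllers if c is not old)
        if not old.failed:
            raise RuntimeError("primary has not failed; no takeover")
        # Assumption: the failed controller is demoted to the secondary role.
        old.role, new.role = Role.SECONDARY, Role.PRIMARY
        return new
```

For example, with `a` as primary and `b` as secondary, setting `a.failed = True` and calling `fail_over()` returns `b`, after which `lead()` also returns `b`.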
Normally, each of the dual redundant process controllers contains a processor and firmware and is linked to the overall system. The processor could be, for example, one of the i960Hx series of superscalar RISC processors commercially available from the Intel Corporation. The processor usually resides on a local bus which also includes local random access memory (“RAM”), memory for program storage, and hardware for monitoring and controlling external functions. Firmware is a computer program contained persistently in a read-only memory (“ROM”) associated with the processor. The primary activity of the local bus is control and management of the controller through firmware execution by the central processing unit (“CPU”). Additionally, the primary and secondary controllers are normally interconnected with each other by some form of link, such as coaxial or fiber optic cable. This inter-connectivity allows the dual redundant process controllers to communicate operational states and allows the lead-state controller to mirror its activity to the secondary controller, along with any data that should be stored on the secondary controller.
The fundamental and critical requirement of real-time process systems using dual redundant process controllers is the singularity of operation for the lead-state controller over at least that part of the system it is to control. One and only one of the dual redundant controllers can be in actual control (have the lead state) of all or part of the system at any time. If lead-state singularity is not preserved, the processing system could encounter competing or conflicting commands from the primary and secondary controllers, which could lead to a system lock up, overload, shut down, or other devastating process-system failure. In large manufacturing facilities or plants, a failure of a process controller could be very costly in many ways, including down-time for equipment and manpower, probable loss or destruction of raw materials, and the subsequent expense of restarting the process. In fact, avoiding such a devastating system failure is so important that it is the basis for the concept of redundant controllers, and absorbing the additional cost of redundant controllers is now a necessary consideration rather than an exception.
Because lead-state singularity is critical for dual redundant process controllers, the reliability of the communications link between the two controllers is paramount. The primary and secondary controllers must be able to transition lead-state control from the primary controller to the secondary controller intelligently, promptly, and effectively in the event of a failure of the primary controller, allowing the process system to continue without interruption or with as little interruption as possible.
A problem that arises from the critical nature of the singularity of operation of the lead-state controller is how the secondary controller can correctly determine when to assert the lead state. As previously discussed, it is paramount for process-system integrity that the secondary controller make this determination correctly. Failure scenarios can be of more than one type and may or may not require the secondary controller to assert lead-state control.
If the failure is an inter-controller communications failure, as in a connector cable break, the primary controller remains viable and should remain in the lead state. The secondary controller should be intelligent enough to know that it neither needs nor should attempt to assert control, because the primary controller has not failed. On the other hand, if a device failure occurs in the primary controller, the secondary controller must know that the partner device has failed and must immediately activate and assert the lead state. In both failure scenarios, there is always the basic requirement to ensure that the two controllers do not collide while attempting to control the system. If the secondary controller cannot distinguish a device failure from a failure to communicate with the partner device, it could assert lead-state control while the primary controller is still in control, compromising the lead-state singularity of the dual controllers and jeopardizing process system integrity.
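The two failure scenarios above amount to a decision rule for the secondary controller, sketched below. This is a hypothetical illustration: the names `Diagnosis`, `Action`, `secondary_action`, and `naive_action` are not from the source, and how the diagnosis itself is obtained is not covered by this passage, so the sketch takes it as an input.

```python
from enum import Enum


class Diagnosis(Enum):
    HEALTHY = "healthy"                # partner reachable and operating
    LINK_FAILURE = "link_failure"      # e.g. a connector cable break
    DEVICE_FAILURE = "device_failure"  # the primary controller itself failed


class Action(Enum):
    STAY_SECONDARY = "stay_secondary"
    ASSERT_LEAD = "assert_lead"


def secondary_action(diagnosis: Diagnosis) -> Action:
    """Assert the lead state only when the primary device has failed."""
    if diagnosis is Diagnosis.DEVICE_FAILURE:
        return Action.ASSERT_LEAD
    # On a link failure the primary is still viable and keeps the lead.
    return Action.STAY_SECONDARY


def naive_action(partner_reachable: bool) -> Action:
    """A rule that cannot tell the two failures apart (shown for contrast).

    Any loss of contact is treated as a device failure, so a cable break
    makes the secondary assert the lead while the primary still holds it.
    """
    return Action.STAY_SECONDARY if partner_reachable else Action.ASSERT_LEAD
```

With a cable break, `secondary_action(Diagnosis.LINK_FAILURE)` keeps the secondary passive, whereas `naive_action(False)` would assert the lead and break lead-state singularity.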
Ideally, if the secondary controller could know that a device failure has occurred, a transition from the primary controller to the secondary controller could be determinatively effected, thereby preserving system integrity. Thus, it is advantageous for the secondary controller to be able to tell a device failure in the primary controller apart from a failure of the communications link between the two controllers.
Previous attempts to make the inter-controller link reliable and to distinguish failure scenarios between controllers added hardware to establish alternate communication paths. It was thought that alternate communication paths solved the problem. But the additional hardware brought other problems to light, including the increased cost of additional devices, added complexity, and a possible degradation of reliability, because the additional hardware created new and possibly undetectable failure scenarios for the controllers. In effect, the solution introduced more problems than it solved and could defeat its intended purpose.
Another problem that has been encountered in effectively
Dupont Anthony J.
Payne Paul A.
Hitt Gaines & Boisbrun
Honeywell INC
Profile ID: LFUS-PAI-O-2953191