Hybrid multiple redundant computer system

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S011000, C714S012000

active

06550018

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to computer systems devoted to safety-critical and critical-control applications. More particularly, the present invention relates to hybrid multiple redundant systems that combine majority voting with fault diagnostic and fault recovery means to provide correct system outputs in the presence of multiple system component faults.
BACKGROUND OF THE INVENTION
The evolution of the computer has opened the door to widespread automation. Increasingly, computer systems handle critical tasks such as industrial control of oil, gas, nuclear, and chemical operations, patient monitoring, aircraft flight control, and military systems, among others. Within these systems, emergency shutdown systems are used in safety-critical applications to monitor processes and move the process to a safe state when selected process variables fall outside of a safe range. As one example, in an oil refinery, the pressure and temperature of an air compressor unit expander are monitored, and shutdown actions are taken if these reach an upset condition. In this example, the emergency shutdown system is designed to protect the process separately from the basic process control system.
Critical control systems, on the other hand, provide both continuous control and protection for many safety-critical applications such as gas and steam turbines, boilers, and off-shore platforms. In a gas or steam turbine, for example, the critical control system provides non-stop speed control as well as start-up and shutdown sequencing in a single integrated system. In all of the above examples, and in other related industries, improved technologies add complexity and increase production output, making reliance on emergency shutdown systems and critical control systems increasingly important.
Computer systems devoted to safety-critical and critical control applications must have extremely high degrees of safety and reliability, since faults in computer systems can cause vast economic losses and endanger human beings. A system failure that occurs as a result of a system component fault can be either safe or dangerous. A safe failure occurs when a system has failed into a safe state, in other words, a state in which the system does not disrupt the operation of other systems or compromise the safety of personnel associated with the system. A safe failure occurs, for example, when an emergency shutdown system (ESD) fails in such a way that it causes a shutdown not associated with the controlled process. A dangerous failure is a failure that prevents the system from responding to hazardous situations, allowing hazards to develop. For instance, a dangerous failure occurs when the ESD cannot perform a required shutdown.
Redundant configurations of computing systems have been used in research and in practical designs to provide system fault tolerance, which is concerned with the continuation of correct operation of a system despite the occurrence of internal faults. The spectrum of fault tolerance techniques can be divided into three major classes: static redundancy, dynamic redundancy, and hybrid redundancy.
Static redundancy provides fault tolerance by masking faults, without performing fault detection and recovery. One method of fault masking is through a voting process. Triple Modular Redundant (TMR) systems are the most common form of voting-based systems. The conventional TMR system includes three identical controllers along with an output voter network that votes on the outputs of the three controllers. See, e.g., Frederickson A. A., “A Hybrid multiple redundant Programmable Controllers for Safety Systems”, ISA Transactions, Vol. 29, No. 2 (1990) pp. 13-17. Each controller usually includes an input module, a main processor module, and an output module. By using three identical controllers in combination with the voter network, any single controller fault is masked by the 2-of-3 voting, so no single fault leads to a system failure.
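The 2-of-3 voting described above can be illustrated with a minimal sketch. It assumes each controller output is a single boolean (True for ON, False for OFF); the function name `vote_2oo3` is hypothetical and is not taken from the patent.

```python
def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Return the value that at least two of the three controllers agree on."""
    return (a and b) or (a and c) or (b and c)


# Two healthy controllers command ON; the third has failed to OFF.
# The single fault is masked by the vote.
print(vote_2oo3(True, True, False))   # True

# Two healthy controllers command OFF; the third has failed to ON.
print(vote_2oo3(False, False, True))  # False
```

The voter never needs to know which controller is faulty, which is why static redundancy is said to mask faults rather than detect them.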
In many cases, however, two concurrent faults lead to a system failure. For example, if the TMR system is used as an emergency shutdown system, its outputs will be in an ON state under normal operation and in an OFF state for a shutdown. If two output modules of the TMR fail at the same time in such a way that their outputs remain in an OFF state, then the system fails safely, making a false shutdown. On the other hand, when two output modules fail in such a way that their outputs remain in an ON state, the result can be a dangerous system failure. This failure is termed dangerous because, despite a process problem, the process cannot be shut down.
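The two-fault cases described above can be worked through with a small sketch. It assumes a de-energize-to-trip arrangement (ON means run, OFF means shutdown) and models a failed output module as stuck at a fixed value; the names `system_output` and `classify` are illustrative only.

```python
from typing import Optional, Sequence


def majority(values: Sequence[bool]) -> bool:
    """True when more than half of the values are True."""
    return sum(values) > len(values) // 2


def system_output(demand_shutdown: bool, stuck: Sequence[Optional[bool]]) -> bool:
    """Voted TMR output: True = ON (run), False = OFF (shutdown).

    stuck[i] is None for a healthy output module, or the value
    (True/False) at which a failed module's output is stuck.
    """
    commanded = not demand_shutdown
    outputs = [commanded if s is None else s for s in stuck]
    return majority(outputs)


def classify(stuck: Sequence[Optional[bool]]) -> str:
    """Classify a fault pattern by its effect on the voted output."""
    if not system_output(False, stuck):
        return "safe failure (false shutdown)"
    if system_output(True, stuck):
        return "dangerous failure (cannot shut down)"
    return "ok"


print(classify([False, None, None]))   # ok
print(classify([False, False, None]))  # safe failure (false shutdown)
print(classify([True, True, None]))    # dangerous failure (cannot shut down)
```

A single stuck module is outvoted in either direction, while two modules stuck in the same state determine the voted output regardless of the healthy one.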
To compensate for the TMR's inability to tolerate more than one controller failure, quick fault detection must be used to minimize the period of time that the system operates in a vulnerable condition. Commercial versions of TMR offer online module replacement and repair capability to address this problem. However, if one controller of the TMR fails and has not been replaced, a fault in another controller can lead to a safe or dangerous system failure. Thus, the success of online repair depends on the user's ability to discover and diagnose the problem in a short time period. Since fault discovery and repair rates are limited by many factors, even a single controller failure may bring the system to a vulnerable mode.
As an alternative method of compensating for this vulnerability, known devices employ an output hot spare. Such a system has two triplicate I/O modules in parallel, where one module, the primary, is active, while the other module, a hot spare, is powered but inactive. Each output module usually includes three identical legs located on a single board. Under normal operation, the hot spare module outputs are OFF, so they do not affect the system output. If a fault is detected on the primary module, control is automatically switched to the hot spare module, allowing the system to maintain 2-of-3 voting continuously. The faulty module can then be removed and replaced without process interruption.
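The switch-over behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the patent's circuit: the class `OutputModule`, its `fault_detected` flag (standing in for whatever diagnostic the system provides), and the helper functions are all hypothetical, and an inactive module is assumed to hold its outputs OFF.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OutputModule:
    name: str
    fault_detected: bool = False  # assumed to be set by a diagnostic
    active: bool = False

    def output(self, legs: Sequence[bool]) -> bool:
        """2-of-3 vote over the three legs; an inactive module stays OFF."""
        if not self.active:
            return False
        return sum(legs) >= 2


def select_active(primary: OutputModule, spare: OutputModule) -> OutputModule:
    """Switch control to the hot spare when a primary fault is detected."""
    use_spare = primary.fault_detected and not spare.fault_detected
    primary.active = not use_spare
    spare.active = use_spare
    return spare if use_spare else primary


def system_output(primary: OutputModule, spare: OutputModule,
                  legs: Sequence[bool]) -> bool:
    """The two modules are wired in parallel, so either can energize the output."""
    return primary.output(legs) or spare.output(legs)


primary = OutputModule("primary")
spare = OutputModule("hot spare")
legs = (True, True, True)

print(select_active(primary, spare).name)   # primary
primary.fault_detected = True
print(select_active(primary, spare).name)   # hot spare
print(system_output(primary, spare, legs))  # True
```

The model relies on deactivation actually forcing a module's outputs OFF, so it does not cover a faulty module whose outputs stay stuck ON.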
The hot spare method reduces the probability of a safe failure within a TMR system. For example, when a safe failure in any leg of the primary output module is discovered, the hot spare outputs are switched to the ON state, allowing the system to keep its outputs energized. However, employing a hot spare adds to the number of components in the system, increasing the overall system cost. As a further disadvantage, the hot spare is useless when the outputs of the faulty module remain in an ON state and thus cannot prevent a dangerous system failure.
To tolerate additional concurrent faults, known devices can add replicated computers within the voting scheme. For example, a five modular redundant system (5MR) would perform three-out-of-five voting in order to tolerate two faults. Unfortunately, the 5MR system requires large additional resources, which significantly increase the size and weight of the system, making it very expensive to implement.
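The relationship between the number of replicated computers and the number of tolerated faults can be checked with a short sketch. It assumes simple majority voting over an odd number of boolean outputs; the function names are illustrative.

```python
from typing import Sequence


def majority_threshold(n: int) -> int:
    """Smallest number of agreeing modules that forms a majority of n."""
    return n // 2 + 1


def tolerated_faults(n: int) -> int:
    """Faults a majority voter over n modules (n odd) can mask."""
    return n - majority_threshold(n)


def vote(outputs: Sequence[bool]) -> bool:
    """Majority vote over an odd number of module outputs."""
    return sum(outputs) >= majority_threshold(len(outputs))


for n in (3, 5):
    print(n, majority_threshold(n), tolerated_faults(n))
# 3 2 1
# 5 3 2

# 5MR: two modules failed to OFF are outvoted by three healthy ones.
print(vote([True, True, True, False, False]))  # True
```

Each additional tolerated fault costs two more modules, which is the resource growth noted above for the 5MR system.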
Turning away from static redundancy methods, dynamic redundancy methods achieve fault tolerance by detecting the existence of faults and performing system reconfiguration to prevent a system failure. Dynamic redundancy systems have built-in fault detection capability. When a fault is detected, the system is usually reconfigured by activating a spare processor or computer. The most common example is a dual computer system that includes primary and spare computers operating in parallel. The system also includes a central diagnostic module that monitors the primary computer and switches the system output over to the spare computer when the primary computer fails. The Dual Dynamic Redundancy (DDR) system therefore has three independent components, namely two computers and the Central Diagnostic Module (CDM), and it tolerates any single component failure. The DDR system, however, cannot operate properly in the presence of two concurrent faults. If two computers of the DDR fail at the same time, the entire system will
