Hybrid triple redundant computer system

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S036000, C714S010000, C714S011000, C714S012000, C714S013000, C714S055000

active

06732300

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to computer systems devoted to safety-critical and critical-control applications. More particularly, the present invention relates to hybrid multiple redundant systems that combine majority voting with fault diagnostic and fault recovering means to provide correct outputs of a system in the presence of multiple system component faults.
BACKGROUND OF THE INVENTION
Real time data acquisition and control systems often operate in mission critical applications where the computations are critical to human safety, environmental cleanliness, or equipment protection. Examples include industrial controllers, high-speed trains, nuclear power plants, military systems, and hospitals. Computing systems devoted to such applications must provide fault tolerance since faulty computations in these systems can cause the loss of human life and/or expensive equipment. Redundant configuration of computing systems has been used in several research and design projects to provide system fault tolerance, which is the ability of a system to continue to perform its task after the occurrence of faults. A system failure that occurs as a result of a system component fault can be either safe or dangerous. A safe failure occurs when a system has failed into a safe state, or in other words, where the system does not disrupt the operation of other systems or compromise the safety of personnel associated with the system. The safe failure occurs, for example, when an emergency shutdown system (ESD) fails in such a way that it causes a shutdown not associated with the controlled process. A dangerous failure is a failure that prevents the system from responding to hazardous situations, allowing hazards to develop. For instance, a dangerous failure occurs when the ESD cannot perform a required shutdown.
Most deployed critical control systems are based on either triple modular redundant (TMR) or dual redundant (DR) architecture to achieve fault tolerance and increase safety and reliability. Each of these systems, however, typically tolerates the fault of only one system resource. If, for example, the TMR system is used as an Emergency Shutdown system, its outputs will be in an ON condition under normal operation and in an OFF state for a shutdown. If, for instance, two output modules of the TMR fail at the same time, in such a way that their outputs remain in an OFF condition, then the system fails safely, making a false shutdown. On the other hand, when two output modules fail in such a way that their outputs remain in an ON state, it can lead to a dangerous system failure. This failure is termed dangerous because, despite a process problem, the process cannot shut down.
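The two double-fault cases described above can be illustrated with a minimal sketch. It assumes the emergency-shutdown convention given in the text (ON for normal operation, OFF for shutdown) and models each output leg as a Boolean with an optional stuck-at fault. The function names `vote_2oo3`, `apply_faults`, and `classify` are hypothetical and are not taken from the patent.

```python
from typing import List, Optional


def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Majority vote: the system output is ON when at least two legs are ON."""
    return (a and b) or (a and c) or (b and c)


def apply_faults(commanded: List[bool], stuck: List[Optional[bool]]) -> List[bool]:
    """Return the actual leg outputs.

    stuck[i] is None for a healthy leg, True for a leg stuck ON,
    and False for a leg stuck OFF (illustrative fault model).
    """
    return [cmd if s is None else s for cmd, s in zip(commanded, stuck)]


def classify(demand_shutdown: bool, stuck: List[Optional[bool]]) -> str:
    """Classify the system behaviour for a given process demand and fault set."""
    # ESD convention from the text: ON = normal operation, OFF = shutdown.
    commanded = [not demand_shutdown] * 3
    output_on = vote_2oo3(*apply_faults(commanded, stuck))
    if demand_shutdown and output_on:
        return "dangerous failure"              # required shutdown is not performed
    if not demand_shutdown and not output_on:
        return "safe failure (false shutdown)"  # shutdown without a process cause
    return "correct"


if __name__ == "__main__":
    print(classify(True, [True, None, None]))     # single fault is tolerated
    print(classify(False, [False, False, None]))  # two legs stuck OFF
    print(classify(True, [True, True, None]))     # two legs stuck ON
```

In this model a single stuck leg is outvoted by the two healthy legs, while two legs stuck in the same state determine the output regardless of the third, which is the vulnerability the paragraph describes.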
To compensate for the TMR's inability to tolerate more than one controller failure, quick fault detection must be used to minimize the period of time that the system operates in a vulnerable condition. Commercial versions of TMR offer online module replacement and repair capability to address this problem. However, if one controller of the TMR fails and it has not been replaced, the next controller fault can lead to a safe or dangerous system failure. Thus, the success of online repair depends on the user's ability to discover and diagnose the problem in a short time period. Since the fault discovery and repair rates are limited by many factors, even a single controller failure may bring the system to a vulnerable mode.
As an alternative method of compensating for this vulnerability, known devices employ an output hot spare. Such a system has two triplicate I/O modules in parallel, where one module, the primary, is active, while the other module, the hot spare, is powered but inactive. Each output module usually includes three identical legs located on a single board. Under normal operation, the hot spare module outputs are OFF so they do not affect the system output. If a fault is detected on the primary module, control is automatically switched to the hot spare module, allowing the system to maintain 2-of-3 voting continuously. The faulty module can then be removed and replaced without process interruption.
The hot spare method reduces the probability of a safe failure within a TMR system. For example, when a safe failure in any leg of the primary output module is discovered, the hot spare outputs are switched to the ON state, allowing the system to keep its outputs energized. However, employing a hot spare adds to the number of components in the system, increasing the overall system cost. As a further disadvantage, the hot spare is useless when the outputs of faulty modules remain in an ON state and thus cannot prevent the occurrence of a dangerous system failure.
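The asymmetry of the hot spare approach can be sketched as follows. This is a simplified model under stated assumptions, not the circuit of any particular product: each module is reduced to a single Boolean output, the primary and the spare are assumed to be wired in parallel so that either can energize the load, and the names `module_output` and `system_output` are hypothetical.

```python
from typing import Optional


def module_output(commanded_on: bool, active: bool, stuck: Optional[bool]) -> bool:
    """Output of one module.

    A stuck fault (True = stuck ON, False = stuck OFF) overrides the command;
    a healthy but inactive module keeps its outputs OFF.
    """
    if stuck is not None:
        return stuck
    return commanded_on and active


def system_output(commanded_on: bool,
                  primary_stuck: Optional[bool],
                  fault_detected: bool) -> bool:
    """Combined output of the primary module and its hot spare."""
    # Control is switched to the hot spare only after the primary fault is detected.
    primary = module_output(commanded_on, active=not fault_detected, stuck=primary_stuck)
    spare = module_output(commanded_on, active=fault_detected, stuck=None)
    # Assumption: the modules are in parallel, so either one can energize the load.
    return primary or spare


if __name__ == "__main__":
    # Primary stuck OFF and detected: the spare keeps the output energized.
    print(system_output(True, primary_stuck=False, fault_detected=True))
    # Primary stuck ON while a shutdown is commanded: the spare cannot turn it OFF.
    print(system_output(False, primary_stuck=True, fault_detected=True))
```

Under these assumptions the spare can only add energy to the output, so it masks stuck-OFF faults but has no way to override a module that is stuck ON.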
In many safety-critical and critical-control applications, where two or more faults must be tolerated, TMR and DR systems are unfortunately not acceptable. The Hybrid Multiple Redundant Computer (HMRC) system (FIG. 1), disclosed in copending patent application Ser. No. 09/506,849 dated Feb. 19, 2000, which is incorporated by reference herein, remains operational in the presence of two concurrent faults until they are detected. The HMRC system 10 contains three parallel operating processing units 12, each of which comprises an input module 14, a central processor module 16, and an output module 50. The central processor module 16 is connected to the associated input module 14 and to primary and secondary output circuits 18, 20 located in the associated output module 50 and in the neighboring output module 50, respectively. Each processing unit 12 further includes a watchdog controller 30 that monitors the associated central processor module 16 and transfers an alarm signal 44 to each output module 50 in the event of a central processor module 16 failure. Primary and secondary output circuits 18, 20 in each output module 50 control an output voter network 22 and perform selectable but different logical functions among output data of the respective central processor modules 16 and alarm signals 44. If alarm signals 44 are not activated, the system generates an output 180 using a two-of-three vote among output data produced by the three central processor modules 16. In the event that one or two central processor modules 16 fail, the system is reconfigured to a two-of-two (2-of-2) or to a one-of-one (1-of-1) vote configuration, respectively. Each central processor module 16 in turn monitors the status of the output modules and disables outputs of the output module 50 in the event that this module 50 fails. In general, the HMRC system remains operational in the face of as many as two component faults.
The HMRC system utilizes three alarm signals for each output module. These signals reconfigure the system outputs from the 2-of-3 vote to the 2-of-2 or to the 1-of-1 vote in the presence of one or two faulty output modules, respectively. If the HMRC system includes more than one set of triplicated output modules, the system may use the same set of three alarm signals for all of the triplicated output modules. In this case, however, a fault that occurs in any one output module will lead to an undesirable reconfiguration of outputs in every set of output modules, even though the modules in the other sets are still healthy. To overcome this problem, the system should be supplied with different alarm signals for each set of triplicated output modules. The system should also have an associated means for activating only those alarm signals that are associated with the faulty output modules. However, the use of additional alarm signals requires additional hardware and additional wires, which increases the overall system cost. This disadvantage becomes especially significant if the system includes a large number of remote output modules.
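The trade-off between shared and per-set alarm signals can be illustrated with a small model. It is a sketch under assumptions: faults are given as hypothetical `(set_index, module_index)` pairs, the functions `vote_mode`, `modes_shared`, and `modes_per_set` are invented for illustration, and the "disabled" label for three active alarms is not taken from the text.

```python
from typing import List, Set, Tuple

Fault = Tuple[int, int]  # (set index, module index 0..2), illustrative encoding


def vote_mode(alarms: List[bool]) -> str:
    """Vote configuration of one triplicated set for its three alarm signals."""
    modes = {0: "2-of-3", 1: "2-of-2", 2: "1-of-1"}
    return modes.get(sum(alarms), "disabled")  # "disabled" is an assumption


def modes_shared(faults: Set[Fault], n_sets: int) -> List[str]:
    """One common set of three alarm signals drives every set of output modules."""
    # Alarm i is raised when module i is faulty in any set.
    shared = [any((s, i) in faults for s in range(n_sets)) for i in range(3)]
    return [vote_mode(shared) for _ in range(n_sets)]


def modes_per_set(faults: Set[Fault], n_sets: int) -> List[str]:
    """Each set of output modules has its own three alarm signals."""
    return [vote_mode([(s, i) in faults for i in range(3)]) for s in range(n_sets)]


if __name__ == "__main__":
    faults = {(0, 1)}  # module 1 of set 0 is faulty
    print(modes_shared(faults, 3))   # every set is reconfigured
    print(modes_per_set(faults, 3))  # only set 0 is reconfigured
```

In this model the per-set variant needs three distinct alarm signals for every set instead of three in total, which corresponds to the additional hardware and wiring cost noted above.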
Another drawback of the HMRC system is that each central processor module (CPM) is connected to two output modules and transfers the same output data to each of them consecutively. It decreases t
