Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-12-11
2002-09-24
Gaffin, Jeffrey (Department: 2182)
C714S006130, C714S025000, C370S220000, C370S387000, C709S239000
active
06457140
ABSTRACT:
BACKGROUND
The present invention generally relates to a fault tolerant processing system and to a method of operating a fault tolerant processing system as well as a method for terminating a number of processed signals into a non-redundant signal.
In many processing systems, a redundant system architecture is utilized to meet the requirements on safety and reliability and to increase the mean time between system failure (MTBSF). Redundancy in a processing system is ensured by using multiple processing units that operate in parallel. With this arrangement, a faulty processing unit can easily be switched out of operation while the remaining and still-functioning processing units will maintain proper operation of the overall processing system. In the following, the processing units are normally referred to as processing planes.
A redundant system generally has a termination point at which the redundancy is terminated. At the termination point, plane termination logic determines which one of the processing planes should be used, and the output signal of that plane is used as the non-redundant output signal of the processing system.
In the specific field of telecommunications, switches and switching systems are normally made redundant, using multiple switching planes, to maintain a desired quality of service for the users of the switching network. In known switching systems, the redundancy is terminated by using plane selection bits provided in the transmitted time slots.
FIG. 1 schematically illustrates an example of a conventional redundant switching system. The switching system 10 comprises a control system 1 and a switching arrangement 2. The switching arrangement 2 comprises a distribution unit 3, a number of identical and parallel switching planes 4, 5, 6, and plane termination logic 7. In the illustrated example, there are three switching planes. The distribution unit 3 receives an input signal and distributes it to each one of the switching planes 4, 5, 6. The output signals of the switching planes 4, 5, 6 are sent to the termination logic 7. In conventional switching systems, each transmitted time slot in each plane is provided with a plane selection bit, so that each time slot includes a byte of information and a plane selection bit. The plane selection bits from the switching planes are used by a plane selection algorithm 8 incorporated in the termination logic 7 to determine, for each time slot, which one of the switching planes to use. When all switching planes function properly, it does not matter which plane is selected, and the selection algorithm 8 simply selects a predetermined one of the switching planes. However, if the overall control system 1 determines that two of the planes are faulty, the control system 1 sets the corresponding plane selection bits to “invalid”, and the selection algorithm 8 selects the remaining still-functioning plane.
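The per-time-slot selection described for FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual plane selection algorithm 8: the names `select_plane` and `terminate_slot`, the tuple encoding of a time slot, and the policy of falling back to the first valid plane are all hypothetical.

```python
from typing import List, Optional, Tuple

# One time slot as seen by the termination logic: (data byte, plane selection bit).
# Hypothetical encoding: selection bit True = "valid", False = "invalid".
Slot = Tuple[int, bool]


def select_plane(slots: List[Slot], preferred: int = 0) -> Optional[int]:
    """Return the index of the plane to use for one time slot.

    Uses a predetermined plane while its selection bit is valid; otherwise
    falls back to the first plane whose selection bit is still valid.
    Returns None if every plane has been marked invalid.
    """
    if slots[preferred][1]:
        return preferred
    for index, (_, valid) in enumerate(slots):
        if valid:
            return index
    return None


def terminate_slot(slots: List[Slot], preferred: int = 0) -> Optional[int]:
    """Produce the non-redundant byte for one time slot, or None if no plane is usable."""
    plane = select_plane(slots, preferred)
    return None if plane is None else slots[plane][0]
```

For example, with planes 0 and 1 marked invalid by the control software, `terminate_slot([(0x12, False), (0x34, False), (0x56, True)])` returns `0x56` from the one remaining plane. Note that every slot carries the extra selection bit whether or not a fault is present.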
In conventional control systems, a software analysis of disturbances or faults in the switching planes is performed in order to determine the status (OK/faulty) of the planes. In a switching network, there are many examples of disturbances, such as parity errors, sporadic bit-errors and line code errors. Some of these disturbances are unavoidable, and there is generally no reason to intervene for a single disturbance. However, it is necessary to monitor the disturbance rate. If the rate of, for example, bit-errors in a switching plane rises to an unacceptable level, then the software has to react and set the plane selection bits of that plane to “invalid”, thus isolating the faulty plane.
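The rate monitoring described above might look roughly like the following sketch. The class `DisturbanceMonitor`, the sliding-window policy, and its parameters are hypothetical; the source does not specify how the disturbance rate is measured or what threshold is used.

```python
from collections import deque


class DisturbanceMonitor:
    """Hypothetical software monitor for one switching plane.

    Records disturbance events (e.g. parity or bit errors) with timestamps and
    marks the plane invalid once the number of events inside a sliding time
    window exceeds a threshold. Window length and threshold are assumptions.
    """

    def __init__(self, window_seconds: float, max_events: int) -> None:
        self.window_seconds = window_seconds
        self.max_events = max_events
        self._events: deque = deque()
        self.plane_valid = True  # latches to False once the plane is isolated

    def record(self, timestamp: float) -> bool:
        """Record one disturbance at `timestamp`; return the plane's validity afterwards."""
        self._events.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        if len(self._events) > self.max_events:
            # Rate is unacceptable: software would now set this plane's
            # selection bits to "invalid", isolating the whole plane.
            self.plane_valid = False
        return self.plane_valid
```

With `max_events=3` and a one-second window, a single sporadic error leaves the plane valid; only the fourth error inside one window isolates it, by which time the three earlier errors have already passed through the plane.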
With this prior art arrangement, the software determines that a plane is faulty long after the disturbances have actually occurred. Consequently, the disturbances cannot be corrected for.
In addition, the disturbances tend to propagate through the switching network and generate additional disturbances such that the control system software is flooded by different types of alarms.
SUMMARY OF THE INVENTION
The present invention overcomes these and other drawbacks of the prior art arrangements.
It is a general object of the present invention to provide a fault tolerant processing system that is improved with respect to isolation of faults occurring in the system.
It is another object of the present invention to provide a processing plane, for use with at least one like processing plane in a fault tolerant system, which in the event of a fault in the plane generates an output signal that facilitates recovery of valid processed data from the other processing planes.
It is yet another object of the invention to provide a method of operating a fault tolerant processing system.
Still another object of the invention is to provide a method for terminating at least two processed signals into a non-redundant signal.
The invention is especially applicable to a fault tolerant system having at least two processing planes, where each plane is operable for processing an input signal to generate an output signal, and plane termination logic for receiving the output signals of the processing planes to generate a non-redundant output signal.
In accordance with a first aspect of the invention, the processing planes operate continuously in parallel with each other, and, in one embodiment, the output signals of the processing planes are OR'ed together in the plane termination logic to generate the non-redundant output signal of the system. According to the same embodiment, each processing plane comprises means for detecting a fault or disturbance in the plane, and means for substituting, in response to detection of a fault in the plane, a signal component representing a logical zero for each one of those components of the processed input signal that are affected by or otherwise associated with the detected fault. Since signal components affected by a fault are “set” to zero, valid bits from the still-functioning plane or planes will be presented as output bits in the non-redundant output signal due to the OR-operation in the plane termination logic.
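A minimal sketch of the OR-based embodiment for 8-bit signal components follows. It assumes, hypothetically, that each plane detects faults with an even-parity check; the source names parity errors as one kind of disturbance but does not prescribe this detector, and all function names are illustrative.

```python
from typing import List, Tuple


def parity_ok(data: int, parity_bit: int) -> bool:
    """Hypothetical fault detector: even parity over an 8-bit component."""
    return bin(data & 0xFF).count("1") % 2 == parity_bit


def plane_output(data: int, parity_bit: int) -> int:
    """One plane's output for one 8-bit signal component.

    On a detected fault (here: a parity error) the plane substitutes
    0x00 for the component, i.e. every bit is forced to logical zero.
    """
    return data & 0xFF if parity_ok(data, parity_bit) else 0x00


def or_terminate(outputs: List[int]) -> int:
    """Plane termination logic: bitwise OR of all plane outputs."""
    result = 0x00
    for value in outputs:
        result |= value
    return result


def terminate(planes: List[Tuple[int, int]]) -> int:
    """Combine (data, parity_bit) pairs from all planes into one non-redundant byte."""
    return or_terminate([plane_output(d, p) for d, p in planes])
```

In `terminate([(0xA5, 0), (0xA4, 0), (0xA5, 0)])`, plane 1 receives `0xA4` with the parity bit computed for `0xA5`, detects the error and outputs `0x00`; the OR still yields `0xA5` from the other planes. The scheme relies on detection: an undetected error that sets extra bits to one in some plane would still reach the output through the OR.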
According to another embodiment, the “resetting” of affected signal components to logical zero and logically OR'ing the output signals of the planes are replaced by “setting” the affected signal components to logical one combined with logically AND'ing the output signals of the planes.
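The AND-based embodiment is the mirror image. In this sketch (function names hypothetical), a detected fault forces all eight bits of the affected component to logical one, and the termination logic ANDs the planes together.

```python
from functools import reduce
import operator
from typing import List


def plane_output_and(data: int, fault_detected: bool) -> int:
    """One plane's output in the AND variant: a detected fault forces 0xFF."""
    return 0xFF if fault_detected else data & 0xFF


def and_terminate(outputs: List[int]) -> int:
    """Plane termination logic for the AND variant: bitwise AND of all outputs."""
    return reduce(operator.and_, outputs, 0xFF)
```

For example, `and_terminate([plane_output_and(0x3C, False), plane_output_and(0x00, True), plane_output_and(0x3C, False)])` yields `0x3C`, because the `0xFF` from the faulty plane leaves every bit of the valid value unchanged.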
It will be appreciated that in a more general form of the invention, each one of the signal components that are affected by a detected fault is substituted by a signal component, referred to as a control component, of a predetermined logical state. In this context, it should be understood that logical OR'ing and logical AND'ing are merely examples of the more general function of performing logical operations on the output signals of the planes such that, in the generation of the non-redundant output signal, unaffected signal components in one processed signal override the corresponding control components in another processed signal. Since unaffected signal components override affected ones, the valid signal components are presented in the non-redundant output signal.
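One way to make the override condition precise, offered as an interpretation rather than as the patent's own definition: the control component should be an identity element of the combining operation (0 for OR, all ones for AND), and the operation should be idempotent, so that several healthy planes carrying the same value do not disturb one another. The sketch below checks both properties over all 8-bit values; `suitable` and `terminate` are hypothetical names.

```python
from functools import reduce
import operator
from typing import Callable, List, Optional

Op = Callable[[int, int], int]
VALUES = range(256)  # every possible 8-bit signal component


def suitable(op: Op, control: int) -> bool:
    """True if valid components always override the control component.

    Assumed conditions: `control` is an identity of `op` (op(control, x) == x
    in both argument orders) and `op` is idempotent (op(x, x) == x).
    """
    identity = all(op(control, x) == x and op(x, control) == x for x in VALUES)
    idempotent = all(op(x, x) == x for x in VALUES)
    return identity and idempotent


def terminate(outputs: List[Optional[int]], op: Op, control: int) -> int:
    """Replace fault-affected components (None) by `control`, then combine with `op`."""
    substituted = [control if v is None else v for v in outputs]
    return reduce(op, substituted, control)
```

Both `suitable(operator.or_, 0x00)` and `suitable(operator.and_, 0xFF)` hold, whereas `suitable(operator.xor, 0x00)` fails: 0x00 is an identity for XOR, but two healthy planes carrying the same value would cancel (`0xA5 ^ 0xA5 == 0`).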
The processing performed by the processing planes is preferably switching, or switching in combination with some other processing, such as multiplexing and demultiplexing, associated with switching.
The invention runs counter to the predominant trend in the prior art in that it does not propose isolation of faulty processing planes, but instead proposes dynamic and local isolation of faults directly in the planes.
In addition, the redundancy termination according to the invention does not use plane selection bits, and hence the bandwidth demand is reduced.
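A rough estimate of the saving, assuming, as in the description of FIG. 1, that each time slot carries exactly one 8-bit byte plus one plane selection bit, and ignoring any other framing overhead:

```latex
\text{bits per slot (prior art)} = 8 + 1 = 9, \qquad
\text{bits per slot (invention)} = 8,
\qquad
\text{saving} = \frac{9 - 8}{9} = \frac{1}{9} \approx 11.1\,\%.
```

Equivalently, the plane selection bit adds 1/8 = 12.5 % on top of the 8-bit payload of each time slot.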
In accordance with a second aspect of the invention, a processing plane for use with at least one like processing plane in a fault tolerant system is provided. The processing plane is operable for processing an input sig
Hansson Ulf Peter
Lindberg Lars Olof Mikael
Pettersson Lars Johan
Gaffin Jeffrey
Mai RiJue