Multiprocessor with pair-wise high reliability mode, and...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


C714S010000, C714S015000, C714S797000, C712S020000


active

06772368

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to high reliability processing achieved through hardware redundancy. More particularly, the invention relates to a processing system with pair-wise processors that operate together in a high reliability mode to detect computational errors, and operate independently in a high performance mode.
2. Description of the Related Art
Various approaches exist for achieving high reliability processing.
FIG. 1 illustrates one prior art processor 100 for high reliability processing. The processor 100 includes two execution units 130 and 135, which are both the same type of arithmetic unit. For example, the two execution units could both be floating point units, or both be integer units. The processor 100 has architected registers 120 for holding committed execution results. The two execution units 130 and 135 both execute the same instruction stream in parallel. That is, for each instruction an instance of the instruction executes in each respective execution unit 130 and 135. Then, when the two units are ready to commit the result for an instruction to the register file 120, the two versions of the result are compared by compare unit 125. If the compare unit 125 determines that the versions are the same, then the unit 125 updates one or more of the registers 120 with the result. If the versions do not match, then other actions are taken. In one implementation, a counter records whether an error is occurring repeatedly, and if it is, the error is classified as a “hard” failure. In the case of a hard failure, the instruction issue mechanism does not reissue the faulting instruction, but instead executes a “trap” instruction. One such trap leads to a microcode routine that reads out the state of the defective processor and loads it into a spare processor, which restarts execution at the instruction that originally faulted. Alternatively, where no spare processor is available, the trap leads to the operating system migrating the processes on the faulty processor to other processors, which adds to the workload of the other processors.
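The compare-and-commit behavior of the FIG. 1 processor can be sketched as a small software model. This is an illustration, not the hardware: the `CompareUnit` class, its `check` method, and the `HARD_FAILURE_THRESHOLD` value of 3 are hypothetical, since the text does not say how many repeated mismatches make a “hard” failure. Resetting the counter after a clean commit is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical: number of mismatches on the same instruction that marks a
# "hard" failure. The text does not give a value.
HARD_FAILURE_THRESHOLD = 3


@dataclass
class CompareUnit:
    """Software model of compare unit 125 guarding the register file 120."""
    registers: Dict[str, int] = field(default_factory=dict)
    mismatch_count: int = 0

    def check(self, dest: str, version_a: int, version_b: int) -> str:
        """Compare the versions from execution units 130 and 135.

        Returns "commit", "reissue", or "trap".
        """
        if version_a == version_b:
            # Versions agree: update the architected register.
            self.registers[dest] = version_a
            self.mismatch_count = 0  # assumption: a clean commit resets the counter
            return "commit"
        self.mismatch_count += 1
        if self.mismatch_count >= HARD_FAILURE_THRESHOLD:
            # Repeated error, classified as hard: trap instead of reissuing.
            return "trap"
        # Possibly transient: reissue the faulting instruction.
        return "reissue"
```

Under these assumptions, two mismatches on the same instruction lead to reissues, and the third leads to the trap that hands control to the spare-processor or operating-system recovery path.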
While this arrangement provides a reliability advantage, it is disadvantageous in that the processor design is more complex than a conventional processor and has greater overhead. Moreover, having both execution units 130 and 135 in the processor 100 execute the same instruction stream limits the throughput of the processor 100. Another variation of a processor designed exclusively for high reliability operation is shown in Richard N. Gustafson, John S. Liptay, and Charles F. Webb, “Data Processor with Enhanced Error Recovery,” U.S. Pat. No. 5,504,859, issued Apr. 2, 1996.
FIG. 2 illustrates another arrangement for high reliability processing. In this voting arrangement, three processors 200 each execute the same program in parallel, and versions of a result are compared at checkpoints in the program on a bus 160 external to the processors 200. If the versions do not match, then other actions are taken, such as substituting a different processor 200 for the one that produced the disparate version. This arrangement is advantageous in that the complexity of the individual processors 200 is reduced, and an error-producing processor can be identified. Also, the throughput of one of the processors 200 may be greater than that of the one processor 100 in FIG. 1, since the individual processor 200 does not devote any of its execution units to redundant processing. However, the arrangement of FIG. 2 is redundant at the level of the processors 200, and uses three whole processors 200 to recover from a single fault. Also, the error checking is limited to results that the processors assert externally.
From the foregoing, it may be seen that a need exists for improvements in high reliability processing.
SUMMARY OF THE INVENTION
The foregoing need is addressed in the present invention. According to the invention, in a first embodiment, a multiprocessing system includes a first processor and a second processor. Each of the processors has its own data and instruction caches to support independent operation. In a first mode, a “high performance” mode, the processors independently execute separate instruction streams. In a second mode, a “high reliability” mode, both processors execute the same instruction stream. That is, for each instruction in the stream, each processor computes its own version of a result.
The system includes a compare unit for indicating whether the respective versions match. If the versions do not match for an instruction, the instruction is deemed to be a faulting instruction. Responsive to the system being in the high reliability mode and the compare unit indicating a faulting instruction, the processors recover a state that the processors had prior to execution of the faulting instruction, and the processors re-execute the faulting instruction.
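The recovery behavior in the high reliability mode can be modelled in software as two register files stepped in lockstep. This sketch is illustrative only: instructions are Python callables, the checkpoint is a copy of each register file taken before every instruction, and `run_high_reliability`, `inject_fault`, and `max_retries` are hypothetical names added to exercise the retry path. The text does not specify what happens if re-execution keeps faulting; raising an error here is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

Registers = Dict[str, int]
# Hypothetical instruction model: (destination register, function of the registers).
Instruction = Tuple[str, Callable[[Registers], int]]


def run_high_reliability(program: List[Instruction],
                         regs_a: Registers,
                         regs_b: Registers,
                         inject_fault: Optional[Callable[[int, int], bool]] = None,
                         max_retries: int = 3) -> int:
    """Run the same instruction stream on processors A and B in lockstep.

    inject_fault(pc, attempt) is a test hook: when it returns True, one bit of
    processor B's version is flipped. Returns the number of re-executions.
    """
    reexecutions = 0
    for pc, (dest, op) in enumerate(program):
        for attempt in range(max_retries + 1):
            # Checkpoint the state each processor had before this instruction.
            saved_a, saved_b = dict(regs_a), dict(regs_b)
            # Each processor computes its own version and writes it speculatively.
            regs_a[dest] = op(regs_a)
            regs_b[dest] = op(regs_b)
            if inject_fault is not None and inject_fault(pc, attempt):
                regs_b[dest] ^= 1  # simulated transient error in processor B
            if regs_a[dest] == regs_b[dest]:
                break  # compare unit: versions match, so the result stands
            # Faulting instruction: recover the prior state, then re-execute.
            regs_a.clear()
            regs_a.update(saved_a)
            regs_b.clear()
            regs_b.update(saved_b)
            reexecutions += 1
        else:
            raise RuntimeError(f"instruction {pc} still faulting after {max_retries} retries")
    return reexecutions
```

For example, running `[("r1", lambda r: r["r0"] + 1), ("r2", lambda r: r["r1"] * 2)]` from `{"r0": 10}` with a single injected fault on the second instruction needs one re-execution, after which both register files agree on `r2 == 22`.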
In an embodiment, each of the processors has a respective signature generator. Each of the signature generators is coupled to the compare unit. Responsive to the respective versions, the signature generators assert signatures to the compare unit, so that a faulting instruction may be detected.
In another aspect, each processor has its own respective commit logic. If the compare unit receives matching signatures for corresponding versions of a result, the compare unit signals the commit logic in each respective processor that the possibility of a calculation interrupt arising for that instruction has been eliminated. This permits the commit logic to commit the result. If the signatures do not match, the compare unit signals the commit logic that the corresponding instruction has faulted. In response, the commit logic permits instructions prior to the faulting instruction in program order to continue execution, but flushes the instructions, and their results, that follow the faulting instruction in program order. Alternatively, the commit logic flushes the results produced by the faulting instruction and only selected instruction results subsequent to it in program order, that is, those instructions and results that depend on the faulting instruction.
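The two flush policies can be contrasted with a small model of the instructions in flight. The `InFlight` record and both functions are hypothetical; dependence is tracked through destination and source register names, and a later instruction that overwrites a register with an independent value is assumed to end the dependence for subsequent readers of that register.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class InFlight:
    seq: int               # position in program order
    dest: str              # destination register
    srcs: Tuple[str, ...]  # source registers


def flush_younger(window: List[InFlight], faulting_seq: int) -> List[InFlight]:
    """Policy 1: flush the faulting instruction and everything after it."""
    return [i for i in window if i.seq >= faulting_seq]


def flush_dependents(window: List[InFlight], faulting_seq: int) -> List[InFlight]:
    """Policy 2: flush the faulting instruction and only the later
    instructions that depend on it, directly or transitively."""
    flushed: List[InFlight] = []
    tainted: Set[str] = set()  # registers currently holding a suspect value
    for instr in sorted(window, key=lambda i: i.seq):
        if instr.seq < faulting_seq:
            continue  # older instructions continue execution
        if instr.seq == faulting_seq or tainted.intersection(instr.srcs):
            flushed.append(instr)
            tainted.add(instr.dest)
        else:
            tainted.discard(instr.dest)  # clean overwrite ends the dependence
    return flushed
```

In a window where instruction 1 faults, instruction 2 reads its result, and instructions 3 to 5 only read independent values, `flush_younger` discards instructions 1 to 5 while `flush_dependents` discards only 1 and 2. The second policy saves work at the cost of tracking dependences in the commit logic.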
In still another aspect, in one embodiment such a signature includes a bit indicating parity for the signature's corresponding version of the result. For one such embodiment, the signature consists of a single parity bit. In an alternative, the signature includes a number of parity bits for respective subsets of its version. In another embodiment, the signature includes a sum for all the bits of its version of the result. In another embodiment, the signature includes the entire version itself.
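The signature options listed above trade hardware cost against error coverage. The sketch below models each option as a Python function over a non-negative integer result; the 64-bit width, the choice of 8-bit subsets for the per-subset parity, and all function names are assumptions, and “a sum for all the bits” is read here as the count of set bits.

```python
from typing import Callable, Tuple

WIDTH = 64  # assumed result width in bits (not specified in the text)


def parity(value: int) -> int:
    """Single parity bit: 1 if the number of set bits is odd."""
    return bin(value).count("1") & 1


def byte_parities(value: int, width: int = WIDTH) -> Tuple[int, ...]:
    """One parity bit per 8-bit subset of the result."""
    return tuple(parity((value >> shift) & 0xFF) for shift in range(0, width, 8))


def bit_sum(value: int) -> int:
    """Sum of all the bits of the result (its population count)."""
    return bin(value).count("1")


def full_version(value: int) -> int:
    """Signature that is simply the entire result."""
    return value


def signatures_match(sig_fn: Callable[[int], object], version_a: int, version_b: int) -> bool:
    """What the compare unit checks under a given signature scheme."""
    return sig_fn(version_a) == sig_fn(version_b)
```

Flipping two bits within one byte (`0b00` versus `0b11`) leaves both parity signatures unchanged but changes the bit sum, while moving a set bit (`0b01` versus `0b10`) also preserves the bit sum and is caught only when the signature is the entire version.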
In another aspect, the system includes logic for generating an error correction code that is included, as part of the processor state, with an instruction result. For such an instruction result, the signature generators produce their respective signatures in response to their respective result versions, including the error correction codes for the versions.
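Including the error correction code in the signature means that a fault in one processor's ECC logic is caught even when the data versions themselves agree. The sketch assumes a single-error-correcting Hamming code over a 64-bit result (the text does not name the code) and uses a single parity bit over the result concatenated with its check bits as the signature; `hamming_check_bits` and `signature` are hypothetical names.

```python
DATA_BITS = 64  # assumed result width; 64 data bits need 7 Hamming check bits


def hamming_check_bits(value: int, data_bits: int = DATA_BITS) -> int:
    """Single-error-correcting Hamming check bits for `value`.

    Data bits occupy the codeword positions that are not powers of two;
    check bit k is the XOR of the data bits whose position has bit k set.
    """
    check = 0
    pos = 1
    for i in range(data_bits):
        pos += 1
        while (pos & (pos - 1)) == 0:  # skip power-of-two (check bit) positions
            pos += 1
        if (value >> i) & 1:
            check ^= pos
    return check


def signature(value: int, ecc: int) -> int:
    """Single parity bit over the result concatenated with its check bits."""
    word = (value << 8) | ecc  # the 7 check bits fit in the low byte
    return bin(word).count("1") & 1
```

If processor B's ECC logic flips one check bit while its data version is correct, the two signatures differ, so the instruction is treated as faulting rather than committing a state element with a bad ECC.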
In a still further aspect, in the high performance mode, in which the processors execute separate programs or instruction streams, each processor has independent bus accesses through its own respective bus logic. In this case, mode control logic notifies arbitration logic in the bus interface unit to arbitrate between the independent bus requests of the two bus logic units.
In the high reliability mode, in which the two processors both execute the same program or instruction stream in parallel, each processor needs identical, lockstep bus accesses. In this case, mode control logic notifies arbitration logic in the bus interface unit to allow only one of the bus logic units to control bus requests and read the bus for both processors in the system.
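The mode-dependent arbitration can be sketched as a bus interface unit that consults the current mode before granting a bus cycle. Round-robin priority in the high performance mode, treating bus logic unit 0 as the one that controls the bus in the high reliability mode, and the `Mode`, `BusInterfaceUnit`, and `grant` names are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

Grant = Tuple[int, str, Tuple[int, ...]]  # (granted unit, request, processors that read the data)


class Mode(Enum):
    HIGH_PERFORMANCE = "high performance"
    HIGH_RELIABILITY = "high reliability"


class BusInterfaceUnit:
    """Arbitration between two bus logic units, numbered 0 and 1."""

    def __init__(self, mode: Mode, master: int = 0):
        self.mode = mode
        self.master = master  # unit that drives the bus in lockstep mode
        self._last = 1        # round-robin pointer, so unit 0 wins first

    def grant(self, requests: Dict[int, Optional[str]]) -> Optional[Grant]:
        """Grant one bus cycle. `requests` maps each unit to a pending request or None."""
        if self.mode is Mode.HIGH_RELIABILITY:
            req = requests.get(self.master)
            if req is None:
                return None
            # Lockstep: only the master's request goes on the bus, and the
            # data it reads is delivered to both processors.
            return (self.master, req, (0, 1))
        # High performance: independent requests compete; alternate priority.
        for unit in ((self._last + 1) % 2, self._last):
            req = requests.get(unit)
            if req is not None:
                self._last = unit
                return (unit, req, (unit,))
        return None
```

In the high performance mode, two simultaneous requests are served alternately; in the high reliability mode, a request from the non-master unit is never placed on the bus, since its identical lockstep request is satisfied by the master's access.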
In a further aspect, since the processors are subject to external interrupts, which can disturb synchrony unless coordinated properly, the bus interface unit for the system has
