Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-06-30
2001-12-04
Iqbal, Nadeem (Department: 2785)
C712S009000
active
06327668
BACKGROUND OF THE INVENTION
This invention relates to providing determinism in a multiprocessor computer system, to a monitor and processor for such a system and to a method of operating such systems. A particular application of the invention is to fault tolerant processing systems.
Many processing systems operate to a strict timing regime, changing their internal state on a known clock. Such a synchronous processing system behaves as a large finite state machine whose internal state and outputs are entirely predictable, provided that inputs are presented in a known relationship to the clock. This determinism enables the construction of a fault tolerant multi-computer system by providing checking hardware that compares the operation of one processor or set of processors against that of another identical processor or set of processors. The checking hardware can be arranged to detect faults in one or more of the processing sets by comparing the outputs of those processing sets on each clock cycle.
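The per-clock comparison performed by such checking hardware can be pictured with a short Python sketch. It is an illustrative model only, not taken from the patent: each processing set is reduced to a hypothetical list of output values, one per clock cycle, all of equal length, and the hypothetical function `first_divergence` reports the first clock at which the sets disagree.

```python
from typing import Optional, Sequence


def first_divergence(traces: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the first clock cycle at which the processing sets' outputs differ.

    traces[i][t] is a hypothetical output value of processing set i on clock t.
    All traces are assumed to have the same length (zip stops at the shortest).
    Returns None if every set produced identical outputs on every clock.
    """
    for clock, outputs in enumerate(zip(*traces)):
        # Lockstep check: every set must match the first set on this clock.
        if any(value != outputs[0] for value in outputs[1:]):
            return clock
    return None


if __name__ == "__main__":
    print(first_divergence([[7, 3, 9], [7, 3, 9]]))  # None
    print(first_divergence([[7, 3, 9], [7, 4, 9]]))  # 1
```

This check is only meaningful because the synchronous sets are expected to produce identical values on the same clock.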
Other processing systems do not behave in such a simple manner. Examples include processing systems where the clock is not known, where multiple unrelated clocks are used, or where the processor uses no clock at all. Such systems cannot be modelled as synchronous finite state machines, and it may not be possible to present inputs to them in any known relationship to their internal state. Their detailed operation is therefore non-deterministic, which rules out the straightforward construction of checking hardware that compares the operation of identical systems.
An aim of the present invention is to enable the provision of a deterministic multiprocessor system where at least one processor, or set of processors, operates asynchronously of another processor or set of processors.
SUMMARY OF THE INVENTION
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Features of the dependent claims may be combined with features of the independent claims as appropriate, and not merely as explicitly set out in the claims.
In accordance with one aspect of the invention, there is provided a monitor for a multiprocessor system that includes a plurality of processing sets, at least one of which is operable asynchronously of another. The monitor is connectable to receive I/O operations output from the processing sets, and is operable to synchronise the operation of the processing sets by signalling them on receipt of progress indications showing that a plurality of the processing sets are at an equivalent stage of processing.
In an embodiment of the invention, therefore, the monitor not only monitors I/O operations but also responds to outputs from the processing sets indicating that they are at an equivalent stage of processing, and uses these outputs to synchronise the operation of the processing sets. In this manner, a plurality of asynchronous processors can be kept in step deterministically, at least at selected points during processing. This facilitates the cross-checking of I/O operations for fault tolerant operation and also the timely delivery of interrupts.
The monitor can be operable, when an equivalent progress indication has been received from each of at least a plurality of processing sets, to return an acknowledgement signal to the processing sets from which a progress indication has been received. In certain cases, the acknowledgement signal may only be returned to the processing sets when a progress indication has been received from all processing sets.
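The acknowledgement rule can be sketched as a small Python model of the monitor. This is a hypothetical illustration, not the patent's implementation: processing sets are assumed to be identified by integers, each progress indication is assumed to carry a sequence number (the k-th progress increment), and `quorum` stands for the number of equivalent indications required before acknowledging, by default all processing sets. The text does not say how to treat a set whose indication arrives after a smaller quorum has already been acknowledged; the sketch simply acknowledges it at once.

```python
from typing import Dict, Iterable, Optional, Set


class ProgressMonitor:
    """Hypothetical monitor that acknowledges equivalent progress indications."""

    def __init__(self, set_ids: Iterable[int], quorum: Optional[int] = None) -> None:
        self.set_ids = frozenset(set_ids)
        # Default: acknowledge only once every processing set has reported.
        self.quorum = len(self.set_ids) if quorum is None else quorum
        if not 2 <= self.quorum <= len(self.set_ids):
            raise ValueError("quorum must be between 2 and the number of sets")
        self.reported: Dict[int, Set[int]] = {}  # seq -> sets that reported it
        self.acknowledged: Set[int] = set()      # seq numbers already acknowledged

    def indicate(self, set_id: int, seq: int) -> Set[int]:
        """Record a progress indication; return the sets to acknowledge now."""
        if set_id not in self.set_ids:
            raise ValueError(f"unknown processing set {set_id}")
        if seq in self.acknowledged:
            # Late arrival after the quorum was met (assumed policy): ack at once.
            return {set_id}
        reported = self.reported.setdefault(seq, set())
        reported.add(set_id)
        if len(reported) >= self.quorum:
            self.acknowledged.add(seq)
            return self.reported.pop(seq)
        return set()
```

With the default quorum the model behaves as a barrier: no processing set is acknowledged for increment k until all of them have reported it.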
The monitor is preferably operable to pass an interrupt from an I/O device to the processing sets with an acknowledgement signal for an equivalent progress indication. In this manner, the interrupts can be passed to the processing sets in a deterministic manner at an equivalent stage of processing.
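Extending the same idea, a pending I/O interrupt can be attached to the acknowledgement, so that every processing set receives it at the same progress increment. In the hypothetical sketch below, interrupts are queued as strings and at most one is attached to each acknowledgement; the queueing policy and the `Ack` record are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Set


@dataclass(frozen=True)
class Ack:
    seq: int                  # progress increment being acknowledged
    interrupt: Optional[str]  # interrupt delivered with this acknowledgement


class InterruptingMonitor:
    """Hypothetical monitor that delivers interrupts only with acknowledgements."""

    def __init__(self, n_sets: int) -> None:
        self.n_sets = n_sets
        self.reported: Dict[int, Set[int]] = {}
        self.pending: Deque[str] = deque()

    def raise_interrupt(self, irq: str) -> None:
        # An I/O device interrupt is held back rather than forwarded immediately.
        self.pending.append(irq)

    def indicate(self, set_id: int, seq: int) -> Dict[int, Ack]:
        """Record a progress indication; return per-set acks once all have reported."""
        reported = self.reported.setdefault(seq, set())
        reported.add(set_id)
        if len(reported) < self.n_sets:
            return {}
        del self.reported[seq]
        irq = self.pending.popleft() if self.pending else None
        # Every set receives the same interrupt with the same acknowledgement,
        # so all of them take it at an equivalent stage of processing.
        return {s: Ack(seq, irq) for s in sorted(reported)}
```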
The monitor can determine faulty operation of the processing sets on detecting non-equivalent operation thereof.
The monitor may be operable with only two processing sets, or with three or more processing sets. Where the monitor is used with three or more processing sets, a faulty processing set can be determined by majority voting. Where the monitor is used with only two processing sets, or where further processing sets have failed leaving only two processing sets, a faulty processing set may be determined by initiating processing set diagnostics on the processing sets.
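Fault identification by majority voting can be illustrated with a few lines of Python. As a sketch under stated assumptions, each processing set's output for a given operation is reduced to a single comparable value; when no strict majority exists, as with two sets that disagree, the hypothetical function `faulty_sets` raises an exception to signal that processing set diagnostics are needed instead.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def faulty_sets(outputs: Sequence[Hashable]) -> List[int]:
    """Return the indices of processing sets that disagree with the majority.

    outputs[i] is a hypothetical summary of processing set i's output.
    Raises ValueError when no strict majority exists, which is the case the
    text resolves by running processing set diagnostics.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two processing sets to compare")
    majority, votes = Counter(outputs).most_common(1)[0]
    if 2 * votes <= len(outputs):
        raise ValueError("no majority: run processing set diagnostics")
    return [i for i, value in enumerate(outputs) if value != majority]
```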
In a preferred embodiment of the invention, the monitor is connectable to receive I/O operations output from the processing sets, and is operable to buffer the I/O operations, to compare an I/O operation output from a processing set to I/O operations buffered for another processing set for determining equivalent functioning of the processing sets, and to issue a state modifying I/O operation only on determining equivalent operation of the processing sets.
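The buffer-and-compare behaviour can be sketched as follows. This is a hypothetical model, not the patent's hardware: each I/O operation is represented as a tuple, one FIFO buffer is kept per processing set, and `issue` is a caller-supplied callback standing in for the path to the I/O device. An operation is issued only when every set has produced it; a mismatch raises an exception, which a real monitor would instead treat as a detected fault.

```python
from collections import deque
from typing import Callable, Deque, Hashable, List


class IOComparator:
    """Hypothetical monitor stage that buffers, compares and then issues I/O."""

    def __init__(self, n_sets: int, issue: Callable[[Hashable], None]) -> None:
        self.buffers: List[Deque[Hashable]] = [deque() for _ in range(n_sets)]
        self.issue = issue

    def submit(self, set_id: int, op: Hashable) -> None:
        """Buffer an I/O operation output by one processing set."""
        self.buffers[set_id].append(op)
        # Compare once every processing set has at least one buffered operation.
        while all(self.buffers):
            heads = [buf[0] for buf in self.buffers]
            if any(head != heads[0] for head in heads[1:]):
                raise RuntimeError(f"non-equivalent I/O operations: {heads}")
            for buf in self.buffers:
                buf.popleft()
            # Equivalent operation confirmed: issue the state modifying operation.
            self.issue(heads[0])
```

Because a faster processing set's operations simply wait in its buffer, the sets need not produce their I/O at the same instant, only in the same order.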
In accordance with another aspect of the invention, there is provided a multiprocessor computer system. The system includes a plurality of processing sets, wherein at least one processing set is operable asynchronously of another processing set. The system also includes a monitor as described above.
In a preferred embodiment of the invention, the synchronising and fault monitoring operations are performed by a common I/O monitor unit.
Each of the processing sets can be configured, for example by the provision of appropriate control code and/or appropriate hardware, to record its progress in processing instructions and to issue a progress indication to the monitor as an I/O operation each time a predetermined progress increment has been recorded. Issuing the progress indication as an I/O operation facilitates the use of a single monitor unit for both synchronisation and fault monitoring purposes. However, the progress indication could instead be output as, for example, a signal on a dedicated or shared signal line.
Each processing set can include an instruction counter, with a progress indication being issued for each progress increment of n counts. In a preferred embodiment the counter is implemented as a decrementer, and a progress indication is issued when the decrementer underflows.
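A decrementer that raises a progress indication every n counts can be modelled in a few lines. The class name and the choice of reloading the decrementer to n - 1 are assumptions made for this sketch; with that reload value, underflow (going below zero) happens exactly once every n counts.

```python
class ProgressDecrementer:
    """Hypothetical model of a per-processing-set instruction decrementer."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("progress increment n must be at least 1")
        self.n = n
        self.value = n - 1  # underflows after exactly n counts

    def retire(self, count: int = 1) -> int:
        """Decrement by `count`; return the number of progress indications raised."""
        self.value -= count
        indications = 0
        while self.value < 0:  # underflow
            self.value += self.n  # reload for the next increment
            indications += 1
        return indications
```

For n = 4 and one count per retired instruction, every fourth call to `retire` returns 1 and the others return 0.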
So that the period between progress indications is relatively constant, it is advantageous to associate each instruction with a count value, the counter being modified by that count value when the instruction retires. The count value can depend on one or more of the instruction type, an operand and an address.
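The effect of per-instruction count values can be illustrated with a hypothetical weighting table. The instruction types, weights and uncached address boundary below are invented for illustration; the text only says that the count value may depend on instruction type, operand and address. Heavier instructions consume more of each n-count increment, so each progress indication stands for a more nearly equal amount of work.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical weights per instruction type (not taken from the patent).
COUNT_BY_TYPE = {"alu": 1, "load": 2, "store": 2, "div": 8}
# Hypothetical boundary above which memory accesses are uncached and slower.
UNCACHED_BASE = 0xF000_0000


def count_value(op_type: str, address: Optional[int] = None) -> int:
    """Return the count value charged when an instruction of this kind retires."""
    value = COUNT_BY_TYPE.get(op_type, 1)
    if address is not None and address >= UNCACHED_BASE:
        value *= 4  # assumed extra weight for uncached accesses
    return value


def indications_for(trace: Iterable[Tuple[str, Optional[int]]], n: int) -> int:
    """Count progress indications a trace raises, using an n-count decrementer."""
    remaining = n - 1
    fired = 0
    for op_type, address in trace:
        remaining -= count_value(op_type, address)
        while remaining < 0:  # underflow: one progress increment completed
            remaining += n
            fired += 1
    return fired
```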
The recording of progress can be suspended in a processing set during the execution of certain instructions, such as an instruction that the processing set executes by software emulation.
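Suspension of progress recording can be modelled by adding a suspend depth to a decrementer. In this sketch, which is an assumption rather than the patent's mechanism, a context manager brackets the emulated instruction and retirements inside it are not counted; nested suspensions are allowed and counting resumes only when the outermost one ends.

```python
from contextlib import contextmanager
from typing import Iterator


class SuspendableProgressCounter:
    """Hypothetical n-count decrementer whose recording can be suspended."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.value = n - 1
        self.suspend_depth = 0

    @contextmanager
    def suspended(self) -> Iterator[None]:
        """Suspend progress recording, e.g. around a software-emulated instruction."""
        self.suspend_depth += 1
        try:
            yield
        finally:
            self.suspend_depth -= 1

    def retire(self, count: int = 1) -> int:
        """Return progress indications raised; none while recording is suspended."""
        if self.suspend_depth:
            return 0
        self.value -= count
        fired = 0
        while self.value < 0:  # underflow
            self.value += self.n
            fired += 1
        return fired
```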
To allow for differences in processing speed between the processing sets while still keeping them substantially in step, a processing set that records a progress increment before it has received the acknowledgement signal for the previous progress increment is stalled. It remains stalled until that acknowledgement signal has been received.
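The stall rule can be expressed as a small state model of one processing set. This is a hypothetical sketch: progress increments are numbered from 1, acknowledgements carry the number of the increment they acknowledge, and a set may therefore keep executing with at most one unacknowledged progress increment outstanding.

```python
class ProcessingSetModel:
    """Hypothetical model of the stall rule for one processing set."""

    def __init__(self) -> None:
        self.recorded = 0      # progress increments recorded so far
        self.acknowledged = 0  # highest increment acknowledged by the monitor
        self.stalled = False

    def record_increment(self) -> bool:
        """Record the next progress increment; return True if the set stalls."""
        if self.stalled:
            raise RuntimeError("a stalled processing set cannot make progress")
        self.recorded += 1
        # Continue only if the previous increment has been acknowledged.
        self.stalled = self.acknowledged < self.recorded - 1
        return self.stalled

    def receive_ack(self, seq: int) -> None:
        """Handle an acknowledgement for progress increment `seq`."""
        self.acknowledged = max(self.acknowledged, seq)
        if self.stalled and self.acknowledged >= self.recorded - 1:
            self.stalled = False  # previous increment acknowledged: resume
```

The one increment of slack lets a faster set run slightly ahead without waiting at every increment, while still bounding how far the sets can drift apart.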
The monitor can be connected to receive and buffer I/O operations output from the processing sets, to compare an I/O operation output from one processing set to I/O operations buffered for another processing set for determining equivalent functioning of the processing sets, and to issue a state modifying I/O operation only on determining equivalent operation of the processing sets. A non-repeatable state modifying operation could be, for example, a read instruction with side effects or a write instruction. An embodiment of the invention can thereby respond to I/O instructions efficiently, directly forwarding I/O operations which are not state modifying (i
Conley Rose & Tayon PC
Kivlin B. Noël
Sun Microsystems Inc.