Reexamination Certificate
2001-02-20
2004-11-30
Beausoliel, Robert (Department: 2113)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S010000, C714S011000, C714S012000, C700S082000
active
06826709
ABSTRACT:
TECHNICAL FIELD
This invention relates to a reconfiguration method applicable to a network of identical functional elements.
The field of the invention is in particular that of parallel computers composed of processors interconnected in a matrix, ring, or hypercube, and that of 1D or 2D correlators and matrix architectures of blocks computing the fast Fourier transform (FFT). Two examples of the first kind of application are given in the document referenced as [1] at the end of the description.
PRIOR ART
The increasing possibilities of micro-electronic technology, as well as the evolution of multiprocessor architectures, are leading to computers that are more and more complex, both in terms of the elements composing them (electronic gates, memories, registers, processors, etc.) and in terms of the complexity of the software used.
Designers of such computers, which have a highly integrated parallel or extensively parallel structure, must take into account two conflicting requirements:
1. Machines having a parallel or extensively parallel structure are subject to faults due to the very large number of processors and their complexity, leading to poor manufacturing yield and serious faults under normal operation.
2. With highly advanced technologies and high integration systems, more and more processors can be incorporated into an application specific integrated circuit (ASIC), a multichip module (MCM) or a card. In such systems, the main disadvantage is limited bandwidth, i.e. the amount of information that can be put through.
To meet the first of these requirements, one known solution consists in replacing faulty processors with spare processors that are operationally identical to the others. Such a solution, which provides “structural fault tolerance”, tries to ensure proper operation, and in particular network consistency, so as not to penalize the architecture. It implies a reconfiguration in which faulty elements are replaced with spare elements made available through interconnection and intercommunication elements.
In a 2D (or bidimensional) type of network, the solutions proposed for providing fault tolerance are:
Adding as many processor lines to the system as there are faults to be tolerated. This solution is very simple and requires few spare interconnections, reconfiguration being performed by simply bypassing the lines that contain a faulty processor. Performance loss is then limited. On the other hand, the spare processors are very poorly used, since one line is required to tolerate one fault, and if a bypass is faulty, the whole system is down.
Or adding switches, spare processors and connections to the standard network.
As described in the document referenced as [2], a network corresponding to the latter type of solution and called “m-Track, n-Spare” is composed of processors 10, switches and spare connections. Two kinds of switches are used: switches 11 coupling processors with connections (PT=Processor to Track) and switches 12 coupling connections with each other (TT=Track-to-Track). All network links are bi-directional, i.e. communications can come and go in each connection. Spare processors 13 (sp) are positioned at the network borders. For the reconfiguration method to be effective, these processors must be positioned at least in one line and one column of the network.
FIG. 1 illustrates a sample network of the “2-Track, 1-Spare” type. Spare processors 13 (sp) are positioned all around the network and are used to reconfigure the network in case the useful processors 10 are faulty. Switches 11, 12 are used to enable reconfiguration. Here, the network has 200% of spare connections in comparison with the so-called operational connections.
Those skilled in the art can then use a reconfiguration method, based on error correcting codes, which can be broken down into two phases:
the first one consists in finding, for each faulty processor, a compensation track bypassing the faulty processor and replacing it with a spare processor;
in case the first phase is successful, each processor, along the compensation track, is replaced with its nearest neighbour, thus reaching, through cascading changes, a spare processor. The operational grid is thus maintained.
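The second phase can be illustrated with a minimal Python sketch. It assumes that a compensation track has already been found and is given as an ordered list of physical processors, starting at the faulty one and ending at an unused spare. The function name `cascade_replace` and the dictionary representation are hypothetical and are not taken from the document referenced as [2].

```python
def cascade_replace(role_of, path):
    """Shift logical roles one step along a compensation track.

    role_of: dict mapping a physical processor to the logical role it
             currently plays (None for an unused spare).
    path:    physical processors along the track; path[0] is the faulty
             processor and path[-1] is an unused spare (assumed layout).
    Returns a new dict; the input is left unchanged.
    """
    if role_of[path[-1]] is not None:
        raise ValueError("the track must end at an unused spare")
    new_roles = dict(role_of)
    # Each processor on the track takes over the role of its predecessor,
    # so the spare at the end absorbs the last displaced role.
    for i in range(len(path) - 1, 0, -1):
        new_roles[path[i]] = role_of[path[i - 1]]
    # The faulty processor no longer plays any role.
    new_roles[path[0]] = None
    return new_roles


roles = {"p0": "A", "p1": "B", "p2": "C", "sp": None}
reconfigured = cascade_replace(roles, ["p1", "p2", "sp"])
```

In this example the role "B" moves from the faulty processor "p1" to its neighbour "p2", and "C" moves on to the spare, so the logical order A, B, C is preserved on the remaining processors.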
The reconfiguration method considered above has two major disadvantages:
it is not suitable for unidirectional links; indeed, in this case, two connection buses, one for each direction, are required for connecting the considered processor to each of its neighbours;
the number of switching elements passed between two logically neighbouring processors is not deterministic, which makes the method ineffective for dealing with the case of synchronous interprocessor communications.
In order to overcome these disadvantages, it is an object of the inventive method to solve the problem of fault tolerance in an extensively parallel architecture with significant coupling of functional elements, by proposing a solution meeting the following constraints:
obtaining a fault tolerant network with connections that may be unidirectional;
highly limiting inoperative communication media of the network;
limiting communication time between functional elements by limiting the number of reconfiguration switches passed between two functional elements;
allowing greater flexibility for choosing the number of spare functional elements;
having a solution capable of supporting different topologies, in particular matrix, ring or hypercube topologies.
SUMMARY OF THE INVENTION
This invention relates to a reconfiguration method of a network of parallel identical functional elements tolerant to the faults of these functional elements, the network comprising said basic functional elements, spare functional elements, interconnecting elements of these functional elements, and a control unit, said method comprising:
a step of positioning the functional elements of the logic network;
a routing step of programming interconnecting elements on the physical network, by choosing a maximum number of these interconnecting elements which can be passed between two neighbouring functional elements using a shortest track search algorithm.
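The routing step can be sketched as a bounded shortest-path search. The text does not name a specific algorithm, so the breadth-first search below is only one possible choice, and the graph representation (a dictionary of unidirectional links), the function name `shortest_track` and the parameter `max_switches` are all assumptions made for illustration.

```python
from collections import deque


def shortest_track(links, source, target, max_switches):
    """Breadth-first search for a shortest track from source to target.

    links:        dict mapping a node to the nodes reachable from it over
                  unidirectional links (assumed representation).
    max_switches: maximum number of interconnecting elements allowed
                  between the two functional elements.
    Returns the list of nodes on the track, or None if no track within
    the bound exists.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Every node after the source on an unfinished path is a switch.
        if len(path) - 1 > max_switches:
            continue
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


links = {"A": ["s1"], "s1": ["s2", "B"], "s2": ["B"], "B": []}
track = shortest_track(links, "A", "B", max_switches=1)
```

Because the links are stored per direction, a track from "B" back to "A" is not found in this example unless return links are added explicitly, which matches the unidirectional case discussed above.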
In the method of the invention:
a sequence is determined for positioning the functional elements of the network that is composed of a starting functional element and a series of functional elements including all functional elements;
each functional element is tentatively positioned, starting with its logical position and then, in case of failure, in each of the positions located at distance 1, distance 2, . . . from the logical position of this functional element, a restriction being that one and only one spare position must be used with respect to the possible positions of the previously positioned functional elements, stopping when S+1 positions have been tested, S being the number of spare functional elements;
if S+1 positions have been tested without success, returning to the previous functional element in the positioning sequence and proceeding with the next position for this functional element;
optionally, when all functional elements have been positioned, it is checked for each network dimension that the logical sequence is followed for each pair of functional elements; if not, the positions of these functional elements are inverted.
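The positioning steps above can be approximated by a simple backtracking search. The sketch below is a simplification under stated assumptions: positions are (row, column) pairs, distance is taken to be the Manhattan distance, at most S+1 candidate positions are tried per element, and neither the restriction on spare positions nor the optional final order check is modelled. The function name `place` and its parameters are hypothetical.

```python
def place(logical, physical, faulty, num_spares):
    """Assign each logical position a distinct fault-free physical position.

    logical:    logical positions in the chosen positioning sequence.
    physical:   all physical positions, spares included.
    faulty:     set of physical positions that cannot be used.
    num_spares: S, the number of spare functional elements.
    Returns a dict logical -> physical, or None if no placement is found.
    """
    def candidates(pos):
        # Order physical positions by Manhattan distance (an assumption)
        # from the logical position and keep only the first S+1.
        ordered = sorted(
            physical,
            key=lambda p: (abs(p[0] - pos[0]) + abs(p[1] - pos[1]), p),
        )
        return ordered[: num_spares + 1]

    assignment = {}
    used = set()

    def solve(i):
        if i == len(logical):
            return True
        for cand in candidates(logical[i]):
            if cand in faulty or cand in used:
                continue
            assignment[logical[i]] = cand
            used.add(cand)
            if solve(i + 1):
                return True
            # Backtrack: free the position and try the next candidate.
            del assignment[logical[i]]
            used.discard(cand)
        return False

    return assignment if solve(0) else None


# 2 x 2 logical grid placed on a 2 x 3 physical grid (one spare column).
logical = [(0, 0), (0, 1), (1, 0), (1, 1)]
physical = [(r, c) for r in range(2) for c in range(3)]
result = place(logical, physical, faulty={(0, 0)}, num_spares=2)
```

When no placement exists for an element within its S+1 candidates, the recursion returns to the previous element in the sequence and moves it to its next candidate, which mirrors the return step described in the method.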
In one embodiment, the positioning sequence is defined as follows: the starting functional element is the top left functional element, the next functional elements are those to the right of and below the starting functional element, and so on, following a diagonal.
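One way to generate such a diagonal sequence for a rectangular grid is sketched below. The order of elements within a diagonal (right-hand element first, then the one below) is an assumption, as is the function name `diagonal_sequence`.

```python
def diagonal_sequence(rows, cols):
    """Return grid positions (row, col) ordered diagonal by diagonal.

    The top left element (0, 0) comes first; each following diagonal
    groups the positions with the same row + col, listed from the
    top right end to the bottom left end (assumed order).
    """
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    return sorted(positions, key=lambda p: (p[0] + p[1], p[0]))


order = diagonal_sequence(2, 3)
# order == [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (1, 2)]
```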
It is also possible to divide the network into blocks and define a block positioning sequence starting with a starting block and going through all the blocks from one neighbouring block to the next, with the positions for the functional elements of one block not including any logical position of the functional elements of the previously positioned blocks.
Advantageously, this inventive method can be imple
Clermidy Fabien
Collette Thierry
Beausoliel Robert
Commissariat a l'Energie Atomique
Thelen Reid & Priest LLP
Wilson Yolanda