Electrical computers and digital processing systems: processing – Processing architecture – Array processor
Reexamination Certificate
2001-02-16
2004-01-20
Kim, Kenneth S. (Department: 2181)
C712S015000, C714S010000
active
06681316
ABSTRACT:
TECHNICAL FIELD
This invention relates to a network of parallel processors tolerant to the faults thereof, and a reconfiguration method applicable to such network.
The field of the invention is that of parallel computers for all kinds of applications. Two sample applications are thus given in the document referenced as [1] at the end of the description.
PRIOR ART
The increasing possibilities of micro-electronic technology, as well as the evolution of multiprocessor architectures, are leading to computers that are more and more complex both in terms of elements composing them (electronic gates, memories, registers, processors, . . . ) and in terms of complexity of the software used.
The designers of such computers having a high integration parallel or extensively parallel structure must take into account two conflicting requirements:
1. Machines having a parallel or extensively parallel structure are subject to faults due to the very great number of processors and their complexity, leading to poor manufacturing yield and serious faults under normal operation.
2. With highly advanced technologies and high integration systems, more and more processors can be incorporated into an application specific integrated circuit (ASIC), a multichip module (MCM) or a card. In such systems, the main disadvantage is that of limited bandwidth, i.e. the amount of information that can be put through.
In order to meet the first of these requirements, one solution of known art consists in replacing faulty processors with spare processors which are identical to the others from an operational point of view. Such a solution, enabling “structural fault tolerance”, then tries to ensure proper operation, and in particular network consistency, so as not to penalize the architecture. It implies a reconfiguration consisting in replacing faulty elements with spare elements made available through interconnection elements and intercommunication elements.
In a 2D (or bidimensional) type of network, the solutions proposed for providing fault tolerance are:
adding as many processor lines to the system as faults are to be tolerated. This solution is very simple and requires few spare interconnections, reconfiguration being performed by simply bypassing the lines where there is a faulty processor. Performance loss is then limited. On the other hand, the spare processors are very poorly used, as one line is required to tolerate one fault, and in case of a faulty bypass, the whole system is down;
or adding switches, spare processors and connections to the standard network.
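The first option above, bypassing any line that contains a faulty processor, can be sketched as follows. This is a minimal illustration, not taken from the cited documents: the function name `bypass_faulty_rows`, its parameters, and the representation of faults as (row, column) pairs are assumptions made for the example.

```python
def bypass_faulty_rows(n_rows, n_spare_rows, faulty):
    """Pick the physical lines that carry the n_rows logical lines.

    n_rows       -- number of useful processor lines (hypothetical parameter)
    n_spare_rows -- number of spare lines added to the system
    faulty       -- set of (row, col) positions of faulty processors

    Returns the list of physical line indices in logical order,
    or None when too many lines contain a fault.
    """
    total_rows = n_rows + n_spare_rows
    # A single faulty processor disqualifies its whole line.
    bad_rows = {row for row, _col in faulty}
    healthy = [row for row in range(total_rows) if row not in bad_rows]
    if len(healthy) < n_rows:
        return None
    return healthy[:n_rows]


if __name__ == "__main__":
    print(bypass_faulty_rows(4, 1, {(1, 2)}))
```

In this sketch, a single fault at (1, 2) in a network of four useful lines and one spare line yields the physical lines [0, 2, 3, 4]: the whole of line 1 is given up to tolerate one faulty processor, which is the poor use of spares noted above.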
As described in the document referenced as [2], a network corresponding to the latter type of solution and called “m-Track, n-Spare” is composed of processors 10, switches and spare connections. Two kinds of switches are used: switches 11 coupling processors with connections (PT=Processor to Track) and switches 12 coupling connections with each other (TT=Track-to-Track). All network links are bi-directional, i.e. communications can come and go in each connection. Spare processors 13 (sp) are positioned at the network borders. For the reconfiguration method to be effective, these processors must be positioned at least in one line and one column of the network.
FIG. 1 illustrates a sample network of the “2-Track, 2-Spare” type. Spare processors 13 (sp) are positioned all around the network and are used to reconfigure the network in case the useful processors 10 are faulty. Switches 11, 12 are used to enable reconfiguration. Here, the network has 200% of spare connections in comparison with the so-called operational connections.
Those skilled in the art can then use a reconfiguration method, based on error correcting codes, which can be broken down into two phases:
the first one consists in finding, for each faulty processor, a compensation track leading from the faulty processor to a spare processor;
the second one, carried out if the first phase is successful, consists in replacing each processor along the compensation track with its nearest neighbour, thus reaching, through cascading changes, a spare processor. The operational grid is thus maintained.
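The two phases can be illustrated with a deliberately simplified sketch. It is not the method of document [2]: it assumes a single spare column on the right-hand border and only considers compensation tracks running straight along a line towards that spare, so at most one fault per line can be handled. The function name `reconfigure_rows` and the logical-to-physical mapping it returns are hypothetical.

```python
def reconfigure_rows(n_rows, n_cols, faulty):
    """Map logical processors to physical ones after reconfiguration.

    The physical grid is assumed to have n_cols + 1 columns, the last
    one holding the spare processors. `faulty` is a set of (row, col)
    physical positions.

    Returns a dict {(row, logical_col): (row, physical_col)}, or None
    when phase 1 finds no compensation track in this simplified model.
    """
    mapping = {}
    for row in range(n_rows):
        bad_cols = sorted(col for (r, col) in faulty if r == row)
        # Phase 1: look for a track from the faulty processor to the
        # spare of the same line; one spare serves only one fault.
        if len(bad_cols) > 1:
            return None
        # Phase 2: every processor on the track is replaced by its
        # nearest neighbour, cascading up to the spare processor.
        for col in range(n_cols):
            shift = 1 if bad_cols and col >= bad_cols[0] else 0
            mapping[(row, col)] = (row, col + shift)
    return mapping


if __name__ == "__main__":
    print(reconfigure_rows(2, 3, {(0, 1)}))
```

With a fault at physical position (0, 1), logical processors (0, 1) and (0, 2) move one step towards the spare column while line 1 is left unchanged, so the logical grid keeps its shape.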
Such a network has many disadvantages:
Bi-directionality of links offers many possibilities for interprocessor routing, but has two major disadvantages in comparison with unidirectional links:
communication time is much longer, on the one hand due to programming the link direction and, on the other hand, due to passing through the circuits required for providing such bi-directional communications;
complexity is increased, because interprocessor communications must be handled in order to determine the routing direction;
The number of added connections in comparison with “useful” links, which is a minimum of 100%, makes such a solution inadequate for high integration parallel computers where the bandwidth of certain levels, i.e. the number of connections, is very limited;
Having to add a substantial number of spare processors can lead to problems, in particular for small networks comprising about a hundred processors, where spare processors can account for 40% of possible faults.
The reconfiguration method considered above, in turn, has two major disadvantages:
it is not suitable for unidirectional links; indeed, in this case, two connection buses, one coming and one going, are required for connecting the considered processor to each of its neighbours.
the number of switching elements passed between two logically neighbouring processors is not deterministic, which makes the method ineffective for dealing with the case of synchronous interprocessor communications.
In order to overcome these disadvantages, it is an object of the network according to the invention to solve the problem of fault tolerance in an extensively parallel architecture with significant processor coupling, by proposing a solution meeting the following constraints:
obtaining a fault tolerant network with connections that may be unidirectional;
greatly limiting the inoperative communication media of the network;
limiting communication time between processors by limiting the number of reconfiguration switches passed between two processors;
allowing greater flexibility for choosing the number of spare processors;
having a solution capable of supporting different processor topologies, in particular matrix, line or hypercube topologies.
SUMMARY OF THE INVENTION
The invention relates to a network of parallel elementary processors tolerant to the faults of these processors comprising said elementary processors, spare elementary processors, elements interconnecting these processors, and a control unit, characterized in that it comprises alternately a series of interconnecting element lines and processor lines, each processor being surrounded by four interconnecting elements, with the processor lines being elementary processor lines, the last processor line being a line of spare elementary processors, the edge elements of the network being interconnecting elements, and in that the control unit, connected to the processors and interconnecting elements, sends instructions to the processors, controls the interconnecting elements, and checks the integrity of these processors. Each processor is connected to four interconnecting elements, two of these diametrically opposed elements being connected to the two processor inputs, the other two elements, also diametrically opposed, being connected to the two processor outputs, these interconnecting elements being connected together through vertical or horizontal links.
Advantageously, the interconnecting elements inside the network have a complexity of six inputs and six outputs, four inputs and four outputs being connected to the interconnecting elements inside the neighbouring network, and two inputs and two outputs being connected to the neighbouring processors of the interconnecting element inside the considered neighbouring network.
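The arrangement described above can be modelled in a few lines to check the port count of an inner interconnecting element. This is only a sketch under stated assumptions: the coordinate scheme, the names `build_layout` and `element_port_count`, and the choice of the North/West and South/East elements for the processor inputs (North/East and South/West for the outputs) are illustrative and are not specified in this form by the text.

```python
def build_layout(n_rows, n_cols):
    """Describe n_rows processor lines of n_cols processors each.

    Interconnecting elements sit on an (n_rows + 1) x (n_cols + 1) grid,
    so every edge element of the network is an interconnecting element.
    Processor (r, c) is surrounded by elements (r, c), (r, c + 1),
    (r + 1, c) and (r + 1, c + 1). The last processor line is spare.
    """
    procs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            procs[(r, c)] = {
                "spare": r == n_rows - 1,
                # Assumption: NW and SE elements feed the two inputs.
                "inputs": [(r, c), (r + 1, c + 1)],
                # Assumption: NE and SW elements receive the two outputs.
                "outputs": [(r, c + 1), (r + 1, c)],
            }
    return procs


def element_port_count(procs, n_rows, n_cols, elem):
    """Return (inputs, outputs) of one interconnecting element."""
    er, ec = elem
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    # Vertical and horizontal links to neighbouring elements,
    # one input and one output per existing neighbour.
    links = sum(
        1 for dr, dc in steps
        if 0 <= er + dr <= n_rows and 0 <= ec + dc <= n_cols
    )
    to_procs = sum(elem in p["inputs"] for p in procs.values())
    from_procs = sum(elem in p["outputs"] for p in procs.values())
    return (links + from_procs, links + to_procs)


if __name__ == "__main__":
    layout = build_layout(3, 3)
    print(element_port_count(layout, 3, 3, (1, 1)))
```

Under these assumptions, an inner element of a network with three processor lines of three processors has six inputs and six outputs (four of each towards neighbouring elements, two of each towards neighbouring processors), which matches the complexity stated above.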
An interconnecting element has at least one unidirectional output and one unidirectional input connected to one input and one output of at least one South/West, North/
Clermidy Fabien
Collette Thierry
Kim Kenneth S.
Thelen Reid & Priest LLP