Manifold array processor

Electrical computers and digital processing systems: processing – Processing architecture – Array processor

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06338129

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to processing systems in general and, more specifically, to parallel processing architectures.
2. Description of the Related Art
Many computing tasks can be developed that operate in parallel on data. The efficiency of the parallel processor depends upon the parallel processor's architecture, the coded algorithms, and the placement of data in the parallel elements. For example, image processing, pattern recognition, and computer graphics are all applications which operate on data that is naturally arranged in two- or three-dimensional grids. The data may represent a wide variety of signals, such as audio, video, SONAR or RADAR signals, by way of example. Because operations such as discrete cosine transforms (DCT), inverse discrete cosine transforms (IDCT), convolutions, and the like which are commonly performed on such data may be performed upon different grid segments simultaneously, multiprocessor array systems have been developed which, by allowing more than one processor to work on the task at one time, may significantly accelerate such operations. Parallel processing is the subject of a large number patents including U.S. Pat. Nos. 5,065,339; 5,146,543; 5,146,420; 5,148,515; 5,546,336; 5,542,026; 5,612,908 and 5,577,262; European Published Application Nos. 0,726,529 and 0,726,532 which are hereby incorporated by reference.
One conventional approach to parallel processing architectures is the nearest neighbor mesh connected computer, which is discussed in R. Cypher and J. L. C. Sanz,
SIMD Architectures and Algorithms for Image Processing and Computer Vision
, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 12, pp. 2158-2174, Dec. 1989; K. E. Batcher,
Design of a Massively Parallel Processor
, IEEE Transactions on Computers, Vol. C-29 No. 9, pp. 836-840 September 1980; and L. Uhr,
Multi
-
Computer Architectures for Artificial Intelligence
, New York, N.Y., John Wiley & Sons, Ch. 8, p. 97, 1987.
In the nearest neighbor torus connected computer of
FIG. 1A
multiple processing elements (PEs) are connected to their north, south, east and west neighbor PEs through torus connection paths MP and all PEs are operated in a synchronous single instruction multiple data (SIMD) fashion. Since a torus connected computer may be obtained by adding wraparound connections to a mesh-connected computer, a mesh-connected computer, one without wraparound connections, may be thought of as a subset of torus connected computers. As illustrated in
FIG. 1B
, each path MP may include T transmit wires and R receive wires, or as illustrated in
FIG. 1C
, each path MP may include B bidirectional wires. Although unidirectional and bidirectional communications are both contemplated by the invention, the total number of bus wires, excluding control signals, in a path will generally be referred to as k wires hereinafter, where k=B in a bidirectional bus design and k=T+R in a unidirectional bus design. It is assumed that a PE can transmit data to any of its neighboring PEs, but only one at a time. For example, each PE can transmit data to its east neighbor in one communication cycle. It is also assumed that a broadcast mechanism is present such that data and instructions can be dispatched from a controller simultaneously to all PEs in one broadcast dispatch period.
Although bit-serial inter-PE communications are typically employed to minimize wiring complexity, the wiring complexity of a torus-connected array nevertheless presents implementation problems. The conventional torus-connected array of
FIG. 1A
includes sixteen processing elements connected in a four by four array
10
of PEs. Each processing element PE
i,j
is labeled with its row and column number i and j, respectively. Each PE communicates to its nearest North (N), South (S), East (E) and West (W) neighbor with point to point connections. For example, the connection between PE
0,0
and PE
3,0
shown in
FIG. 1A
is a wraparound connection between PE
0,0
's N interface and PE
3,0
's south interface, representing one of the wraparound interfaces that forms the array into a torus configuration. In such a configuration, each row contains a set of N interconnections and, with N rows, there are N
2
horizontal connections. Similarly, with N columns having N vertical interconnections each, there are N
2
vertical interconnections. For the example of
FIG. 1A
, N=4. The total number of wires, such as the metallization lines in an integrated circuit implementation in an N×N torus-connected computer including wraparound connections, is therefore 2kN
2
, where k is the number of wires in each interconnection. The number k may be equal to one in a bit serial interconnection. For example with k=1 for the 4×4 array
10
as shown in
FIG. 1A
, 2kN
2
=32.
For a number of applications where N is relatively small, it is preferable that the entire PE array is incorporated in a single integrated circuit. The invention does not preclude implementations where each PE can be a separate microprocessor chip, for example. Since the total number of wires in a torus connected computer can be significant, the interconnections may consume a great deal of valuable integrated circuit “real estate”, or the area of the chip taken up. Additionally, the PE interconnection paths quite frequently cross over one another complicating the IC layout process and possibly introducing noise to the communications lines through crosstalk. Furthermore, the length of wraparound links, which connect PEs at the North and South and at the East and West extremes of the array, increase with increasing array size. This increased length increases each communication line's capacitance, thereby reducing the line's maximum bit rate and introducing additional noise to the line.
Another disadvantage of the torus array arises in the context of transpose operations. Since a processing element and its transpose are separated by one or more intervening processing elements in the communications path, latency is introduced in operations which employ transposes. For example, should the PE
2,1
require data from its transpose, PE
1,2
, the data must travel through the intervening PE
1,1
or PE
2,2
. Naturally, this introduces a delay into the operation, even if PE
1,1
and PE
2,2
are not otherwise occupied. However, in the general case where the PEs are implemented as micro-processor elements, there is a very good probability that PE
1,1
and PE
2,2
will be performing other operations and, in order to transfer data or commands from PE
1,2
to PE
2,1
, they will have to set aside these operations in an orderly fashion. Therefore, it may take several operations to even begin transferring the data or commands from PE
1,2
to PE
1,1
and the operations PE
1,1
was forced to set aside to transfer the transpose data will also be delayed. Such delays snowball with every intervening PE and significant latency is introduced for the most distant of the transpose pairs. For example the PE
3,1
/PE
1,3
transpose pair of
FIG. 1A
, has a minimum of three intervening PEs, requiring a latency of four communication steps and could additionally incur the latency of all the tasks which must be set aside in all those PEs in order to transfer data between PE
3,1
and PE
1,3
in the general case.
Recognizing such limitations of torus connected arrays, new approaches to arrays have been disclosed in U.S. Pat. No. 5,612,908;
A Massively Parallel Diagonal Fold Array Processor
, G. G. Pechanek et al., 1993 International Conference on Application Specific Array Processors, pp. 140-143, Oct. 25-27, 1993, Venice, Italy, and
Multiple Fold Clustered Processor Torus Array
, G. G. Pechanek, et. al., Proceedings Fifth NASA Symposium on VLSI Design, pp. 8.4.1-11, Nov. 4-5, 1993, University of New Mexico, Albuquerque, N. Mex. which are incorporated by reference herein in their entirety. The operative technique of these torus array organizations is the

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Manifold array processor does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Manifold array processor, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Manifold array processor will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2873839

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.