Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2000-06-01
2003-05-20
Malzahn, David H. (Department: 2124)
Electrical computers: arithmetic processing and calculating
Electrical digital calculating computer
Particular function performed
C708S627000, C708S630000
Reexamination Certificate
active
06567834
ABSTRACT:
The present invention relates to implementation of multipliers in programmable arrays, more particularly in a reconfigurable processor device.
A commercially successful form of reconfigurable device is the field-programmable gate array (FPGA). These devices consist of a collection of configurable processing elements embedded in a configurable interconnect network. Configuration memory is provided to describe the interconnect configuration—often SRAM is used. These devices have a very fine-grained structure: typically each processing element of an FPGA is a configurable gate. Rather than being concentrated in a central ALU, processing is thus distributed across the device and the silicon area of the device is used more effectively. An example of a commercially available FPGA series is the Xilinx 4000 series.
Such reconfigurable devices can in principle be used for any computing application for which a processor or an ASIC is used. However, a particularly suitable use for such devices is as a coprocessor to handle tasks which are computationally intensive, but which are not so common as to merit a purpose built ASIC. A reconfigurable coprocessor could thus be programmed at different times with different configurations, each adapted for execution of a different computationally intensive task, providing greater efficiency than for a general purpose processor alone without a huge increase in overall cost. In recent FPGA devices, scope is provided for dynamic reconfiguration, wherein partial or total reconfiguration can be provided during the execution of code so that time-multiplexing can be used to provide configurations optimised for different subtasks at different stages of execution of a piece of code.
FPGA devices are not especially suitable for certain kinds of computational tasks. As the individual computational elements are very small, the datapaths are extremely narrow and many of them are required, so a large number of operations are required in the configuration process. Although these structures are relatively efficient for tasks which operate on small data elements and are regular from cycle to cycle, they are less satisfactory for irregular tasks with large data elements. Such tasks are also often not well handled by a general purpose processor, yet may be of considerable importance (such as in, for example, image processing).
Alternative reconfigurable architectures have been proposed. One example is the PADDI architecture developed by the University of California at Berkeley, described in D. Chen and J. Rabaey, “A Reconfigurable Multiprocessor IC for Rapid Prototyping of Real Time Data Paths”, ISSCC, February 1992 and A. Yeung and J. Rabaey, “A Data-Driven Architecture for Rapid Prototyping of High Throughput DSP Algorithms”, IEEE VLSI Signal Processing Workshop, October 1992. Another alternative architecture is MATRIX, developed at the Massachussetts Institute of Technology and described in Ethan Mirsky and André deHon, “MATRIX: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources”, FCCM '96—IEEE Symposium on FPGAs for Custom Computing Machines, Apr. 17-19, 1996, Napa, Calif., USA, and in more detail in André deHon, “Reconfigurable Architectures for General-Purpose Computing”, pages 257 to 296, Technical Report 1586, MIT Artificial Intelligence Laboratory. The MATRIX structure has advantageous aspects, but the coarse grain size means that it consumes more silicon than a conventional FPGA structure and is likely to be less efficient for tasks which are regular from cycle to cycle. It would therefore be desirable to develop further reconfigurable structures which combine as best possible the advantages of both MATRIX and of conventional FPGAs.
A further development of the present applicants, described in European Patent Application No. 97310220.5, filed on Dec. 17, 1997 (with an overall architecture termed “CHESS”, and described in International Patent Application No. GB 98/00248, filed on Jan. 28, 1997) describes a reconfigurable device comprising: a plurality of processing devices; a connection matrix providing an interconnect between the processing devices; and means to define the configuration of the connection matrix; wherein each of the processing devices comprises an arithmetic logic unit adapted to perform a function on input operands and produce an output, wherein said input operands are provided as inputs to the arithmetic logic unit from the interconnect on the same route in each cycle, and wherein means are provided to route the output of a first one of the processing devices to a second one of the processing devices to determine the function performed by the second one of the processing devices.
In a preferred form of CHESS, each of the processing devices has a first operand input, a second operand input, a function result output, a carry input and a carry output, wherein the first operand input, the second operand input and the function result output are n-bit, where n is an integer greater than 1, and the carry input and the carry output are 1-bit. A particularly good design solution is found when n is equal to 4. The mechanism used for dynamic instruction is that each of the processing devices is adapted to receive, for determination of its function, an n-bit instruction input from another of the processing devices.
It is also advantageous that each of the processing devices contains a latchable output register for the function output. This is useful for constructing a “deep” pipeline, where for example it is necessary to perform a number of operations in parallel and synchronise the provision of output from different ALUs.
A particularly important issue for all of the architectures described above is the implementation of a multiplier. Multipliers are a key element for many calculations, and many of the applications most suitable for use in an ASIC or coprocessor will contain large number of multiplication operations. A conventional approach to implementation of a multiplier is now described.
A combinational multiplier is commonly built as a repetitive array of core cells, in which each cell multiplies some bits (say M bits) from the multiplicand A with some bits (say N bits) from the multiplier B, to produce an (M+N) bit partial product. To allow a complete multiplier to be built, each core cell also needs to be able to add two additional inputs to the partial product, i.e. to compute the function ((A*B)+C+D). The D input is used to sum all of the partial products of the same significance, and the C input is used to add carries from less significant partial products. The (M+N) bit result from each core cell is divided into two portions:
1. The least significant M bits are fed to the D input of the adjacent core cell that produces a result of the same arithmetic significance.
2. The most significant N bits are fed to the C input of the adjacent core cell that produces a result M bits more significant.
The core cell of a 1 bit*1 bit multiplier can be implemented in one of three ways:
1. As two of 4-input lookup tables (LUTs), each with A, B, C, D as inputs, and each producing one of the two bits of output.
2. As a two input AND gate to compute (A*B), feeding a full adder that adds the result to C and D. This requires one 2-input LUT and two 3-input LUTs.
3. As a full adder to compute (A+C+D), feeding a multiplexer that selects either this result or D to be fed to the output, under the control of B.
Either of these solutions costs more resources than would be needed simply to perform the full addition. Multipliers are hence costly (in terms of silicon area, and hence in terms of actual cost) in FPGA structures. Any approach in devices of this general type which can increase the density of multipliers in a processing array will be highly advantageous in reducing cost.
Accordingly, the invention provides a method of multiplying a first number by a second number by use of an array of processing devices, each of said processing devices having a plurality of data inpu
Marshall Alan David
Stansfield Anthony
Vuillemin Jean
Elixent Limited
Malzahn David H.
Orrick Herrington & Sutcliffe LLP
LandOfFree
Implementation of multipliers in programmable arrays does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Implementation of multipliers in programmable arrays, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Implementation of multipliers in programmable arrays will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3055107