Fully pipelined parallel multiplier with a fast clock cycle

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Details

C708S629000

active

06484193

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to arithmetic circuits in computer and digital signal processing systems, and more specifically, to multiplier circuits used for performing high-speed multiplications.
2. Description of the Related Art
Multipliers are one of the basic circuits of digital arithmetic. The speed at which a multiplier can deliver the product of two binary numbers becomes critical in certain applications where repetitive multiplications are required. Applications requiring repetitive multiplications include various digital signal processing functions, such as Finite Impulse Response (FIR) filters, and 3D rendering. Such applications require both high throughput and fast response time. The design of multipliers employed in these applications can have a significant effect on overall application performance.
Since multiplication is essentially repeated addition, it stands to reason that digital multipliers rely heavily on adder circuits. Commonly used adder circuits include the half-adder, the full-adder, and the carry-lookahead adder. The half adder takes two 1-bit inputs, and returns two outputs, a sum bit and a carry bit. A full adder returns the same outputs, but it has an extra input, known as a carry-in. The carry-in input is configured to receive a carry-out bit from an addition of lower-order bits. Because of the carry-in, full-adders can be cascaded to allow the addition of numbers larger than one bit. An adder formed by cascading several full adders is known as a ripple carry adder.
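The adder hierarchy described above can be sketched in a few lines of Python. This is only a behavioural illustration, not circuitry from the patent; the function names `half_adder`, `full_adder`, and `ripple_carry_add` are chosen here for the example.

```python
def half_adder(a, b):
    """Add two 1-bit inputs; return (sum_bit, carry_bit)."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Add two 1-bit inputs plus a carry-in; return (sum_bit, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_carry_add(x, y, width):
    """Add two unsigned integers by cascading `width` full adders.

    The carry-out of each bit position feeds the carry-in of the next,
    which is the sequential propagation a ripple carry adder exhibits.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For instance, `ripple_carry_add(5, 3, 4)` yields a 4-bit sum of 8 with no carry-out, while `ripple_carry_add(15, 1, 4)` wraps to 0 and sets the carry-out.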
One problem with ripple carry adders is that a carry generated at the lowest-order bit position must be propagated through each subsequent bit position in a sequential manner. Such propagation adds a significant amount of time to the addition process. One solution to this problem is the carry-lookahead adder (CLA). In a CLA, the carry-in bit is presented to each bit position in the adder, and is combined with the operand bits to either generate or propagate a carry. Therefore, the carry-in bit is not required to propagate through multiple stages sequentially as in a ripple carry adder. A CLA requires more circuitry than a ripple carry adder. However, since the carry is not required to ripple through each stage sequentially, it can perform additions at a significantly greater speed.
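The generate/propagate idea can be made concrete with a small behavioural model of a 4-bit CLA. This is a sketch using the standard textbook lookahead equations, not the specific circuit of the patent; the name `cla_4bit` and the list names `g` and `p` are assumptions made for the example.

```python
def cla_4bit(a, b, carry_in=0):
    """Behavioural 4-bit carry-lookahead add; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # position generates a carry
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # position propagates a carry

    c0 = carry_in
    # Every carry is expressed directly in terms of g, p and carry_in,
    # so no carry has to wait for the previous position's carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i  # sum bit = propagate XOR carry-in
    return total, c4
```

The longer product terms for the higher carries are the extra circuitry that the CLA trades for speed.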
Parallel array multipliers are a commonly used multiplier circuit in systems where increased performance is required. In one type of parallel array multiplier, the first step performed is the formation of a bit-product matrix. A bit-product matrix is simply an array of bit-products formed by multiplication of the individual bits of the two numbers being multiplied, a multiplicand and a multiplier. Formation of a bit-product matrix may become complicated in certain situations, such as multiplying signed numbers. In such cases, a specialized method for bit-product matrix formation may be required. Two common methods of bit-product matrix formation are the Baugh-Wooley method (as described in U.S. Pat. No. 3,866,030), and the Hatamian-Cash method.
FIG. 1 is an illustration of a bit-product matrix formed by an 8-bit multiplicand and a 4-bit multiplier using the Baugh-Wooley method.
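As a rough illustration of what a bit-product matrix contains, the sketch below builds one for unsigned operands only. It deliberately omits the sign handling of the Baugh-Wooley and Hatamian-Cash methods, so it is not the matrix of FIG. 1; the names `bit_product_matrix` and `sum_matrix` are hypothetical.

```python
def bit_product_matrix(multiplicand, multiplier, m_bits, n_bits):
    """Build an unsigned bit-product matrix.

    Returns one row per multiplier bit. Each row maps a column weight
    (power of two) to the AND of one multiplicand bit and that multiplier bit.
    """
    rows = []
    for j in range(n_bits):
        b = (multiplier >> j) & 1
        row = {}
        for i in range(m_bits):
            a = (multiplicand >> i) & 1
            row[i + j] = a & b  # bit-product a_i * b_j has weight 2**(i + j)
        rows.append(row)
    return rows


def sum_matrix(rows):
    """Add every weighted bit-product; the total is the unsigned product."""
    return sum(bit << col for row in rows for col, bit in row.items())
```

With an 8-bit multiplicand and a 4-bit multiplier, `bit_product_matrix(200, 13, 8, 4)` has four rows, and `sum_matrix` of those rows gives 2600.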
After the formation of a bit-product matrix, many multipliers simply add the rows of the matrix to obtain the final product. However, the efficiency of this process suffers as the number of bits in the multiplier and multiplicand becomes larger. One solution to this problem is to use a reduction scheme. Luigi Dadda proposes several such schemes in his paper entitled "Some Schemes for Parallel Multipliers" (1965). Each of these schemes, referred to as Dadda reduction schemes, employs combinational parallel counter circuits (not to be confused with sequential, or clocked, counter circuits). These parallel counters are used to reduce the number of rows until only two rows remain, a sum row and a carry row. The sum row and carry row are then added to form the final product of the multiplication. A multiplier employing a reduction scheme will typically be significantly faster than one that simply adds the rows of the bit-product matrix.
The reduction of a bit-product matrix is accomplished in a number of steps. For example, in one reduction scheme, a bit-product matrix formed from two 8-bit numbers using the Baugh-Wooley method will produce a matrix having ten rows. Reduction of this matrix will require five steps using a Dadda reduction scheme. The first step of the reduction will involve receiving the ten-row matrix and reducing it to nine rows. The second step will reduce nine rows to six rows. The third step of the reduction reduces the matrix from six rows to four rows, the fourth step from four rows to three rows, and the fifth step from three rows to two rows. In a typical reduction unit, the entire reduction is performed in one action using combinational logic.
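The row counts in this example follow the height sequence usually given for Dadda reduction, in which each target height is the previous one multiplied by 3/2 and rounded down (2, 3, 4, 6, 9, ...). That recurrence is not stated in the text above and is assumed here; the function name `dadda_stage_heights` is hypothetical.

```python
def dadda_stage_heights(initial_rows):
    """Return the target row count after each reduction step.

    Assumes the usual Dadda height sequence d(1) = 2,
    d(j + 1) = floor(1.5 * d(j)), and uses every height that is
    strictly smaller than the starting number of rows.
    """
    if initial_rows <= 2:
        return []  # already at two rows or fewer; nothing to reduce
    heights = [2]
    while heights[-1] * 3 // 2 < initial_rows:
        heights.append(heights[-1] * 3 // 2)
    return heights[::-1]  # largest target first, ending at two rows
```

For a ten-row matrix, `dadda_stage_heights(10)` returns `[9, 6, 4, 3, 2]`, matching the five steps described above.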
The final two rows are then added to form the final product of the multiplication.
Typically, the addition of the final two rows is performed by cascading several adders together. For example, if the final product is to be 16 bits wide, the final two rows may be added by cascading four 4-bit CLAs. The CLA circuits will add the two rows, four bits at a time, from the lowest-order bits to the highest-order bits.
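The cascading of four 4-bit blocks can be sketched as follows. Here `cla4` is only a behavioural stand-in for a 4-bit CLA block (it uses integer addition rather than lookahead logic), and both function names are assumptions for the example.

```python
def cla4(a, b, carry_in):
    """Behavioural stand-in for one 4-bit CLA block; returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & 0xF, total >> 4


def add_rows_16(sum_row, carry_row):
    """Add two 16-bit rows with four cascaded 4-bit blocks.

    Blocks are processed from the lowest-order nibble to the highest,
    and each block's carry-out becomes the next block's carry-in.
    """
    result = 0
    carry = 0
    for block in range(4):
        shift = 4 * block
        a = (sum_row >> shift) & 0xF
        b = (carry_row >> shift) & 0xF
        s, carry = cla4(a, b, carry)
        result |= s << shift
    return result, carry
```

Within each block the carry is resolved by lookahead, but the carry between blocks still passes from one block to the next in sequence.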
FIG. 2 is a block diagram of a parallel array multiplier employing a reduction scheme. The multiplier is configured to receive, at the bit-product matrix unit, two binary numbers: a multiplier and a multiplicand. These numbers can be any number of bits, but will typically be 8 bits, 16 bits, or other integral powers of two. The multiplier forms a bit-product matrix, which is then reduced to a two-row matrix in the reduction unit. The two rows of this matrix are then added in the addition unit. The final output of the addition unit is the product of the multiplication.
Although the bit-product matrix formation, reduction, and addition are shown as separate blocks in the figure, the internal logic of the multiplier in FIG. 2 is combinational logic, so the entire multiplication is performed in one clock cycle.
Since the multiplier of FIG. 2 performs the entire multiplication in one clock cycle, the clock cycle must be long enough to allow all operations to complete before beginning a new multiplication. This can have a limiting effect on the clock speed due to the large amount of combinational logic used. This problem is compounded for larger operands, as additional steps of reduction require additional levels of logic, resulting in a higher gate delay. The fact that the multiplier can perform only one multiplication at a time limits throughput even further. It would be desirable to create a multiplier circuit that would allow for increased throughput, and thus higher performance. One way to achieve higher throughput is with a faster clock cycle. Thus, it would also be desirable to create a multiplier circuit with a faster clock cycle.
SUMMARY OF THE INVENTION
The problems outlined above may in large part be solved by a fully pipelined parallel multiplier with a fast clock cycle, as described herein. In one embodiment, a pipelined parallel multiplier circuit utilizes each step of both the reduction process and the addition process as pipeline stages. Circuits within the multiplier include a d-type latch circuit, a half-adder circuit, a full adder circuit, and a 4-bit carry-lookahead adder (CLA) circuit. Each of these circuits is configured to generate and/or receive required logic signals and their corresponding complements. The use of these circuits enables the individual stages of a reduction scheme and an addition scheme to be used as pipeline stages. The d-type latch circuits are particularly important, as they are used to latch results from stage to stage within the multiplier, and thus dominate the hardware complexity of the multiplier. The overall scheme is generally applicable to any combination of a bit-product matrix formation and reduction scheme.
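The benefit of treating each reduction and addition step as a pipeline stage can be illustrated with a rough timing model. This is a sketch under stated assumptions, not a result from the patent: the stage delays are hypothetical values in arbitrary units, `latch_overhead` is an assumed per-stage cost of the latches, and all function names are invented for the example.

```python
def single_cycle_period(stage_delays):
    """Clock period when all stages form one block of combinational logic."""
    return sum(stage_delays)


def pipelined_period(stage_delays, latch_overhead):
    """Clock period when every stage is latched: set by the slowest stage."""
    return max(stage_delays) + latch_overhead


def total_time_single_cycle(stage_delays, n_products):
    """One product per (long) clock cycle."""
    return n_products * single_cycle_period(stage_delays)


def total_time_pipelined(stage_delays, latch_overhead, n_products):
    """Fill the pipeline once, then one product per (short) clock cycle."""
    cycles = len(stage_delays) + n_products - 1
    return cycles * pipelined_period(stage_delays, latch_overhead)


# Hypothetical delays: five reduction steps followed by one addition step.
HYPOTHETICAL_DELAYS = [3, 2, 2, 1, 1, 4]
```

With these made-up numbers the single-cycle period is 13 units and the pipelined period with a latch overhead of 1 is 5 units, so 100 multiplications take 1300 units versus 525. The model also shows the cost side: each additional stage adds latches, consistent with the remark that the latch circuits dominate hardware complexity.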
In one particular embodiment, the fully
