Apparatus and method for increasing performance of...

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Apparatus and method for increasing performance of... Apparatus and method for increasing performance of...

: 2000-02-15
: 2004-05-25
: Chaki, Kakali (Department: 2124)
: Electrical computers: arithmetic processing and calculating
: Electrical digital calculating computer
: Particular function performed

: C708S620000, C708S708000
: Reexamination Certificate
: active
: 06742011
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention generally relates to a multiplication apparatus and method for increasing the efficiency of multipliers, and more particularly, to an apparatus and method for increasing the performance of a low radix multiplier such that high radix performance can be achieved without a significant increase in wiring circuitry.
2. Description Of Related Art
Currently, the speed of many arithmetic operations in present processor implementations is increased by utilizing a floating-point processor. A floating-point processor usually includes carry save adders to increase the performance of multiplication operations.
Generally, there are two popular stages of radix multiplication for microprocessors. High radix multiplication (radix 8 or greater) has the advantage of requiring fewer partial products to be generated and summed, as compared with low radix multiplication (radix 4 or lesser). However, high radix multiplication requires that complex multiples of the X operand to be generated. An example of this is illustrated in
FIG. 1B
with regard to the 3X and -3X operands required for radix 8 multiplication applications.
FIG. 1A
depicts the radix 4 multiplication table, the 3 multiplier bits, and X operand multiples. As can be seen for radix 4 multiplication, only the simple multiples of zero, 1X, and 2X are required for the operand. As it is known in the art, a multiple of a number can be easily generated for the zero, one, and two multiples. A zero multiple requires only that the value be reset, zeroed out, or cleared out. A −1X multiple requires that the complement of the operand be obtained. A 2X multiple of a number is easily generated for the number by performing a left shift by one position on the number. A −2X multiple of a number is obtained by acquiring the complement of the 2X multiple times.
FIG. 1B
depicts the radix 8 multiplication table, the 4 multiplier bits, and X operand multiples. As can be seen by referring to
FIG. 1B
, radix 8 multiplication requires the multiples of the zero, 1X, 2X, 3X, and 4X. As noted above, the zero, 1X, and 2X multiples are fairly straightforward and easy to compute. However, the 3X and −3X multiples required for radix 8 multiplication are quite complex and require special circuitry, such as carry look ahead adders, to compute. The 3X and −3X operand multiples are computed by using a carry propagation adder that adds the 1X and 2X multiples to generate the 3X multiples and by acquiring the complement of the 3X multiple. The 4X and −4X multiples are fairly straightforward and easy to compute. The 4X and −4X multiples are computed by performing a left shift by two positions for the binary number and by acquiring the complement of the 4X multiple.
As stated above, a major problem with radix 8 multiplication is the generation of the 3X and −3X operand multiples.
While the simplicity of radix 4 multiplication is often preferred to radix 8 multiplication there are some advantages of radix 8 multiplication. First, radix 8 multiplication generates fewer partial products that must be dealt with.
In this regard, radix 4 multiplication often requires many more carry save adders as compared to radix 8. For example, for a 64-bit array, radix 4 multiplication usually requires 33 rows of carry save adders to compute the product. For a 64-bit array, radix 8 multiplication requires only 22 rows of carry save adders to compute the product. This is computed utilizing the formula “(number of bits manipulated+number of bits of the multiplier)
umber of bits of the multiplier”. For radix 4 multiplication the formula equals [(64+2)/2]=33 rows, and for the radix 8 application the formula equals [(64+3)/3]=22 rows. This reduction of 11 rows for computing the multiplication product reduces the delay of the multiplier by the speed of at least one gate delay per row.
Illustrated in
FIG. 1C
is a table for illustrating the levels of carry save adders required for K operands using the optimal Wallace tree architecture in the prior art. This table was obtained empirically by drawing tree structures for various word sizes. The Wallace tree summation network utilizes the fewest number of carry save adder delays.
FIG. 1D
depicts a diagram of an example of a Booth-2 (radix 4) multiply with partial products for multiplying two 16 bit numbers. As can be seen by referring to
FIG. 1D
, there are nine rows of partial products to be added together to compute a final product for the two operands. To this end, the partial products form columns of partial product bits, and as known in the art, each of the bits in one column should be added together to produce one of the bits of the product. The least significant bit of the sum of all of the bits in the column represents the product bit for the bit position corresponding to the column. The other bits of the sum are shifted to the adjacent column for inclusion into the summation of the adjacent column. By summing each of the columns in this way, the product can be determined. Note that the additional 1's (“+1”) on the right side of the partial product depicted by
FIG. 1D
are needed to complete the 2's complement for cases when a negative booth multiple is selected.
FIG. 1E
depicts a diagram of an example of a Booth-3 (radix 8) multiply with partial products. As can be seen in this example, there are only six rows of partial products to be added together to compute a final product for the two 16 bit operands. This is accomplished because the radix 8 multiplier generates fewer partial products by generating 3X and 4X multiples. As can be seen in
FIG. 1E
, the partial products generated by the radix 8 multiplier contain an offset of three extra bits per partial product as compared to the partial products generated by the radix 4 multiplier (FIG.
1
D), thereby requiring a larger shift per partial product row. This larger shift per partial product row leads to increased wiring complexity.
FIG. 1F
depicts a block diagram of a prior example of a linear summation array multiplier
7
for partial products. As can be seen, each of the carry save adders (CSA) receives a partial product term (P). Each of the carry save adders also receives a sum (S) and carry (C) term from two previous carry save adders. This is a simple architecture to implement and has a regular structure. The linear summation array multiplier
7
may be utilized to compute a final product for the two operands of FIG.
1
D. The nine rows of partial products (
FIG. 1D
) are added together one bit at a time. Although this structure is one of the simplest and most regular of all known summation structures, it also exhibits one of the highest delays making it impractical for adding a large number of partial products.
FIG. 1G
depicts a block diagram of a prior example of an odd/even summation array
8
for partial products in a multiplier. As can be seen by referring to
FIG. 1G
, each of the carry save adders (CSA) receives a partial product term. Each of the carry save adders also receives a sum and carry term from two previous carry save adders. However, in this odd/even summation implementation, the sum and carry terms from previous carry save adders skip every other row. While this architecture is more complex to implement, it has the advantage of having approximately one-half the number of adder delays as the linear summation array multiplier
7
(FIG.
1
F).
FIGS. 2A and 2B
illustrate an example of conventional linear summation circuitry
30
that may be utilized to add the partial product bits in a column of partial products to produce a bit of the product of two operands. In this regard, the circuitry
30
depicted in
FIGS. 2A and 2B
may be utilized to add a column of partial product bits for up to eighteen rows of partial products. Since the circuitry
30
adds a bit from each row of the radix 4 partial products, the circuitry
30
depicted by
FIGS. 2A and 2B
may add up to 18 bits of information.

Affiliated with

Bass Stephen L

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Colon-Bonet Glenn T

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Sullivan Thomas J.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Chaki Kakali

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Do Chat C

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Hewlett--Packard Development Company, L.P.

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Apparatus and method for increasing performance of... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Apparatus and method for increasing performance of..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Apparatus and method for increasing performance of... will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-3259975

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure