Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2000-12-11
2004-07-13
Ingberg, Todd (Department: 2124)
Electrical computers: arithmetic processing and calculating
Electrical digital calculating computer
Particular function performed
C708S629000, C708S630000
Reexamination Certificate
active
06763367
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to multiplier/accumulator (MAC) circuits, and in particular to a system and method for simplifying the reduction process in a MAC architecture. More particularly, the present invention relates to a MAC architecture having a pre-reduction mechanism for reducing the number of processing stages in the critical MAC path.
2. Description of the Related Art
Multiply-accumulate (MAC) operations are frequently undertaken in computer systems because they lend themselves to multiplying matrices, which is common in digital signal processing and video/graphics applications. An accumulator within a MAC design is a binary computation device comprising an adder/subtractor and an accumulation register. A multiplier is added to the accumulator to form the MAC architecture. Most advanced digital systems employ MAC circuits to perform high-speed parallel multiply/add operations.
Due to its computation-intensive nature and the overhead hardware required, MAC is an costly operation in terms of design expense and speed. Any improvement in the delay for performing a MAC operation would have a positive impact on allowable clock speed, instruction time, and processor performance.
With reference to
FIG. 1
, there is depicted a conventional MAC circuit that includes multiply and accumulate functionality for performing addition and multiplication operations in parallel. Specifically,
FIG. 1
illustrates a MAC circuit
10
for multiplying a 16-bit multiplicand, Y, by a 16-bit multiplicator, X. A pair of registers,
14
and
12
, store multiplicand Y and multiplicator X respectively, until MAC circuit
10
is given a “multiply” instruction whereupon registers
12
and
14
deliver the operands to the multiplication circuitry within MAC circuit
10
.
The multiplication functionality within MAC circuit
10
may be divided into two stages. The first stage includes a partial product generator
16
and a reduction circuit
25
, wherein partial product generation and carry-save addition (reduction) are performed. The second stage includes a carry propagate adder
26
that adds a redundant-form accumulated sum within an accumulation register
33
, to a final, non-redundant accumulated sum within a final accumulation register
28
.
Partial product generator
16
processes multiplicand Y and multiplicator X utilizing a modified Booth algorithm (MBA) to produce a partial product matrix (PPM) such as PPM
32
illustrated in FIG.
3
A. MBA is widely adopted for partial product generation in large (16-bits and greater) multipliers.
As further depicted in
FIG. 1
, partial product generator
16
includes a Booth encoder
20
and Booth multiplexors
18
for implementing the MBA. A major benefit of using a MBA for producing a PPM from partial product generator
16
is that instead of generating n partial products for an n-bit multiplication, the MBA generates approximately half of that.
In accordance with well-known MBA principles, a signed binary number in its 2'-complement form is partitioned into overlapping groups of three bits. By coding each of these groups, an n-bit signed binary number is represented as a sum of n/2 signed digits. Since each signed digit assumes the possible values of 0, ±1, or ±2, the required partial products are all power-of-two multiples of the multiplicand (Y in the depicted example), which are readily available. The logic required to implement the functionality of partial product generator
16
is well-known in the art and is therefore not explicitly depicted in FIG.
1
.
The individual partial product terms (represented as dots within PPM
32
) are generated by partial product generator
16
. With reference to
FIG. 3A
, the result of the Booth recoding performed by partial product generator
16
is evidenced by the eight partial products within PPM
32
for a 16-bit multiplication.
As shown in
FIG. 3A
, PPM
32
forms the upper portion of a reduction array
30
that is applied as an input to a reduction tree within reduction circuit
25
. The lower portion of reduction array
30
comprises the redundant (carry-save) form of an accumulated sum row
38
and an accumulated carry row
37
that were formed in the previous accumulation step and which are added to the partial products within PPM
32
from the current multiplication cycle to form the current redundant accumulator result.
Between PPM
32
and accumulated rows
38
and
37
, an additional row
36
consists of constant values to ensure proper sign extensions for each row of partial products within PPM
32
, and also to set the sign bit of the most significant partial product that is introduced to support two's complementation required for negative partial products in the MBA.
Turning back to
FIG. 1
, reduction circuit
25
includes a compressor array
24
for compressing the bit rows constituting reduction array
30
such that PPM
32
and constant value row
36
are compressed into newly accumulated sum row
38
and carry row
37
. For notational and illustrative convenience with reference to
FIG. 3A
, constant value row
36
, together with the portions of sum and carry rows
38
and
37
enclosed by block
39
may be referred to collectively as block
39
.
With reference to FIG.
1
and
FIG. 3A
, after PPM
32
has been produced by partial product generator
16
, an array of compressor circuits within compressor array
24
are employed to reduce the height of reduction array
30
at its highest point at column
34
, from eleven to two. The compressor array includes circuits for performing carry-save additions in order to compress the eight rows of PPM
32
and constant value row
36
into sum row
38
and carry row
37
. Counter circuits or compressor circuits are utilized to add the rows of PPM
32
, constant value row
36
, and sum-carry rows
38
and
37
into a newly accumulated sum-carry redundant form.
For the examples presented in
FIGS. 1 and 3A
, three concatenated levels of 4:2 compressor circuits are required within compressor array
24
to reduce the eleven rows in reduction array
30
to a final redundant sum-carry result. Such 4:2 compressor circuits can be constructed with three levels of two-input XOR logic gates in the critical path. Each level of logic within compressor array
24
adds to the overhead and delay of MAC circuit
10
.
Compressor array
24
is a hardware intensive component within the architecture of MAC
10
. The number of rows (column depth) is a major limiting factor in minimizing the required number of logic gate levels within compressor array
24
and maximizing the overall speed of MAC
10
. It would therefore be useful to provide a system and method for reducing hardware overhead and accelerating the reduction process in a MAC circuit by pre-reducing the reduction array.
SUMMARY OF THE INVENTION
An apparatus and method for compressing a reduction array into an accumulated carry-save sum are disclosed herein. The reduction array includes a partial product matrix, a carry-save sum, and a constant value row. A compressor array generates a previous accumulated carry-save sum. A three-input/two-output carry-save adder pre-reduces the constant value row and the previously accumulated carry-save sum into a two-row intermediate carry-save sum that is added to the partial product matrix to form a current accumulated carry-save sum.
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.
REFERENCES:
patent: 4727507 (1988-02-01), Miyanaga
patent: 4958312 (1990-09-01), Ang et al.
patent: 5042001 (1991-08-01), Brightman et al.
patent: 5220525 (1993-06-01), Anderson et al.
patent: 5751619 (1998-05-01), Agarwal et al.
patent: 5847981 (1998-12-01), Kelley et al.
patent: 5944776 (1999-08-01), Zhang et al.
patent: 6434587 (2002-08-01), Liao et al.
patent: 6519621 (2003-02-01), Yano
patent: 6535901 (2003-03-01), Grisamore
Kwon Oh-sang
Nowka Kevin J.
Dillon & Yudell LLP
Do Chat C.
Ingberg Todd
International Business Machines - Corporation
Salys Casimer K.
LandOfFree
Pre-reduction technique within a multiplier/accumulator... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Pre-reduction technique within a multiplier/accumulator..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Pre-reduction technique within a multiplier/accumulator... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3235728