Sums of production datapath

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06438569

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to the field of logic devices. More specifically, the invention relates to the field of programmable logic devices.
2. Background Information
Fast multiplication and addition are key arithmetic operations in digital signal processing (DSP), as well as other forms of computer data processing. In DSP especially, it is often necessary to multiply several pairs of numbers and accumulate the results by addition into a single number. Mathematically, this operation is called a “dot product” or “sum of products.” It can be written a
1
*b
1
+a
2
*b
2
+ . . . +a
n
*b
n
, where the a
i
and b
i
sequences are paired up, and each corresponding element is multiplied, with the results accumulated. In a typical digital filter, the first sequence may be a fixed sequence of filter coefficients, while the second sequence may be a contiguous set of data samples from a longer input sequence.
Some application-specific examples of constant coefficient sums of products over low resolution data include: multi-tap constant coefficient FIR filters such as pulse shaping filters, especially over low precision data; fixed-pattern correlators (matched filters), despreaders, etc. Other applications that involve higher precision data include: fast block transforms, such as DCT, FFT, wavelets, etc.; IP checksums (reduction of 64 bits/cycle in the main body of an IP packet), etc.
The precision requirements for these multiplication and addition operations can vary tremendously, as can the desired representations of the numbers involved. For example, in some applications it is desired to use floating point number representations; in others, the fixed point representation is sufficient and is more cost effective. Among fixed point representations, the number of integral and fractional digits can vary, as can the total number of digits. Additionally, the numbers may be signed or unsigned. Beyond the data representations themselves, certain details of the processing operations are important. For example, multiplication and addition operations produce outputs with a greater number of digits than their inputs. Thus, when such operations are composed, the number of digits in the results can grow dramatically. Commonly, the exact results include digits that do not represent useful information, so some digits are discarded using truncation and rounding. The art of discarding digits that are not useful is both important and complex.
The precision requirements for the multiplication and addition operations are generally related to: the precision of the input data; the precision of the coefficients; the type of processing algorithm; and certain parameters of that algorithm such as how truncation and rounding are performed. The analysis of these requirements is sufficiently complex that a whole branch of mathematics, known as Numerical Analysis, has been developed for them.
In response to the widespread need for fast multiplication and addition with a variety of precisions and data representations, an extensive literature has been created and many hardware and software implementations have been developed. For most implementations, the complexity increases roughly as N*M where N and M are the number of bits of the two input operands. Thus, for N by N multiplication, the complexity increases as N
2
. Algorithms are known that reduce this complexity for very large operands, but for most applications, the operand sizes are not large enough to make these algorithms practically useful. On the other hand, many ideas have been developed that do effectively exploit properties of hardware technologies and multiplication algorithms to speed up implementations having a particular precision and numerical representation.
The straightforward approach to multiplication is adding up a set of appropriately shifted partial products, each generated by multiplying the multiplicand by one of the digits of the multiplier. The only difficulty about addition is carries between digits, since the carry out from a particular digit depends on the carry into that digit, so that the carry propagation aspect of addition is inherently sequential. Since it is possible that a carry may propagate across all the digits of a sum, the number of sequential steps required for the addition is equal to the number of digits being added. Many techniques are known for reducing the maximum number of sequential steps requires for the addition; however these techniques generally require more hardware.
Many hardware designs for fast multiplication embody an extended version of the straightforward multiplication algorithm, consisting of a first part that generates partial products, a second part that sums the partial products to two numbers (referred to as “carry” or “C” and “save” or “S”) whose sum is the correct answer, and a third part that adds together C and S to produce the answer. The partial product generation may include any form of multiplicand preprocessing, such as Booth encoding. The numbers C and S are developed in such a way that carry propagation is largely or completely avoided during the second part. The apparatus implementing the second part is generally known as a “Carry Save Adder,” sometimes abbreviated “CSA.” Carry propagation is unavoidable during the third part of the multiplication algorithm, but only two numbers are then involved, and any of the known techniques can be used to speed up the addition. The third part of this multiplication algorithm is also called the “Carry Propagate Adder,” sometimes abbreviated “CPA”.
The variations among hardware multiplier designs of this type generally involve one or more of the following: the method for generating partial products, the method for reducing them to numbers C and S, the method for performing the final addition of numbers C and S, and the method for modifying the partial products and/or carry save adder to accommodate signed number representations.
More generally, systems applications may use several of the DSP algorithms that were just briefly described, and may use other algorithms involving multiplications and additions as well. Depending on the total throughput required by the application, it may be necessary to provide dedicated hardware multiplication and addition circuits for each operation through which data flows in fixed connection patterns, or on the other hand, it may be possible to reuse one or more hardware multiplication and addition circuits with data flows directed by a control element.
When designing dedicated hardware for performing sums of products, the hardware designer will exploit certain simplifications resulting from zero bits in the (constant) coefficients. A designer/architect will use other techniques to minimize the number of non-zero bits in the coefficients. For example, with 1-bit data interpreted as 1 and −1, an FIR filter reduces to adding and subtracting the coefficients according to the state of each data bit. The result is that multiplications involving constant coefficients are much more efficient in dedicated hardware as opposed to “general purpose” multiplication hardware. However, programmable CPU and DSP hardware typically provides only general purpose multiplies.
In particular, CPU and DSP circuits usually implement complex numerical algorithms by the sequential composition of simpler operations into and out of register files that store intermediate results, coefficients, and so on. For example, a sum of squares of differences algorithm may be implemented by a first operation that takes the difference of two numbers, a second operation that squares the result, and a third operation that accumulates the result of the second operation into a running sum. In case each operation takes a single cycle, the algorithm would then be completed in 3 cycles.
However, since the inherent complexity of multiplication is higher than that of addition or subtraction, hardware designers often optimize the clock speed of their designs by pipelining the multiplication operation,

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Sums of production datapath does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Sums of production datapath, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Sums of production datapath will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2973295

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.