Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2001-06-04
2004-08-17
Mai, Tan V. (Department: 2124)
active
06779013
ABSTRACT:
FIELD
The present invention relates generally to floating point operations, and more specifically to floating point multiply accumulators.
BACKGROUND
Fast floating point mathematical operations have become an important feature in modern electronics. Floating point units are useful in applications such as three-dimensional graphics computations and digital signal processing (DSP). Examples of three-dimensional graphics computation include geometry transformations and perspective transformations. These transformations are performed when the motion of objects is determined by calculating physical equations in response to interactive events instead of replaying prerecorded data.
Many DSP operations, such as finite impulse response (FIR) filters, compute Σ(a_i b_i), where i=0 to n−1, and a_i and b_i are both single precision floating point numbers. This type of computation typically employs floating point multiply accumulate (FMAC) units, which perform many multiplication operations and add the resulting products to give the final result. In these types of applications, fast FMAC units typically execute multiplies and additions in parallel without pipeline bubbles. One example FMAC unit is described in: Ide et al., “2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3-D Graphics Computing,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 7, July 2000.
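The sum of products described above can be sketched in software as follows. This is only an illustrative model, not the circuit of any cited FMAC unit: the helper names `to_f32` and `mac_sum` are hypothetical, and rounding every product and every partial sum to single precision is an assumption made for the sketch. Values outside the single precision range are not handled.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mac_sum(a, b):
    """Accumulate sum(a[i] * b[i]) for i = 0..n-1.

    Assumption for this sketch: each product and each partial sum is
    rounded to single precision before the next step.
    """
    acc = 0.0
    for ai, bi in zip(a, b):
        product = to_f32(to_f32(ai) * to_f32(bi))  # multiply step
        acc = to_f32(acc + product)                # accumulate step
    return acc


if __name__ == "__main__":
    # Three-tap FIR-style dot product: 1*4 + 2*5 + 3*6
    print(mac_sum([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Each loop iteration corresponds to one multiply followed by one accumulate, which is the pair of operations an FMAC unit is built to perform.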
The Institute of Electrical and Electronics Engineers (IEEE) has published an industry standard for floating point operations in ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985, hereinafter referred to as the “IEEE standard.” A typical implementation for a floating point FMAC compliant with the IEEE standard is shown in FIG. 1. FMAC 100 implements a single precision floating point multiply and accumulate instruction, “D=(A×B)+C,” as an indivisible operation. As can be seen from FIG. 1, fast floating point multipliers and fast floating point adders are both important ingredients of a fast FMAC.
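For readers unfamiliar with the single precision format referred to above, the following sketch unpacks a value into the three fields defined by the IEEE standard: a 1-bit sign, an 8-bit biased exponent (bias 127), and a 23-bit fraction. The function name `f32_fields` is hypothetical and is used only for illustration.

```python
import struct


def f32_fields(x: float):
    """Return (sign, biased_exponent, fraction) of x encoded as IEEE single precision."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1 not stored
    return sign, exponent, fraction


if __name__ == "__main__":
    print(f32_fields(1.0))
    print(f32_fields(-2.5))
```

The operands A, B, and C and the result D of the instruction discussed above would all be carried in this 32-bit layout.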
Multiplicands A and B are received by multiplier 110, and the product is normalized in post-normalization block 120. Multiplicands A and B are typically in an IEEE standard floating point format, and post-normalization block 120 typically operates on (normalizes) the output of multiplier 110 to make the product conform to the same format. For example, when multiplicands A and B are IEEE standard single precision floating point numbers, post-normalization block 120 operates on the output from multiplier 110 so that adder 130 receives the product as an IEEE standard single precision floating point number.
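As a rough illustration of what normalizing a product involves, the sketch below multiplies two single precision numbers using integer significands and then shifts and rounds the 48-bit result back into a 24-bit significand. It is a simplified model under stated assumptions, not the logic of blocks 110 and 120: it handles only normal operands and normal results (no zeros, subnormals, infinities, NaNs, or exponent overflow), it assumes round-to-nearest-even, and the names `decode` and `multiply_normalize` are hypothetical.

```python
import struct


def decode(x: float):
    """Split a normal single precision value into sign, biased exponent, 24-bit significand."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    sig = (bits & 0x7FFFFF) | 0x800000  # restore the implicit leading 1
    return sign, exp, sig


def multiply_normalize(a: float, b: float) -> float:
    sa, ea, ma = decode(a)
    sb, eb, mb = decode(b)
    sign = sa ^ sb
    exp = ea + eb - 127          # add exponents, remove one bias
    prod = ma * mb               # 47- or 48-bit raw product

    # Normalize: the raw significand product lies in [1, 4).
    if prod >> 47:               # product in [2, 4): shift one extra bit
        shift = 24
        exp += 1
    else:                        # product in [1, 2)
        shift = 23

    sig = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even (an assumption of this sketch).
    if rem > half or (rem == half and (sig & 1)):
        sig += 1
        if sig >> 24:            # rounding carried out of the significand
            sig >>= 1
            exp += 1

    bits = (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(multiply_normalize(1.5, 1.5))
```

The conditional shift is the normalization step: it returns the significand to the range [1, 2) so that the result fits the same format as the inputs.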
Adder 130 adds the normalized product from post-normalization block 120 to the output from multiplexer 140. Multiplexer 140 can choose between the number C and the previous sum on node 152. When the previous sum is used, FMAC 100 is performing a multiply-accumulate function. The output of adder 130 is normalized in post-normalization block 150 so that the sum on node 152 is in the standard format discussed above.
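The data flow just described can be modeled behaviorally as shown below. This is a minimal sketch, not a description of the actual circuit: the class name `FmacModel`, the `accumulate` select flag, and the initial previous-sum value of 0.0 are all assumptions introduced for illustration, and rounding is approximated by converting to single precision after the multiply and after the add.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


class FmacModel:
    """Behavioral model of the multiply, select, and add path described in the text."""

    def __init__(self):
        self.prev_sum = 0.0  # stands in for the sum on node 152 (initial value assumed)

    def step(self, a: float, b: float, c: float = 0.0, accumulate: bool = False) -> float:
        # Multiplier 110 followed by post-normalization block 120.
        product = to_f32(to_f32(a) * to_f32(b))
        # Multiplexer 140: choose the previous sum or the operand C.
        addend = self.prev_sum if accumulate else to_f32(c)
        # Adder 130 followed by post-normalization block 150.
        self.prev_sum = to_f32(product + addend)
        return self.prev_sum


if __name__ == "__main__":
    fmac = FmacModel()
    print(fmac.step(2.0, 3.0, c=1.0))            # D = (2 x 3) + 1
    print(fmac.step(4.0, 5.0, accumulate=True))  # adds 4 x 5 to the previous sum
```

With `accumulate=False` the model computes D=(A×B)+C; with `accumulate=True` it reuses the stored sum, which corresponds to the multiply-accumulate function.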
Adder 130 and post-normalization block 150 can be “non-pipelined,” which means that an accumulation can be performed in a single clock cycle. When non-pipelined, adder 130 and post-normalization block 150 typically include sufficient logic to limit the frequency at which FMAC 100 can operate, in part because floating point adders typically include circuits for alignment, mantissa addition, rounding, and other complex operations. To increase the frequency of operation, adder 130 and post-normalization block 150 can be “pipelined,” which means registers can be included in the data path to store intermediate results. One disadvantage of pipelining is the introduction of pipeline stalls or bubbles, which decrease the effective data rate through FMAC 100.
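The effect of pipeline bubbles on a chain of dependent accumulations can be illustrated with a simple cycle-count model. The model is an assumption-laden sketch rather than a description of FMAC 100: it assumes one new product is available every cycle, that each accumulation must wait until the previous sum has left the last adder stage, and that no other technique hides the latency. The function name `accumulate_cycles` and its parameters are hypothetical.

```python
def accumulate_cycles(n_terms: int, adder_stages: int):
    """Return (total_cycles, bubble_cycles) for n_terms dependent accumulations.

    adder_stages is the number of clock cycles the add-and-normalize path
    takes; 1 models the non-pipelined case.
    """
    cycle = 0      # earliest cycle at which the next accumulation could issue
    ready_at = 0   # cycle at which the previous sum becomes available
    bubbles = 0
    for _ in range(n_terms):
        issue = max(cycle, ready_at)   # stall until the previous sum is ready
        bubbles += issue - cycle       # cycles lost waiting
        ready_at = issue + adder_stages
        cycle = issue + 1
    return ready_at, bubbles


if __name__ == "__main__":
    print(accumulate_cycles(4, 1))  # non-pipelined adder
    print(accumulate_cycles(4, 3))  # three-stage pipelined adder
```

In this model a deeper pipeline allows a faster clock but spends more cycles stalled per accumulation, which is the trade-off the paragraph above describes.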
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for fast floating point multiply and accumulate circuits.
REFERENCES:
patent: 5764089 (1998-06-01), Partovi et al.
patent: 5898330 (1999-04-01), Klass
patent: 5900759 (1999-05-01), Tam
patent: 5993051 (1999-11-01), Jiang et al.
patent: 6205462 (2001-03-01), Wyland et al.
patent: 6360189 (2002-03-01), Hinds et al.
patent: 6480872 (2002-11-01), Choquette
Beaumont-Smith, A., et al., “Reduced Latency IEEE Floating-Point Standard Adder Architectures”, Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 8 pgs., (1998).
Even, G., et al., “On the Design of IEEE Compliant Floating Point Units”, IEEE Transactions on Computers, vol. 49, pp. 398-413, (May 2000).
Goto, G., et al., “A 54×54-b Regularly Structured Tree Multiplier”, IEEE Journal of Solid-State Circuits, vol. 27, pp. 1229-1236, (Sep. 1992).
Ide, N., et al., “2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3-D Graphics Computing”, IEEE Journal of Solid-State Circuits, vol. 35, pp. 1025-1033, (Jul. 2000).
Klass, F., “Semi-Dynamic and Dynamic Flip-Flops with Embedded Logic”, Proceedings of the Symposium on VLSI Circuits, Digest of Technical Papers, Honolulu, HI, IEEE Circuits Soc. Japan Soc. Appl. Phys. Inst. Electron., Inf. & Commun. Eng. Japan, pp. 108-109, (1998).
Lee, K.T., et al., “1 GHz Leading Zero Anticipator Using Independent Sign-Bit Determination Logic”, 2000 Symposium on VLSI Circuits Digest of Technical Papers, pp. 194-195, (2000).
Partovi, H., et al., “Flow-Through Latch and Edge-Triggered Flip-Flop Hybrid Elements”, Proceedings of the IEEE International Solid-State Circuits Conference, Digest of Technical Papers and Slide Supplement, NexGen Inc., Milpitas, CA, 40 pgs., (1996).
Elguibaly, F., “A Fast Parallel Multiplier-Accumulator Using the Modified Booth Algorithm”, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47 (9), pp. 902-908, (Sep. 2000).
Hokenek, E., et al., “Second-Generation RISC Floating Point with Multiply-Add Fused”, IEEE Journal of Solid-State Circuits, 25 (5), pp. 1207-1213, (1990).
Luo, Z., et al., “Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques”, IEEE Transactions on Computers, 49 (3), pp. 208-218, (Mar. 2000).
Panneerselvam, G., et al., “Multiply-Add Fused RISC Architectures for DSP Applications”, IEEE Pac Rim, pp. 108-111, (1993).
Schwegman, Lundberg, Woessner & Kluth, P.A.
Floating point overflow and sign detection