Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2001-06-04
2004-08-17
Mai, Tan V. (Department: 2124)
active
06779013
ABSTRACT:
FIELD
The present invention relates generally to floating point operations, and more specifically to floating point multiply accumulators.
BACKGROUND
Fast floating point mathematical operations have become an important feature in modern electronics. Floating point units are useful in applications such as three-dimensional graphics computations and digital signal processing (DSP). Examples of three-dimensional graphics computation include geometry transformations and perspective transformations. These transformations are performed when the motion of objects is determined by calculating physical equations in response to interactive events instead of replaying prerecorded data.
Many DSP operations, such as finite impulse response (FIR) filters, compute $\sum_{i=0}^{n-1} a_i b_i$, where $a_i$ and $b_i$ are both single precision floating point numbers. This type of computation typically employs floating point multiply accumulate (FMAC) units, which perform many multiplication operations and add the resulting products to give the final result. In these types of applications, fast FMAC units typically execute multiplies and additions in parallel without pipeline bubbles. One example FMAC unit is described in: Ide et al., “2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3-D Graphics Computing,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 7, July 2000.
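The sum of products above can be sketched in software as a loop of multiply-accumulate steps. This is a minimal illustration, not the circuit described in this document: the names `to_f32` and `fir_output` are hypothetical, and rounding each intermediate result to single precision through `struct` is an assumption chosen to mimic single precision operands.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    # Note: struct raises OverflowError for magnitudes beyond the single precision range.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fir_output(coeffs, samples):
    """Compute sum(a_i * b_i) for i = 0..n-1, one multiply-accumulate per term."""
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    acc = 0.0
    for a, b in zip(coeffs, samples):
        product = to_f32(to_f32(a) * to_f32(b))  # multiply, rounded to single precision
        acc = to_f32(acc + product)              # accumulate, rounded to single precision
    return acc


result = fir_output([0.5, 0.25, 0.25], [1.0, 2.0, 4.0])
```

Each loop iteration corresponds to one FMAC operation, so an n-tap filter needs n dependent accumulations per output sample.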
The Institute of Electrical and Electronics Engineers (IEEE) has published an industry standard for floating point operations in ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985, hereinafter referred to as the “IEEE standard.” A typical implementation of a floating point FMAC compliant with the IEEE standard is shown in FIG. 1. FMAC 100 implements a single precision floating point multiply and accumulate instruction, “D=(A×B)+C,” as an indivisible operation. As can be seen from FIG. 1, fast floating point multipliers and fast floating point adders are both important ingredients of a fast FMAC.
Multiplicands A and B are received by multiplier 110, and the product is normalized in post-normalization block 120. Multiplicands A and B are typically in an IEEE standard floating point format, and post-normalization block 120 typically operates on (normalizes) the output of multiplier 110 to make the product conform to the same format. For example, when multiplicands A and B are IEEE standard single precision floating point numbers, post-normalization block 120 operates on the output from multiplier 110 so that adder 130 receives the product as an IEEE standard single precision floating point number.
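The multiply and post-normalize step can be illustrated at the bit level. The sketch below is an assumption-laden model, not the logic of blocks 110 and 120: the function names are hypothetical, only normal (non-zero, finite, non-denormal) inputs are handled, exponent overflow and underflow are ignored, and round-to-nearest-even is assumed as the rounding mode. It relies on the fact that two 24-bit significands in [1, 2) give a product in [1, 4), so normalization needs at most a one-bit right shift.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def decode_f32(x: float):
    """Split a normal single precision value into (sign, unbiased exponent, 24-bit significand)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if biased in (0, 0xFF):
        raise ValueError("this sketch handles normal numbers only")
    return sign, biased - 127, frac | (1 << 23)  # restore the hidden leading 1


def multiply_and_normalize(a: float, b: float):
    """Multiply two single precision values and normalize the product to 24 bits."""
    sa, ea, ma = decode_f32(a)
    sb, eb, mb = decode_f32(b)
    sign, exp = sa ^ sb, ea + eb
    prod = ma * mb  # 47 or 48 bits: significand product in [1, 4)
    shift = 23
    if prod >> 47:  # product >= 2: shift right one extra bit, bump exponent
        shift, exp = 24, exp + 1
    kept = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):  # round to nearest, ties to even
        kept += 1
        if kept >> 24:  # rounding carried out to 2.0: renormalize
            kept, exp = kept >> 1, exp + 1
    return sign, exp, kept


def to_float(sign: int, exp: int, sig: int) -> float:
    """Rebuild a Python float from (sign, unbiased exponent, 24-bit significand)."""
    value = sig * 2.0 ** (exp - 23)
    return -value if sign else value
```

For example, `to_float(*multiply_and_normalize(1.5, 1.5))` yields 2.25: the significand product 2.25 lies in [2, 4), so it is shifted right once and the exponent is incremented.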
Adder 130 adds the normalized product from post-normalization block 120 with the output from multiplexer 140. Multiplexer 140 can choose between the number C and the previous sum on node 152. When the previous sum is used, FMAC 100 is performing a multiply-accumulate function. The output of adder 130 is normalized in post-normalization block 150 so that the sum on node 152 is in the standard format discussed above.
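The data flow of FIG. 1 described above can be mimicked with a small behavioral model. This is a sketch under stated assumptions: the class name `FmacModel` and its method are hypothetical, the initial value of the sum node is assumed to be zero, and each post-normalization block is approximated by rounding to single precision with `struct`.

```python
import struct


def round_to_single(x: float) -> float:
    """Approximate a post-normalization block by rounding to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


class FmacModel:
    """Behavioral model of D = (A x B) + C with an optional accumulate path."""

    def __init__(self):
        self.sum_node = 0.0  # stands in for node 152 (assumed to start at zero)

    def step(self, a: float, b: float, c=None) -> float:
        # Multiplier 110 followed by post-normalization block 120.
        product = round_to_single(round_to_single(a) * round_to_single(b))
        # Multiplexer 140: select C when given, otherwise the previous sum.
        addend = self.sum_node if c is None else round_to_single(c)
        # Adder 130 followed by post-normalization block 150.
        self.sum_node = round_to_single(product + addend)
        return self.sum_node


fmac = FmacModel()
first = fmac.step(2.0, 3.0, c=1.0)  # multiply-add: (2 x 3) + 1
second = fmac.step(0.5, 4.0)        # multiply-accumulate: (0.5 x 4) + previous sum
```

Passing `c` selects the multiply-add path, while omitting it feeds the previous sum back through the multiplexer, which is the multiply-accumulate function.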
Adder 130 and post-normalization block 150 can be “non-pipelined,” which means that an accumulation can be performed in a single clock cycle. When non-pipelined, adder 130 and post-normalization block 150 typically include sufficient logic to limit the frequency at which FMAC 100 can operate, in part because floating point adders typically include circuits for alignment, mantissa addition, rounding, and other complex operations. To increase the frequency of operation, adder 130 and post-normalization block 150 can be “pipelined,” which means registers can be included in the data path to store intermediate results. One disadvantage of pipelining is the introduction of pipeline stalls or bubbles, which decrease the effective data rate through FMAC 100.
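The cost of bubbles in a dependent accumulation can be estimated with a toy cycle count. The model below is an assumption for illustration only: it supposes a single running sum, an adder whose result is available `adder_latency` cycles after issue, and no interleaving of independent accumulations. The function names are hypothetical.

```python
def accumulate_cycles(n_products: int, adder_latency: int) -> int:
    """Cycles needed to fold n_products into one running sum.

    Each addition needs the previous sum, so a new addition can issue
    only once every adder_latency cycles.
    """
    if n_products < 0 or adder_latency < 1:
        raise ValueError("n_products must be >= 0 and adder_latency >= 1")
    return n_products * adder_latency


def bubble_cycles(n_products: int, adder_latency: int) -> int:
    """Cycles in which no new accumulation is issued (the pipeline bubbles)."""
    return accumulate_cycles(n_products, adder_latency) - n_products


non_pipelined = accumulate_cycles(8, 1)  # one accumulation per cycle
pipelined = accumulate_cycles(8, 3)      # three-stage adder, dependent chain
```

Under these assumptions, a three-stage adder accumulates eight products in 24 cycles with 16 bubble cycles, so the higher clock frequency pays off only if it more than offsets the lower number of accumulations per cycle.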
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for fast floating point multiply and accumulate circuits.
REFERENCES:
patent: 5764089 (1998-06-01), Partovi et al.
patent: 5898330 (1999-04-01), Klass
patent: 5900759 (1999-05-01), Tam
patent: 5993051 (1999-11-01), Jiang et al.
patent: 6205462 (2001-03-01), Wyland et al.
patent: 6360189 (2002-03-01), Hinds et al.
patent: 6480872 (2002-11-01), Choquette
Beaumont-Smith, A., et al., “Reduced Latency IEEE Floating-Point Standard Adder Architectures”, Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 8 pgs., (1998).
Even, G., et al., “On the Design of IEEE Compliant Floating Point Units”, IEEE Transactions on Computers, vol. 49, 398-413, (May 2000).
Goto, G., et al., “A 54×54-b Regularly Structured Tree Multiplier”, IEEE Journal of Solid-State Circuits, vol. 27, 1229-1236, (Sep. 1992).
Ide, N., et al., “2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3-D Graphics Computing”, IEEE Journal of Solid-State Circuits, vol. 35, 1025-1033, (Jul. 2000).
Klass, F., “Semi-Dynamic and Dynamic Flip-Flops with Embedded Logic”, Proceedings of the Symposium on VLSI Circuits, Digest of Technical Papers, Honolulu, HI, IEEE Circuits Soc. Japan Soc. Appl. Phys. Inst. Electron., Inf. & Commun. Eng. Japan, pp. 108-109, (1998).
Lee, K.T., et al., “1 GHz Leading Zero Anticipator Using Independent Sign-Bit Determination Logic”, 2000 Symposium on VLSI Circuits Digest of Technical Papers, 194-195, (2000).
Partovi, H., et al., “Flow-Through Latch and Edge-Triggered Flip-Flop Hybrid Elements”, Proceedings of the IEEE International Solid-State Circuits Conference, Digest of Technical Papers and Slide Supplement, NexGen Inc., Milpitas, CA, 40 pgs., (1996).
Elguibaly, F., “A Fast Parallel Multiplier-Accumulator Using the Modified Booth Algorithm”, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47 (9), pp. 902-908, (Sep. 2000).
Hokenek, E., et al., “Second-Generation RISC Floating Point with Multiply-Add Fused”, IEEE Journal of Solid-State Circuits, 25 (5), pp. 1207-1213, (1990).
Luo, Z., et al., “Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques”, IEEE Transactions on Computers, 49 (3), 208-218, (Mar. 2000).
Panneerselvam, G., et al., “Multiply-Add Fused RISC Architectures for DSP Applications”, IEEE Pac Rim, pp. 108-111, (1993).
Mai Tan V.
Schwegman, Lundberg, Woessner & Kluth, P.A.