Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2000-02-21
2003-05-27
Ngo, Chuong Dinh (Department: 2124)
active
06571266
ABSTRACT:
TECHNICAL FIELD
The present invention relates generally to floating point processing. In particular, the present invention relates to a method for adding the input addend in an FMAC (floating-point multiply accumulate) procedure.
BACKGROUND
In the design of microprocessor architecture, three very important considerations are speed, accuracy and cost. While it is desirable to design a microprocessor (CPU) which performs multiplication, addition and other operations with superior accuracy and at a very high rate of speed, it is also desirable to design a CPU which can be cost effectively manufactured. Speed and accuracy have been greatly increased in RISC (reduced instruction set computer) CPUs by fusing multiply and add operations into the multiply accumulate operation (A*B)+C. If it is desired to merely add or multiply two numbers, the operation A*B can be performed by setting C=0, and the operation A+C can be performed by setting B=1. The component of a CPU which performs the (A*B)+C operation is commonly referred to as an FMAC (floating-point multiply accumulate unit) or MAF/FPU (multiply-add-fused floating-point unit).
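The fused operation and its two degenerate uses (C=0 for a multiply, B=1 for an add) can be illustrated with a small software reference model. This is only a sketch, not the hardware described in this specification: the name `fmac` is hypothetical, exact rational arithmetic stands in for the wide internal data path, and finite double-precision inputs are assumed (signed zeros, infinities and NaNs are not handled).

```python
from fractions import Fraction

def fmac(a: float, b: float, c: float) -> float:
    """Hypothetical reference model of (A*B)+C with a single final rounding.

    Assumes finite inputs; Fraction() raises on infinities and NaNs.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact, no rounding yet
    return float(exact)  # one rounding, to the nearest double

def multiply(a: float, b: float) -> float:
    return fmac(a, b, 0.0)   # A*B obtained by setting C = 0

def add(a: float, c: float) -> float:
    return fmac(a, 1.0, c)   # A+C obtained by setting B = 1

# The product below is exactly 1 - 2**-60, which a separate multiply
# rounds to 1.0 before the add; the fused form keeps the low-order bits.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
unfused = a * b + (-1.0)
fused = fmac(a, b, -1.0)
```

In this example the unfused sequence rounds twice and loses the small residue, while the fused model rounds once, which is the accuracy benefit attributed to the multiply accumulate operation above.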
The inputs to an FMAC are the operands A (multiplicand), B (multiplier) and C (addend), where A, B and C may be fixed or floating-point numbers (floating-point numbers are numbers expressed in scientific notation). The IEEE convention for representing single-precision (32-bit), double-precision (64-bit), and extended-precision (82-bit) floating-point numbers in binary form is [S, E, M], where S is a single bit representing the sign of a number, E is a multi-bit value (e.g., 17 bits in extended precision) corresponding to an exponent (which may be offset by a bias), and M is a mantissa (or the fractional portion of a normalized value either stripped of or including its leading "1"). (In this specification, it can be assumed that any expressed mantissa includes its leading "1".) Thus, the form of a floating-point number is S*M*2^(E-bias). With single precision, M is represented by 24 bits; with double-precision, M is represented by 53 bits; and with extended precision, M is represented by 64 bits.
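As an illustration of the [S, E, M] form, the following sketch unpacks a double-precision value into its sign, biased exponent, and 53-bit mantissa with the leading 1 restored. The function names are hypothetical, only normal (non-zero, finite, non-denormal) doubles are handled, and M is held as an integer, so the scale factor carries an extra 2^-52 compared with the formula in the text.

```python
import math
import struct

BIAS = 1023      # IEEE 754 double-precision exponent bias
FRAC_BITS = 52   # stored fraction bits; M has 53 bits with its leading 1

def decode_double(x: float):
    """Return (S, E, M) for a normal double; M includes the leading 1."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> FRAC_BITS) & 0x7FF
    frac = bits & ((1 << FRAC_BITS) - 1)
    if exp == 0 or exp == 0x7FF:
        raise ValueError("only normal numbers are handled in this sketch")
    return sign, exp, frac | (1 << FRAC_BITS)

def to_value(sign: int, exp: int, mant: int) -> float:
    """Rebuild the value; M is an integer here, hence the extra -FRAC_BITS."""
    magnitude = math.ldexp(mant, exp - BIAS - FRAC_BITS)
    return -magnitude if sign else magnitude
```

For example, `decode_double(-2.5)` yields a sign of 1, a biased exponent of 1024, and a mantissa equal to 1.25 scaled by 2^52.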
In carrying out the (A*B)+C operation, the FMAC initially multiplies A and B together resulting in an A*B value that is stored in a corresponding register. In most systems, a sufficiently wide data path is used for this multiplication step such that the resulting A*B mantissa has twice the bits of a single valued mantissa. For example, in an extended precision system, the A*B is normally 128 bits wide. The A*B value is then added to the C addend. With particular relevance to the present invention, a crucial step in adding these values involves adding the A*B mantissa to the C mantissa to acquire the mantissa result. U.S. Pat. No. 5,757,686 to Naffziger et al. (which is hereby expressly incorporated by reference into this specification) teaches a valuable method that minimizes the required data-path for performing this mantissa addition step.
The final step in the FMAC process is rounding the final mantissa result to the desired bit position. In extended precision, the mantissa value is rounded to 64 bits. Likewise, with double precision (DP), the value is rounded to 53 bits, and in single precision (SP), the mantissa is rounded to 24 bits. With double and single precision in register file format (which has 64 bit slots for the mantissa), the lower slots (i.e., 11 and 40, respectively) are padded with "0"s. Everything else that is rounded off of the resulting mantissa is discarded or used for the rounding calculation, which determines whether or not to increment or decrement the mantissa result.
In conventional rounding techniques, three values (L, G, and S) are typically used to perform the rounding calculation. The least significant bit of the result mantissa (prior to rounding) is the L value. The bit directly to the right of L is the G value, which is known as the guard bit. Everything to the right of G goes into the determination of S, which is known as the “sticky.” The sticky is basically just an OR'ing of each of the bits to the right of G (including those that fall off, as will be addressed below). So, S is basically just a single-bit value, which is true if any bit to the right of G is 1.
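The L, G, and S values can be sketched on an integer mantissa as follows. This is a minimal illustration, not the method claimed in this specification: the function name is hypothetical, the mantissa is treated as an unsigned integer of `width` bits, and round-to-nearest-even is assumed as the rounding mode (the text above does not fix one). A carry out of the kept bits, which would require renormalization, is not handled.

```python
def round_mantissa(mantissa: int, width: int, keep: int):
    """Round a `width`-bit mantissa to its top `keep` bits.

    Returns (rounded, (L, G, S)). Assumes round-to-nearest-even.
    """
    drop = width - keep
    if drop < 1:
        raise ValueError("at least one bit must be rounded off")
    kept = mantissa >> drop
    l = kept & 1                                   # least significant kept bit
    g = (mantissa >> (drop - 1)) & 1               # guard: first bit right of L
    s = int(mantissa & ((1 << (drop - 1)) - 1) != 0)  # sticky: OR of the rest
    increment = g & (l | s)                        # ties go to the even value
    return kept + increment, (l, g, s)
```

With a 5-bit input rounded to 3 bits, `0b10110` is an exact tie with an odd L and rounds up to `0b110`, whereas `0b10010` is a tie with an even L and stays at `0b100`.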
FIG. 1 shows an FMAC 20 for adding the mantissas of A*B and C. A, B, and C 22, 24, 30, each comprising an m-bit mantissa and an exponent, and the (A*B) result 28, comprising a "2m+1"-bit mantissa and an exponent, are described herein. FMAC 20 generally comprises a CHI register 32, coupled to means for transferring bits of the C mantissa 30 which exceed a range of the (A*B) mantissa 28 to the Result 44; a CBUS register 36, coupled to an alignment shifter 34 for placing bits of the C mantissa 30 which overlap the range of the (A*B) mantissa 28 into the CBUS register 36, the overlapping bits contained within the (A*B) and CBUS registers 28, 36 being aligned for adding; an adder 38, coupled to the (A*B) and CBUS registers 28, 36, and providing an (A*B)+CBUS output 46; a leading bit anticipator (LBA) 40 connected to the (A*B)+CBUS output 46 of the adder 38; a normalization shifter 42, coupled to the leading bit anticipator 40, and providing a normalized temporary output 48; and means for merging those bits of the CHI register 32 comprising bits of the C mantissa 30 which exceed the range of the (A*B) mantissa 28 with one or more most significant bits of the temporary output 48 to produce an (A*B)+C accumulate output 44.
With reference to FIGS. 2-5, respectively, there are four general cases for adding the A*B and C mantissas based on the relative differences of their associated exponents. The first case is when EXP(C) is sufficiently less than EXP(AB) such that the mantissas do not overlap when being added. The second case is when EXP(C) is less than EXP(AB), but they overlap such that there is some interaction when they are added. The third case is when EXP(C) is greater than EXP(AB), but they overlap such that they interact when added. Finally, the fourth case, which is referred to as the "Big C" case, occurs when EXP(C) is sufficiently larger than EXP(AB) such that they do not overlap and there is no interplay when they are added. After an (A*B) result 28 is created, the magnitudes of the (A*B) 28 and C 30 exponents are compared to determine which of the four possible cases exists.
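The four cases can be sketched as a comparison of the bit ranges that the two mantissas occupy. The model below is an assumption made for illustration, not a threshold stated in this specification: each exponent is taken as the weight of the mantissa's most significant bit, each mantissa spans a fixed number of bits below it, no guard positions are modeled, and equal exponents are grouped with case two. The function name and parameters are hypothetical.

```python
def classify_case(exp_ab: int, exp_c: int, ab_width: int, c_width: int) -> int:
    """Return 1..4 according to how the C mantissa lines up with A*B.

    Simplified model: a mantissa with exponent e and width w occupies
    bit weights e down to e - w + 1.
    """
    ab_lsb = exp_ab - ab_width + 1   # weight of the lowest A*B bit
    c_lsb = exp_c - c_width + 1      # weight of the lowest C bit
    if exp_c < ab_lsb:
        return 1                     # C lies entirely below A*B: no overlap
    if c_lsb > exp_ab:
        return 4                     # "Big C": C lies entirely above A*B
    # Overlapping cases; equality is assigned to case two by assumption.
    return 2 if exp_c <= exp_ab else 3
```

Under these assumptions, with a 128-bit A*B mantissa and a 64-bit C mantissa, an exponent difference of 63 still overlaps by one bit (case three), while a difference of 64 falls into the "Big C" case.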
FIGS. 2-5 show the adding methods for the above-described four cases. The method generally comprises the steps of comparing the exponents of (A*B) 28 and C 30 to determine whether there is an overlapping range of the (A*B) 28 and C 30 mantissas; transferring any part of the C mantissa 30 which exceeds a range of the (A*B) mantissa 28 to a CHI register 32; shifting any part of the C mantissa 30 that overlaps the range of the (A*B) mantissa 28 into the C Bus 36 and zeroing out the remaining portion of the C Bus 36, so as to align the bits of the (A*B) 28 and C 30 mantissas according to their respective magnitudes; if not in case four, adding the C Bus 36 to the (A*B) mantissa 28 to generate a temporary result 46; if a portion of the C mantissa was transferred to the CHI register 32, right shifting the Temp. Result 46 such that one or more least significant bits, corresponding to the number of bits transferred to the CHI register 32, are shifted out of the temporary result to generate a Shifted Temp. Result 48; and merging with a merge mask 47 the bits of the CHI register 32, which correspond to the C mantissa exceeding the A*B mantissa, with one or more most significant bit positions of the shifted temporary result 48 to generate a final accumulate result 44.
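The split of the aligned C mantissa into a part that exceeds the A*B range and a part that overlaps it can be sketched with plain integers. This is an arithmetic illustration only and not the data path of FIG. 1: the function name and the `shift` parameter (the offset of C's least significant bit relative to that of A*B) are hypothetical, both operands are assumed to have the same sign, normalization is omitted, and the merge is modeled as an ordinary addition rather than a merge mask and right shift.

```python
def add_mantissas(ab: int, c: int, shift: int, ab_width: int):
    """Align C against an `ab_width`-bit A*B mantissa and add.

    Returns (result, chi, cbus, sticky).
    """
    if shift >= 0:
        c_aligned, sticky = c << shift, False
    else:
        c_aligned = c >> -shift
        # Bits of C that fall off below A*B only contribute to the sticky.
        sticky = (c & ((1 << -shift) - 1)) != 0
    window = (1 << ab_width) - 1
    cbus = c_aligned & window        # part of C overlapping the A*B range
    chi = c_aligned >> ab_width      # part of C exceeding the A*B range
    temp = ab + cbus                 # temporary result of the narrow add
    result = (chi << ab_width) + temp  # merge modeled as a plain addition
    return result, chi, cbus, sticky
```

For instance, with an 8-bit A*B mantissa of `0b10110000` and a C mantissa of `0b1101` shifted left by 6, the two upper bits of C land in `chi`, the lower two land in `cbus`, and the merged result equals the sum of the fully aligned operands.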
With reference to FIG. 2, case one will now be considered. In case one (FIG. 2), EXP(C) is sufficiently less than EXP(AB) such that when their exponents are compared and their mantissas aligned, there is no overlapping between AB 28 and C 30. This results with no contributi
Do Chat C
Hewlett-Packard Development Company, L.P.
Ngo Chuong Dinh