Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
1999-09-16
2003-05-06
Mai, Tan V. (Department: 2124)
Electrical computers: arithmetic processing and calculating
Electrical digital calculating computer
Particular function performed
C708S497000
Reexamination Certificate
active
06560623
ABSTRACT:
FIELD OF THE INVENTION
The present invention generally relates to performing floating point operations in a floating point unit of a microprocessor, and more specifically to a method for computing correctly rounded function values f(x) of double-precision floating-point numbers within the floating point unit of a microprocessor.
BACKGROUND OF THE INVENTION
Much software is written using numbers which are represented in a floating-point arithmetic format.
FIG. 1
illustrates a format
10
for representing real numbers defined by the standard referred to as IEEE 754 (1984). The representation
10
includes a sign bit
11
, an 11-bit exponent field
12
, and a 52-bit mantissa field
13
. Let s, e and n respectively represent the values of the sign, exponent, and mantissa fields where each of these values are treated as unsigned binary numbers. The format
10
represents a number if e is not 2047. If the exponent e is equal to 2047, the format represents an infinity or a Not-a-Number (NaN) as defined by the IEEE standard. If e and n are both 0, the number represented is the value 0. If e is 0 and n is not zero, the number represented is (−1)
s
2
−1074
n. If e is not zero, the number represented is (−1)
s
2
e−1023
(1+2
−52
n).
While many real numbers can be represented using the format
10
, many arithmetic operations on two representable numbers can produce a result which is not any of the representable numbers (i.e., the number of bits in the result exceed the number of bits in the double precision format of FIG.
1
). For example, the fraction ⅓ cannot be represented to infinite accuracy in the finite number of bits present in the format of FIG.
1
. In such cases, a floating-point arithmetic operation usually returns a representable number which differs from the exact result by performing a rounding operation so that the result may be represented in the finite-sized format of FIG.
1
. The IEEE 754 (1984) standard allows the use of four rounding modes (round to nearest, round toward zero, round toward plus infinity and round toward minus infinity) and desires that a standard result to be obtained in rounding each possible exact real number to a representable real number for the operations of addition, subtraction, multiplication, division and square root. Most computer processor manufacturers implement the standard for double precision, and as a result, arithmetic results from these operations should be identical across a wide range of software platforms, offering interoperability. In addition, in round-to-nearest mode, the rounding produces the representable number closest to the exact result, maintaining a high degree of accuracy. For the operations of addition, subtraction, multiplication, division and square root, the ability to obtain correctly rounded results offers both portability and accuracy.
There is at present no standard ensuring portability and accuracy for the evaluation of properly rounded transcendental functions of numbers which are represented in double precision format. Nobody has been able to determine for a given function f(x) how accurately an approximate value must be computed so that it is possible to determine how to round it correctly in all double precision cases for that function. There are too many double precision numbers to be able to check each one individually by sequential brute force computation, whereas it is possible to check all single precision arguments by brute force computation to ensure proper rounding in all cases for all f(x). For example, to process all double precision results f(x) in a brute force manner to enable the fixing of the precision of an algorithm to an optimal point that ensure correct rounding with optimal performance for all double precision operands, x, one would need to perform about 2
63
evaluations. Using current computing power, these computations would take 292,000 years per each f(x) implemented by a CPU architecture assuming an optimistic rate of 1 f(x) evaluation per microsecond. Clearly, there is no viable method to brute force process all double precision numbers, using today's technology, for all f(x) for a CPU to ensure algorithm optimization that rounds correctly for all x processed by f(x).
To at least guarantee portability, the Java language requires that certain functions return values computed using certain algorithms that are known to return incorrectly rounded results for a large number of arguments. This requirement virtually requires that a compliant implementation of the function use exactly that algorithm, since there is no other practical way to get the specific wrong values on each of the arguments that returns a wrong value. The problem of determining the set of arguments for which it is necessary to get the wrong value is essentially as hard as the problem of finding how much accuracy is needed in order to ensure proper rounding in all cases.
Interval arithmetic is a form of calculation in which the operands of a function are typically intervals between representable numbers and the result of a function is an interval between representable numbers which is guaranteed to be large enough to contain the exact result if each operand is independently drawn from anywhere inside its interval. A long series of interval arithmetic operations usually results in continually growing intervals, as uncertainties in one result lead to even larger uncertainties in results based on them. The interval sizes grow more rapidly than necessary if functions cannot be rounded correctly, since the resulting interval must be large enough to also include the possibility that rounding was performed incorrectly. Correct rounding is therefore important to interval arithmetic.
Shmuel Gal and Boris Bachelis teach a method which is more likely to return a correctly rounded value than other prior art methods that do not address the rounding problem.
FIG. 2
illustrates the fundamental principle taught by Gal and Bachelis. It is well known that all but a few arguments of one of the transcendental functions will have a function value which is not representable exactly in a double precision format. In the Gal and Bachelis method, a function value is computed as the sum of two parts, such that the sum carries more precision than is available in a single double precision number.
FIG. 2
illustrates the Gal and Bachelis method for the case of a round-to-nearest mode of operation. Almost all computed function values will lie between a representable number
20
and the next higher representable number
21
. The real number
22
(not representable in the format of
FIG. 1
since it requires an additional ½ least significant bit (LSB)) lying halfway between the representable numbers
20
and
21
is the critical number for rounding properly in the round-to-nearest mode. If the computed function value f(x) for a value x lies above the line
22
, f(x) is rounded to the representable number
21
, whereas if the computed f(x) lies below the line, f(x) is rounded to the representable number
20
for the round-to-nearest mode. Each computed function value (e.g.,
23
or
24
in
FIG. 2
) has an error associated with it, and
FIG. 2
illustrates a number of error bars for various function values f(x) including the specific values labeled
23
and
24
in FIG.
2
. The central value of each error bar is the computed value and the error bars indicate a guaranteed bound on the error of the computation that is a function of the precision of the f(x) computational algorithm used.
FIG. 2
does not mean to suggest that all values f(x) lie between the same two representable binary numbers. Rather,
FIG. 2
illustrates the superposition of each function value f(x) between the two different representable numbers which surround it.
Gal and Bachelis teach that if the entire error bar lies above or below the line
22
, then the rounding operation described above is guaranteed to be correct. As an example, the error bar specifically labeled as bar
23
in
FIG. 2
, will give the correctly rou
Mai Tan V.
Motorola Inc.
LandOfFree
Method and apparatus for producing correctly rounded values... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for producing correctly rounded values..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for producing correctly rounded values... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3057393