Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2000-08-14
2003-07-22
Malzahn, David H. (Department: 2124)
active
06598063
ABSTRACT:
FIELD OF THE INVENTION
This invention is generally related to high accuracy and low latency techniques for computing mathematical expressions of the form (A/B)^K using a data processor having parallel floating-point arithmetic units in hardware.
BACKGROUND
Most general purpose data processors have only a very basic hardware arithmetic capability, e.g. add, multiply, and divide. Thus, computing a transcendental function such as arctan(x) requires rewriting the function in terms of these basic arithmetic operations, so that the processor can execute the function. Conventional software math libraries contain subroutines, which are typically written in the assembly language of a particular processor, that are optimized to compute a function with high accuracy yet using only the basic arithmetic operations. The methodology of the subroutine is also designed to take advantage of any parallel floating-point processing capability in the processor. For instance, if the rewritten form of the function, sometimes referred to as a series expansion, has multiple instances of the type (A+B), then independent (A+B) parts of the expansion can be placed in two or more instructions that will be executed simultaneously, thereby reducing the latency of computing the function.
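As a rough illustration of the point about independent parts of an expansion, the sketch below evaluates a truncated Taylor series for arctan(x) in two ways. The function names, the choice of a four-term Taylor series, and the grouping (an Estrin-style split, as in the Muller reference) are assumptions made for this example; a production math library would normally use a more accurate approximation.

```python
import math

# Coefficients of 1 - t/3 + t^2/5 - t^3/7 with t = x^2 (truncated Taylor
# series for arctan(x)/x). Chosen for illustration only.
C0, C1, C2, C3 = 1.0, -1.0 / 3.0, 1.0 / 5.0, -1.0 / 7.0

def arctan_serial(x: float) -> float:
    """Horner form: each multiply-add waits for the previous one."""
    t = x * x
    p = C3
    p = p * t + C2
    p = p * t + C1
    p = p * t + C0
    return x * p

def arctan_parallel_friendly(x: float) -> float:
    """Estrin-style grouping: `low`, `high` and `t2` are mutually independent."""
    t = x * x
    t2 = t * t            # independent of low and high
    low = C0 + C1 * t     # independent of high
    high = C2 + C3 * t    # independent of low
    return x * (low + t2 * high)
```

Both functions compute the same polynomial. On a machine with two floating-point units, the three independent lines in the second version could in principle be issued together, which shortens the chain of dependent operations.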
Expressions of the form (A/B)^K, where A and B are real numbers and K is an integer, often need to be computed as part of software-implemented mathematical functions. However, modern machines such as the ITANIUM processor by Intel Corp. do not support the division operation A/B in hardware. The ITANIUM processor supports fused multiply-add (FMA) floating-point operations of the form AB+C. In addition, this processor has multiple floating-point units in hardware for parallel instruction execution, and is an example of an explicit parallel instruction computer (EPIC) in which two floating-point arithmetic operations, two memory access operations, and two integer arithmetic operations can be executed in parallel.
A conventional technique for computing (A/B)^K on a machine such as the ITANIUM processor that does not support division may include the following three steps: (1) reciprocal calculation R = 1/B, by first using the well-known approximate reciprocal operator R_0 = frcpa(B) = (1/B)(1+Δ) and then applying an iterative process to refine the approximation R_0 to obtain the needed accuracy in R, (2) quotient calculation Q = AR, and (3) power calculation Q^K. All three steps can be performed with no divisions, only multiply and add operations.
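The three conventional steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's method: `approx_reciprocal` is a hypothetical stand-in for frcpa (it cheats by dividing and then discarding precision, since Python has no such instruction), the refinement shown is a standard Newton-Raphson iteration, the power step uses square-and-multiply, and K is assumed to be a non-negative integer.

```python
import math

def approx_reciprocal(b: float) -> float:
    """Hypothetical stand-in for frcpa: 1/b accurate to roughly 8 bits.

    A real frcpa uses a hardware table lookup; the division here exists
    only to simulate a coarse seed value R_0 = (1/B)(1 + delta).
    """
    m, e = math.frexp(1.0 / b)
    return math.ldexp(round(m * 256) / 256, e)

def refine_reciprocal(b: float, r0: float, iterations: int = 3) -> float:
    """Newton-Raphson refinement of r0 toward 1/b using only multiply and add."""
    r = r0
    for _ in range(iterations):
        err = 1.0 - b * r   # residual; a single FMA on suitable hardware
        r = r + r * err     # correction; a single FMA on suitable hardware
    return r

def power_by_squaring(q: float, k: int) -> float:
    """Compute q**k for integer k >= 0 with multiplications only."""
    result, base = 1.0, q
    while k > 0:
        if k & 1:
            result *= base
        base *= base
        k >>= 1
    return result

def conventional_quotient_power(a: float, b: float, k: int) -> float:
    r = refine_reciprocal(b, approx_reciprocal(b))  # step 1: R = 1/B
    q = a * r                                       # step 2: Q = A*R
    return power_by_squaring(q, k)                  # step 3: Q^K
```

Each Newton-Raphson pass roughly doubles the number of correct bits, so an 8-bit seed needs about three passes to approach double precision. Those passes are strictly sequential, which is why the number of iterations matters for latency.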
The overall latency of computing (A/B)^K is dominated by the first and third steps, i.e. the reciprocal and power calculation steps. In machines that have parallel floating-point units, the power calculation can be optimized to take advantage of such parallelism. However, before the power calculation can be performed, the reciprocal calculation must first be completed, such that it is said to lie in the “critical path” of the overall calculation. This conventional reciprocal calculation is very time consuming, particularly because of the complex iterative procedure needed to enhance the accuracy of the approximate reciprocal. Thus, the combination of the conventional reciprocal calculation in the first step and the power calculation in the third step severely limits the ability to shorten the latency of the overall calculation.
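One way to make the critical-path argument concrete is to count dependent operations rather than time. The sketch below is a toy model with assumptions that are not from the source: every arithmetic operation costs one unit, unlimited operations can run in parallel, the reciprocal is refined with three Newton-Raphson passes of two operations each, the power uses square-and-multiply, and K is at least 1. The names `op` and `conventional_depth` are invented for this illustration.

```python
def op(*input_depths: int) -> int:
    """Depth of a value produced by one operation on inputs at the given depths."""
    return max(input_depths) + 1

def conventional_depth(k: int, iterations: int = 3) -> tuple[int, int]:
    """Return (depth of R = 1/B, depth of Q^K) for integer k >= 1."""
    a = b = 0                      # inputs A and B are available at depth 0
    r = op(b)                      # R_0 = frcpa(B)
    for _ in range(iterations):
        err = op(b, r)             # err = 1 - B*R
        r = op(r, err)             # R = R + R*err
    reciprocal_depth = r
    q = op(a, r)                   # Q = A*R cannot start before R is ready
    base, result = q, None
    while k > 0:                   # square-and-multiply on depths
        if k & 1:
            result = base if result is None else op(result, base)
        k >>= 1
        if k:
            base = op(base)        # base = base * base
    return reciprocal_depth, result
```

In this model `conventional_depth(4)` gives a reciprocal depth of 7 and a total depth of 10, so the sequential refinement accounts for most of the chain even though extra floating-point units sit idle while it runs.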
REFERENCES:
patent: 6128638 (2000-10-01), Thomas
patent: 6260056 (2001-07-01), Dalal
patent: 6363407 (2002-03-01), Miyasaka et al.
patent: 6381625 (2002-04-01), Oberman et al.
John Harrison, et al., The Computation of Transcendental Functions on the IA-64 Architecture, Intel Technology Journal Q4, 1999.
Knuth, D.E., The Art of Computer Programming, vol. 2: Seminumerical Algorithms. Addison-Wesley, 1969, pp. 468-469.
Muller, J.-M., Elementary Functions: Algorithms and Implementation. Birkhauser 1997, Section 3.7.2, Estrin's Method.
Kubaska Theodore E.
Tang Ping Tak (Peter)
Intel Corporation
Malzahn David H.