Reexamination Certificate
1998-08-31
2002-03-19
Frejd, Russell W. (Department: 2123)
Data processing: structural design, modeling, simulation, and em
Modeling by mathematical expression
C708S501000, C708S523000
active
06360189
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a data processing apparatus and method for performing multiply accumulate operations.
2. Description of the Prior Art
It is common for data processing apparatus to be required to perform various floating point computations on data. To ensure a consistent approach in the way such floating point computations are handled by different data processing apparatus, a standard was produced in 1985 called the “IEEE Standard for Binary Floating-Point Arithmetic”, ANSI/IEEE Std 754-1985, The Institute of Electrical and Electronics Engineers, Inc., New York, 10017 (hereafter referred to as the IEEE 754-1985 standard). Amongst other things, this standard defined that a multiplication operation should finish with a rounding operation, and similarly that an add, or accumulate, operation should finish with a rounding operation. The IEEE 754-1985 standard further defined a number of rounding modes that are considered compliant with the standard.
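The default rounding mode of IEEE 754-1985 is round to nearest, with ties going to the even value. As a minimal sketch of what "a multiplication should finish with a rounding operation" means at the bit level, the hypothetical helper below rounds the exact 2n-bit integer product of two n-bit significands back to n bits. It assumes unsigned significands and ignores exponents, normalisation and the other rounding modes; the helper name and the 4-bit operand width are illustrative choices, not taken from the patent.

```python
def round_nearest_even(value: int, drop: int) -> int:
    """Discard the `drop` low bits of a non-negative integer, rounding to
    nearest with ties to even. (Hypothetical helper; if rounding carries
    out of the top bit, a real FPU would renormalise, which is omitted.)"""
    if drop <= 0:
        return value
    kept = value >> drop                                     # bits that survive
    lsb = kept & 1                                           # least significant kept bit
    guard = (value >> (drop - 1)) & 1                        # first discarded bit
    sticky = int((value & ((1 << (drop - 1)) - 1)) != 0)     # OR of all lower discarded bits
    # Round up when above halfway (guard and sticky set), or exactly halfway
    # (guard set, sticky clear) and the kept value is odd.
    return kept + (guard & (sticky | lsb))


# Hypothetical 4-bit significands: 0b1101 * 0b1011 = 0b10001111 (8 bits).
product = 0b1101 * 0b1011
rounded = round_nearest_even(product, 4)
print(bin(product), bin(rounded))  # 0b10001111 0b1001
```

The guard, sticky and least significant bits computed here are the same quantities the patent later calls the multiplier's rounding data.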
It has been found that general-purpose processors are not well suited to performing floating point computations, which has led to the development of specialised floating point units (FPUs) to handle such computations.
One floating point computation that is commonly required is the multiply-accumulate operation, in which two numbers are multiplied together and the product is then added to a third number. The IEEE 754-1985 standard does not specifically address multiply-accumulate operations; instead, it treats multiplication and accumulation separately. Although a multiply-accumulate operation can be performed by executing a multiplication instruction followed by a separate accumulate instruction, such an approach is relatively slow.
Hence, there has been a great deal of interest in developing FPUs arranged specifically to perform multiply-accumulate operations with increased speed. An example of such an FPU is disclosed in U.S. Pat. No. 4,969,118, which describes an FPU developed by IBM to perform a multiply-accumulate operation. In accordance with the IBM technique, a partial multiplier produces a partial product of two numbers, and this partial product is then passed to adder circuitry for adding to a third number. Hence, the multiply-accumulate operation is ‘fused’, in that the result of the multiplication is not independently determined prior to the accumulate operation. This approach significantly increases the speed of the multiply-accumulate operation.
Further, the multiplication is performed to an internal precision that retains all of the bits of the product (an n×n bit multiplication yields a 2n-bit result), and the accumulation is then performed using all of those bits. This gives a particularly accurate result, since no rounding is applied to the product before it is used in the subsequent accumulation. However, the technique is not compliant with the IEEE 754-1985 standard, which requires a rounding operation to be performed on the result of the multiplication.
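The difference between the fused and the separately rounded result is easy to reproduce with Python's binary64 floats, which follow IEEE 754 round-to-nearest on the usual platforms. In the sketch below, `a + b * c` rounds the product and then the sum, as separate IEEE-compliant multiply and add instructions would, while the fused value is computed exactly with `fractions.Fraction` and converted to a float only once at the end. The operand values are chosen purely for illustration.

```python
from fractions import Fraction

# Illustrative operands: b*c = 1 + 2**-29 + 2**-60 exactly, but the 2**-60 term
# is below half an ulp of binary64 at 1.0 (2**-53), so rounding the product drops it.
b = c = 1.0 + 2.0**-30
a = -(1.0 + 2.0**-29)

# Separate multiply then add: each operation rounds (IEEE 754 behaviour).
separate = a + b * c

# Fused: exact product and exact sum, rounded once at the end. Here the exact
# sum, 2**-60, is representable, so the final conversion is exact.
fused = float(Fraction(a) + Fraction(b) * Fraction(c))

print(separate, fused)
```

The separately rounded result is 0.0, whereas the fused result is 2**-60. A multiply-accumulate unit that must match separate rounded multiply and add instructions has to return 0.0 here, even though the fused value is closer to the exact answer.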
Other examples of FPUs designed specifically to increase the speed of a multiply-accumulate operation and/or reduce circuit complexity can be found in U.S. Pat. Nos. 5,241,493, 5,375,078, 5,530,663, 4,866,652 and 4,841,467, and in EP-A-0,645,699. None of these documents is concerned with the issue of rounding, and in particular none is concerned with producing results that are compliant with the IEEE 754-1985 standard.
An alternative approach, used in the MIPS R10000 product, is to retain the multiplier and adder as separate logic units. When performing a multiply-accumulate operation, rounding is applied to the output of the multiplier unit, and this output is then input to the adder logic unit, with the result of the adder logic unit also being rounded. Whilst this enables an IEEE 754-1985 compliant result to be achieved for a multiply-accumulate operation, it forgoes the speed benefits of a specialised logic unit arranged specifically to perform multiply-accumulate operations.
It is an object of the present invention to provide a data processing apparatus and method for efficiently performing a multiply-accumulate operation in response to a single instruction whilst producing a result which is equivalent to the execution of a separate multiply instruction incorporating rounding, followed by a separate add instruction incorporating rounding.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides a data processing apparatus for performing a multiply-accumulate operation A+(B*C) in response to a single instruction identifying said multiply-accumulate operation, comprising: a multiplier for multiplying values B and C to generate an unrounded multiplication result, the multiplier further being arranged to generate first data required for rounding determination; an adder for adding the unrounded multiplication result to a value A to generate an unrounded multiply-accumulate result, the adder further being arranged to generate second data required for rounding determination; determination logic for using the first and second data to determine one or more rounding values required to produce a final multiply-accumulate result equivalent to the execution of a separate multiply instruction incorporating rounding, followed by a separate add instruction incorporating rounding; and rounding logic for applying the one or more rounding values to generate the final multiply-accumulate result.
In accordance with the present invention, a multiplier is provided to generate an unrounded multiplication result by multiplying values B and C, and further to generate first data required for rounding determination. Similarly, an adder is provided to generate an unrounded multiply-accumulate result by adding a value A to the unrounded multiplication result, and further to generate second data required for rounding determination. Determination logic is then arranged to use the first and second data to determine one or more rounding values required to produce a final multiply-accumulate result equivalent to the execution of a separate multiply instruction incorporating rounding, followed by a separate add instruction incorporating rounding.
By this approach, dedicated multiply-accumulate logic can be provided to enable fast execution of a multiply-accumulate instruction, whilst producing a result which is compliant with the IEEE 754-1985 standard.
In one embodiment, the determination logic can be arranged to determine more than one rounding value to be applied at appropriate steps during the addition operation. However, in preferred embodiments, the determination logic is arranged to generate a single rounding value to be applied by the rounding logic to the unrounded multiply-accumulate result to generate the final multiply-accumulate result. This approach yields further speed improvements in the execution of the multiply-accumulate operation.
In preferred embodiments, the first data generated by the multiplier comprises guard and sticky bits, and the determination logic comprises first logic for determining a multiplier rounding value from the first data. Further, the first data preferably comprises one or more least significant bits of the multiplication result, which are also used in the generation of the multiplier rounding value. In such preferred embodiments, the determination logic further comprises second logic for determining the one or more rounding values from the multiplier rounding value and the second data.
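The part played by the product's least significant bit can be seen in a deliberately simplified fixed-point model. This is an illustrative assumption, not the patent's floating-point datapath: integers b and c carry f fractional bits, so their exact product carries 2f, and fixed-point addition is exact, so only the product ever needs rounding. `separate_mac` is the reference (round the product to nearest even, then add a); `single_rounding_mac` adds a to the unrounded product and applies one increment computed from the product's LSB, guard and sticky bits; `fused_mac` rounds the exact sum once, using the sum's own LSB. All function names are hypothetical, and exponent alignment, normalisation and the adder's second data are omitted.

```python
def first_data(product: int, f: int) -> tuple[int, int, int]:
    """Multiplier rounding data (lsb, guard, sticky) for rounding `product`,
    which has 2f fractional bits, to f fractional bits. Requires f >= 1."""
    lsb = (product >> f) & 1
    guard = (product >> (f - 1)) & 1
    sticky = int((product & ((1 << (f - 1)) - 1)) != 0)
    return lsb, guard, sticky


def rounding_value(value: int, f: int) -> int:
    """Round-to-nearest-even increment (0 or 1) for dropping f low bits."""
    lsb, guard, sticky = first_data(value, f)
    return guard & (sticky | lsb)


def separate_mac(a: int, b: int, c: int, f: int) -> int:
    """Reference: round the product, then add a (exact in fixed point)."""
    p = b * c
    return a + (p >> f) + rounding_value(p, f)


def single_rounding_mac(a: int, b: int, c: int, f: int) -> int:
    """Add a to the UNROUNDED product, then apply one increment that was
    derived from the multiplier's data (the product's lsb, guard, sticky)."""
    p = b * c                      # unrounded product, 2f fractional bits
    unrounded = (a << f) + p       # a aligned to 2f fractional bits
    return (unrounded >> f) + rounding_value(p, f)


def fused_mac(a: int, b: int, c: int, f: int) -> int:
    """Single rounding of the exact sum, using the sum's own lsb."""
    s = (a << f) + b * c
    return (s >> f) + rounding_value(s, f)


# Tie case with f = 2: b*c = 6 (0.375, exactly halfway), a = 1 (0.25).
print(separate_mac(1, 2, 3, 2), single_rounding_mac(1, 2, 3, 2), fused_mac(1, 2, 3, 2))  # 3 3 2
```

In the tie case the sum's LSB differs from the product's LSB, so only the version that takes its rounding decision from the multiplier's data matches the separately rounded reference; in this model that is why the product's least significant bit is useful first data.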
In preferred embodiments, the adder comprises an alignment shifter for aligning the smaller of the value A and the multiplication result prior to performing the addition, and a detection unit for detecting whether the bits shifted out by the alignment shifter are all ones or all zeros. In this embodiment, the second data generated by the adder prefer
Hinds, Christopher Neal
Jaggar, David Vivian
Matheny, David Terrence
ARM Limited