Method and apparatus for rounding floating point results in...

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Details

active

06366942

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to processing systems which operate on floating point numbers. More specifically, the invention relates to an efficient mechanism for performing accurate mathematical rounding of such numbers.
BACKGROUND OF THE INVENTION
In digital computing systems, various types of numbers are electronically represented using the binary numbering system. Floating point numbers, such as −1.73491*10^−13, are typically represented in binary using either a VAX or an IEEE floating point standardized format. In either standard, the floating point number is represented as a group of bits divided into three bit fields: a sign bit field, an exponent bit field and a fraction bit field. The sign bit field represents the sign (negative in the above example) of the subject floating point number. The fraction bit field represents the digits surrounding and including the decimal point (i.e., 1.73491 in the above example). Finally, the exponent bit field (e.g., −13 in the above example) represents the power of ten that indicates how many places and in which direction to shift the decimal point in the fraction part of the subject floating point number if it were to be expressed in typical decimal format.
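As a rough illustration of the three bit fields, the sketch below splits an IEEE single precision value into its sign, exponent and fraction fields. It assumes the common 32-bit layout (1 sign bit, 8 exponent bits, 23 fraction bits); the helper name decode_single is hypothetical and is not taken from the patent. Note that in the binary formats the exponent scales by a power of two, whereas the decimal example above uses a power of ten for readability.

```python
import struct

def decode_single(x: float):
    """Split an IEEE single precision value into (sign, exponent, fraction) bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, still in biased (excess) form
    fraction = bits & 0x7FFFFF       # 23 stored fraction bits
    return sign, exponent, fraction

# -0.15625 is -1.01 (binary) * 2**-3, so the sign bit is set.
sign, exponent, fraction = decode_single(-0.15625)
```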
Depending upon the standard in use, there are particular required formats used to represent the fraction and exponent bit fields. In the IEEE standard for normal numbers, the decimal point in the fraction bit field is always assumed to be located just to the right of the most significant bit position. For example, if there are 23 bits in the fraction bit field having bit positions ranging from 0 (rightmost bit) to 22 (leftmost and most significant bit), the decimal point is always assumed to be located between bit positions 22 and 21. In the VAX standard, the decimal point in the fraction bit field is always assumed to be located just to the left of the most significant bit position (to the left of bit position 22 in the above example). Also, in both the VAX and IEEE standards, a normal fraction value is always stored in a normalized state. A “normalized” fraction bit field always has the most significant non-zero bit located in the most significant (leftmost) bit position.
All exponents use an excess format: the exponent value is calculated by taking the unsigned value of the exponent bit field and subtracting a bias to produce the true exponent value. A bit field value of 1 represents the most negative true exponent, a bit field value of all ones represents the most positive true exponent, and the bit field value halfway between 1 and all ones represents a true exponent value of zero.
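The bias subtraction can be sketched in a few lines. The function name true_exponent is hypothetical, and the bias of 127 used here is the one for the IEEE single precision format (8-bit exponent field); other formats use other bias values.

```python
def true_exponent(field_value: int, bias: int) -> int:
    """Convert an unsigned (excess format) exponent field into the true exponent."""
    return field_value - bias

BIAS_SINGLE = 127  # assumption: IEEE single precision, 8-bit exponent field

smallest = true_exponent(1, BIAS_SINGLE)    # most negative true exponent for normal numbers
middle = true_exponent(127, BIAS_SINGLE)    # field value that maps to a true exponent of zero
```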
The number of bits in the fraction bit field and the number of bits in the exponent bit field determine the precision and range (i.e., the number of significant digits and the maximum and minimum floating point numerical values representable) of a particular floating point format. Both the VAX and IEEE standards provide for single and double precision floating point numbers. Double precision floating point numbers use about twice as many bits for their fraction fields as single precision floating point numbers. A typical single precision floating point number requires a total of 32 bits to store the sign, fraction and exponent fields, while a typical double precision value requires a total of 64 bits for storage.
Various steps must be performed to add two floating point numbers using prior art floating point addition circuits. Before addition can take place, the exponent of the smaller magnitude operand must be adjusted so that it is equal to the exponent of the larger magnitude operand. This is accomplished by incrementing the smaller magnitude operand's exponent while shifting that operand's fraction appropriately such that the value of the combined fraction and exponent is maintained. As an example, if the first and second operands are +0.1234*10^5 and +0.5678*10^7 respectively, to perform the adjustment, the floating point processor adds two to the smaller exponent, i.e., the first operand's exponent (10^5), to equate it with the exponent of the second operand (10^7). To maintain the proper value for the smaller magnitude operand, its fraction must be shifted by two decimal places. The combined fraction and exponent becomes +0.001234*10^7 for the adjusted (first) operand.
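The alignment step in the decimal example can be reproduced with a short sketch. It models each operand as a (fraction, exponent) pair using Python's decimal module; the helper name align is hypothetical, and a hardware adder would of course shift binary fraction bits rather than decimal digits.

```python
from decimal import Decimal

def align(frac_a: Decimal, exp_a: int, frac_b: Decimal, exp_b: int):
    """Re-express both operands with the larger of the two exponents."""
    target = max(exp_a, exp_b)
    # scaleb shifts the decimal point; a negative shift moves digits to the right,
    # which compensates for raising the exponent and keeps the overall value unchanged.
    aligned_a = frac_a.scaleb(exp_a - target)
    aligned_b = frac_b.scaleb(exp_b - target)
    return aligned_a, aligned_b, target

# +0.1234*10^5 and +0.5678*10^7 from the example above.
a, b, exp = align(Decimal("0.1234"), 5, Decimal("0.5678"), 7)
# a is Decimal("0.001234"), b is unchanged, exp is 7
```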
After the alignment and shift steps are complete, the fraction bit fields (i.e., the fractional values) of the two operands are added in an addition step to produce a result reflecting the sum of the fractions of the operands. In this example, after the addition is complete the resultant sum is +0.569034*10^7. In some instances, depending upon the value of the resultant sum, the sum may then need to be normalized so that its most significant digit is in the proper decimal position for the resultant format. Normalization is not needed in the above case.
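A minimal sketch of the addition and normalization steps follows. It assumes, purely for illustration, a decimal format whose normalized fraction satisfies 0.1 <= |fraction| < 1 (matching the leading "0." form of the example); the function name add_aligned is hypothetical.

```python
from decimal import Decimal

def add_aligned(frac_a: Decimal, frac_b: Decimal, exp: int):
    """Add two already-aligned fractions, then normalize the sum."""
    total = frac_a + frac_b
    # Sum overflowed the fraction range: shift right and raise the exponent.
    while abs(total) >= 1:
        total = total.scaleb(-1)
        exp += 1
    # Leading zeros after the decimal point: shift left and lower the exponent.
    while total != 0 and abs(total) < Decimal("0.1"):
        total = total.scaleb(1)
        exp -= 1
    return total, exp

# The example above: no normalization is needed.
s1 = add_aligned(Decimal("0.001234"), Decimal("0.5678"), 7)
# A case where the sum reaches 1.1 and must be shifted: 0.9*10^3 + 0.2*10^3.
s2 = add_aligned(Decimal("0.9"), Decimal("0.2"), 3)
```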
Furthermore, the resultant sum may also exceed the overall precision that can be represented by the floating point standard in use. For example, if the fraction bit field format only has enough bits to represent a precision of four decimal digits to the right of the decimal point, the example resultant fraction value 0.569034 exceeds the allowable precision by two digits. If the precision is exceeded, a rounding step is used to round the fraction up or down to fit within the maximum number of bits allocated for the fraction bit field.
In the VAX floating point standard, there are two rounding modes that can be used, and in the IEEE floating point standard there are four rounding modes that can be used to accomplish the rounding step.
In the IEEE standard, the first rounding mode is called “Round to Nearest Even” (RNE) and rounds values up in magnitude if they are more than halfway between two representable results. Values that are exactly halfway between two representable results are rounded to a final result that has a least significant fraction bit equal to zero, thus making the result even. Values that are less than halfway between two representable results are rounded down in magnitude (or truncated).
The second and third IEEE rounding modes are called “Round Toward Positive Infinity” (RTPI) and “Round Toward Negative Infinity” (RTNI). In the RTPI rounding mode, values that are between two representable results are rounded up for positive results and down in magnitude for negative results. In the RTNI rounding mode, values that are between two representable results are rounded up in magnitude for negative results and down for positive results.
The fourth IEEE rounding mode is called “Chopped” and rounds all results existing between two representable results down in magnitude by chopping off or eliminating any digits extending beyond the precision (i.e., number of decimal places) allowed.
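The four IEEE modes have close decimal analogues in Python's decimal module, which makes their behavior easy to compare on the four-digit example. This is only an analogy: the mapping table MODES and the helper round_fraction are hypothetical names, and "even" here means an even last decimal digit rather than a zero least significant bit.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN

# Assumed mapping from the mode names in the text to decimal-module rounding constants.
MODES = {
    "RNE": ROUND_HALF_EVEN,   # nearest, ties to even
    "RTPI": ROUND_CEILING,    # toward positive infinity
    "RTNI": ROUND_FLOOR,      # toward negative infinity
    "Chopped": ROUND_DOWN,    # toward zero (truncate)
}

def round_fraction(value: Decimal, mode: str, places: int = 4) -> Decimal:
    """Round value to the given number of decimal places under the named mode."""
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.0001 for four places
    return value.quantize(quantum, rounding=MODES[mode])

results = {m: round_fraction(Decimal("0.569034"), m) for m in MODES}
# RNE -> 0.5690, RTPI -> 0.5691, RTNI -> 0.5690, Chopped -> 0.5690
```

For a negative value such as −0.569034 the directed modes swap roles: RTPI gives −0.5690 and RTNI gives −0.5691, while Chopped still truncates toward zero.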
In the VAX floating point standard, there are only two rounding modes: “Normal Rounding” and “Chopped.” In Normal Rounding, values that are more than or exactly halfway between two representable results are rounded up in magnitude. Values that are less than halfway between two representable values are rounded down in magnitude. The Chopped rounding mode in the VAX standard is the same as in the IEEE standard and rounds results down in magnitude by chopping off or truncating any bits below the available precision.
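The difference between VAX Normal Rounding and IEEE Round to Nearest Even shows up only on exact ties. The sketch below illustrates this with decimal digits, treating ROUND_HALF_UP (ties away from zero) as a stand-in for Normal Rounding; the helper names are hypothetical and the decimal model is an assumption made for readability.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def vax_normal_round(value: Decimal, places: int = 4) -> Decimal:
    """Nearest value; exact ties are rounded up in magnitude."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

def ieee_rne_round(value: Decimal, places: int = 4) -> Decimal:
    """Nearest value; exact ties go to the result with an even last digit."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)

tie = Decimal("0.56905")              # exactly halfway between 0.5690 and 0.5691
vax_result = vax_normal_round(tie)    # rounds up in magnitude
rne_result = ieee_rne_round(tie)      # stays at the even neighbor
```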
Except for the Chopped rounding mode, all rounding modes are accomplished by conditionally incrementing the infinitely precise normalized initial sum at an appropriate bit position, re-normalizing if necessary, and then truncating all bits below the least significant bit position. After the initial normalized sum is computed, the rounding mode in effect determines a specific bit position in the sum at which to increment the result in order to create a fraction bit pattern representing a correctly rounded fraction value. The round increment may cause a carry bit to be propagated to the more significant bit positions in the sum. If the carry due to round increment causes the fractio
