Electrical computers and digital processing systems: processing – Processing control – Arithmetic operation instruction processing
Reexamination Certificate
1999-08-16
2002-10-08
Coleman, Eric (Department: 2183)
C712S023000, C708S513000
active
06463525
ABSTRACT:
BACKGROUND
1. Field of Invention
This invention relates generally to systems for processing information and specifically to the implementation of double precision operations using single precision operands.
2. Description of Related Art
A floating point execution unit (FPU) performs arithmetic operations such as addition and subtraction on numerical operands represented in floating point notation. Floating point notation uses a sign, a mantissa, and an exponent to represent integer and fractional numerical values. IEEE standard 754-1985 sets forth acceptable formats for representing real numbers in binary floating point notation in order to ensure uniformity and compatibility between various computer architectures. The IEEE floating point notation formats include single word and double word formats, as well as extended word and other formats. A single word includes 32 bits, typically with 1 bit representing the sign, 8 bits representing the exponent, and 23 bits representing the mantissa. A double word includes 64 bits, typically with 1 bit representing the sign, 11 bits representing the exponent, and 52 bits representing the mantissa.
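To make the field widths concrete, the following Python sketch unpacks a value into the sign, exponent, and mantissa (fraction) fields of the single and double formats using the standard `struct` module. The helper names are illustrative only. Note that the stored exponent field is biased (by 127 for single and 1023 for double), and the leading 1 of a normalized mantissa is implicit, so only the fraction bits are stored.

```python
import struct

def single_fields(x: float) -> tuple:
    """Split x, encoded as an IEEE 754 single (32-bit) word, into
    (sign, biased exponent, fraction) fields of 1 + 8 + 23 bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def double_fields(x: float) -> tuple:
    """Split x, encoded as an IEEE 754 double (64-bit) word, into
    (sign, biased exponent, fraction) fields of 1 + 11 + 52 bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & 0xFFFFFFFFFFFFF

print(single_fields(-1.5))  # (1, 127, 4194304)
print(double_fields(-1.5))  # (1, 1023, 2251799813685248)
```

For -1.5 = -1.1 (binary) x 2^0, both formats store a zero unbiased exponent as the bias itself, and the fraction holds only the single 1 bit after the binary point.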
Instructions that produce 32-bit results from two 32-bit operands are typically referred to as single precision operations, and instructions which produce 64-bit results from two 64-bit operands are typically referred to as double precision operations. When performing double precision operations, the 64-bit operands may be represented by concatenations of two 32-bit operands aliased from respective 32-bit load operations, rather than from 64-bit load operations. Such aliasing allows architectures having 32-bit loads to implement double precision operations.
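The aliasing of one 64-bit operand onto two 32-bit operands can be sketched as a bit-level concatenation. This Python sketch assumes that the first operand of a pair (e.g., f0) supplies the most significant 32 bits of the aliased double (e.g., d0), consistent with f0 occupying the first 32 bits of a register file row as described for FIG. 3B; the function names are hypothetical.

```python
import struct

def alias_double(hi_word: int, lo_word: int) -> float:
    """Interpret two 32-bit words as one 64-bit IEEE 754 double.
    Assumption: hi_word (e.g. f0) supplies the most significant 32 bits."""
    bits = ((hi_word & 0xFFFFFFFF) << 32) | (lo_word & 0xFFFFFFFF)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def split_double(x: float) -> tuple:
    """Inverse of alias_double: return the (high, low) 32-bit words of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 32, bits & 0xFFFFFFFF

# Two separate 32-bit loads followed by aliasing reproduce the 64-bit value.
f0, f1 = split_double(3.141592653589793)
d0 = alias_double(f0, f1)
print(d0 == 3.141592653589793)  # True
```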
FIG. 1 is a block diagram of a 32-bit pipelined processor architecture having a memory 10, routing logic 20, a re-order buffer 30, a register file 40, a multiplexer (mux) 50, and a floating point execution unit (FPU) 60. Memory 10, which may be a memory cache (e.g., L1 and L2 cache), computer main memory (e.g., DRAM), some suitable external memory (e.g., disk drive), or an appropriate combination of the above, loads 32-bit single precision operands to re-order buffer 30 and/or register file 40 via routing logic 20. Re-order buffer 30 is a 64-bit wide memory element that typically stores one result per row, i.e., either a 32-bit result or a 64-bit result per row, and is used in a well-known manner to facilitate out-of-order instruction execution. Register file 40 is a 64-bit wide architectural file that stores either two 32-bit operands or one 64-bit operand per row. Register file 40 stores operands upon retirement of corresponding instructions, and is continually updated to maintain current architectural operand values. Mux 50 selectively couples re-order buffer 30, register file 40, and/or the result of FPU 60 as input to FPU 60. FPU 60 is well known and performs arithmetic operations such as addition and subtraction using floating point operands selectively loaded from re-order buffer 30, register file 40, or the result of FPU 60. Typically, each load operation provides data stored in one row of re-order buffer 30 or register file 40 (or the result of FPU 60) to FPU 60 as an operand.
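A toy model of the row organization described for FIG. 1 may help: each operand fetch by FPU 60 returns exactly one 64-bit row, so a 64-bit operand is available in one fetch only if both of its 32-bit halves sit in the same row. The `Row` class and `fetch_operand` function are hypothetical, and placing a 32-bit re-order buffer result in the upper half of its row is an arbitrary modeling choice.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Row:
    """One 64-bit row of re-order buffer 30 or register file 40, modeled as
    two 32-bit halves; None marks a half that holds no valid data."""
    upper: Optional[int] = None
    lower: Optional[int] = None

def fetch_operand(rows: List[Row], index: int) -> int:
    """One FPU operand fetch: read exactly one row as a 64-bit value.
    Raises ValueError if the row does not hold both 32-bit halves."""
    row = rows[index]
    if row.upper is None or row.lower is None:
        raise ValueError(f"row {index} does not hold a complete 64-bit operand")
    return (row.upper << 32) | row.lower

# Re-order buffer: one result per row, so each 32-bit result fills one half
# (placing it in the upper half is an arbitrary modeling choice).
rob = [Row(upper=0x3F800000), Row(upper=0x40000000)]

# Register file: one row can hold two 32-bit operands side by side.
regfile = [Row(upper=0x3F800000, lower=0x40000000)]

print(hex(fetch_operand(regfile, 0)))  # 0x3f80000040000000
```

Fetching row 0 of `rob` instead raises `ValueError`, which is the situation the following paragraphs describe for operands still held in the re-order buffer.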
As mentioned above, 64-bit operands may be represented in the pipeline architecture of FIG. 1 by concatenating two 32-bit operands. FIG. 2 illustrates a 2-way single instruction-multiple data (SIMD) single precision instruction which implements a double precision addition operation by executing two single precision operations in parallel. The two single precision operations f0 + f2 = f4 and f1 + f3 = f5 execute simultaneously to implement the double precision instruction fadd %d0, %d2, %d4, where the 64-bit operand d0 is aliased to 32-bit operands f0 and f1, the 64-bit operand d2 is aliased to 32-bit operands f2 and f3, and the 64-bit result d4 is aliased to 32-bit results f4 and f5.
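The lane-wise behavior of the FIG. 2 instruction can be sketched in Python by rounding each lane's sum to single precision. This is an illustrative model only: `to_f32` and `simd_fadd` are hypothetical names, and the rounding relies on the standard `struct` module's 32-bit `f` format, which raises `OverflowError` for finite values outside the single precision range.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def simd_fadd(d_a: tuple, d_b: tuple) -> tuple:
    """2-way SIMD single precision add on aliased register pairs:
    lane 0 computes f0 + f2 -> f4 and lane 1 computes f1 + f3 -> f5."""
    return tuple(to_f32(to_f32(a) + to_f32(b)) for a, b in zip(d_a, d_b))

d0 = (1.5, 2.25)        # (f0, f1)
d2 = (0.5, 4.0)         # (f2, f3)
d4 = simd_fadd(d0, d2)  # (f4, f5)
print(d4)  # (2.0, 6.25)
```

Adding the two single precision inputs in Python's double precision and then rounding once to single yields the correctly rounded single precision sum, because a double carries more than twice the single mantissa precision.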
To implement the double precision operation depicted in FIG. 2 using the architecture of FIG. 1, the 32-bit operands f0-f3 are first loaded from memory 10 into corresponding rows of re-order buffer 30 via routing logic 20. Each load operation places its operand in a separate row of re-order buffer 30. Thus, as shown in FIG. 3A, the 32-bit operand f0 may be loaded into row 0 of re-order buffer 30 in a first clock cycle, the 32-bit operand f1 into row 1 in a second clock cycle, the 32-bit operand f2 into row 2 in a third clock cycle, and the 32-bit operand f3 into row 3 in a fourth clock cycle.
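The load sequence of FIG. 3A, one 32-bit load per clock cycle with each result claiming its own re-order buffer row, can be modeled as follows. The `issue_loads` helper is hypothetical, and cycles are numbered from 0 to match Table 1.

```python
from typing import Dict, List, Tuple

def issue_loads(names: List[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Model the FIG. 3A loads: one 32-bit load per clock cycle, each into
    its own re-order buffer row. Returns (row_of, cycle_of) maps."""
    row_of: Dict[str, int] = {}
    cycle_of: Dict[str, int] = {}
    for i, name in enumerate(names):
        row_of[name] = i    # each load claims the next free ROB row
        cycle_of[name] = i  # and issues in the next clock cycle
    return row_of, cycle_of

row_of, cycle_of = issue_loads(["f0", "f1", "f2", "f3"])
print(row_of)  # {'f0': 0, 'f1': 1, 'f2': 2, 'f3': 3}
```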
As mentioned above, FPU 60 receives its two operands by selectively loading two rows of re-order buffer 30 or register file 40 (or the result of FPU 60) using two corresponding load operations. However, the two 64-bit operands required for the double precision instruction are each aliased to two 32-bit operands, which in turn are stored in four separate rows of re-order buffer 30. Since the four 32-bit operands f0-f3 are stored in separate rows of re-order buffer 30, as depicted in FIG. 3A, and since only two rows of re-order buffer 30 are typically loaded into FPU 60 per instruction, only two of the four 32-bit operands f0-f3 are immediately available from re-order buffer 30.
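The shortfall can be checked mechanically: the two 64-bit operands span four distinct re-order buffer rows, while FPU 60 reads only two rows per instruction. This is a hypothetical sketch using the FIG. 3A row assignment; `rows_needed` and `ROWS_READ_PER_INSTRUCTION` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

ROWS_READ_PER_INSTRUCTION = 2  # FPU 60 fetches two rows per instruction

def rows_needed(operands: List[Tuple[str, str]], row_of: Dict[str, int]) -> Set[int]:
    """Distinct re-order buffer rows holding the 32-bit halves of the
    64-bit operands of one instruction."""
    return {row_of[half] for pair in operands for half in pair}

row_of = {"f0": 0, "f1": 1, "f2": 2, "f3": 3}  # placement shown in FIG. 3A
needed = rows_needed([("f0", "f1"), ("f2", "f3")], row_of)  # d0 and d2
print(len(needed), "rows needed;", ROWS_READ_PER_INSTRUCTION, "rows readable")
# prints: 4 rows needed; 2 rows readable
```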
Typically, in order to make all four 32-bit operands f0-f3 available to FPU 60 in two load operations, 32-bit operand pair f0-f1 and pair f2-f3 are written to respective first and second rows of register file 40 upon retirement of the operands. For example, referring to FIG. 3B, when retired, operands f0 and f1 are written to the first 32 bits and second 32 bits, respectively, of row 0 of register file 40, and are thereby concatenated to represent 64-bit operand d0. Similarly, when retired, operands f2 and f3 are written to the first 32 bits and second 32 bits, respectively, of row 1 of register file 40, and are thereby concatenated to represent 64-bit operand d2. Now the four 32-bit operands f0-f3 aliased to the double precision instruction may be loaded into FPU 60 using two load operations, i.e., by retrieving rows 0 and 1 from register file 40.
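The write-back of FIG. 3B can be sketched as packing each retired pair into one 64-bit register file row, with the first operand in the first (modeled here as most significant) 32 bits. The function names and sample bit patterns are illustrative; the sample words were chosen so that the rows encode the doubles 1.5 and 2.5.

```python
import struct
from typing import Dict, List, Tuple

def retire_pairs(values: Dict[str, int], pairs: List[Tuple[str, str]]) -> List[int]:
    """Write each retired pair of 32-bit operands into one 64-bit register
    file row: the first operand into the first (upper) 32 bits and the
    second operand into the second (lower) 32 bits."""
    rows = []
    for first, second in pairs:
        rows.append(((values[first] & 0xFFFFFFFF) << 32) | (values[second] & 0xFFFFFFFF))
    return rows

def as_double(row: int) -> float:
    """Interpret a 64-bit register file row as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", row))[0]

# Sample 32-bit words chosen so that d0 = 1.5 and d2 = 2.5.
values = {"f0": 0x3FF80000, "f1": 0x00000000, "f2": 0x40040000, "f3": 0x00000000}
regfile = retire_pairs(values, [("f0", "f1"), ("f2", "f3")])

# Two row reads (rows 0 and 1) now supply both double precision operands.
d0, d2 = as_double(regfile[0]), as_double(regfile[1])
print(d0, d2)  # 1.5 2.5
```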
Although concatenation of 32-bit operands within register file 40 advantageously allows for implementation of double precision operations, the concatenated operands d0 and d2 are not available until after all four of the 32-bit operands f0-f3 are retired from re-order buffer 30 to register file 40. The typical latency associated with retirement of the four operands f0-f3 is 4 or more clock cycles. Thus, implementation of double precision operations using operands bypassed from single precision operations, as described above, may require 7 or more clock cycles to complete, as summarized below in Table 1.
TABLE 1
command               clock cycle
load %f0              0
load %f1              1
load %f2              2
load %f3              3
fadd %d0, %d2, %d4    7
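The cycle count in Table 1 follows from simple arithmetic if the 4-cycle retirement latency is counted from the cycle of the last load (f3, cycle 3). That reading is an assumption, chosen because it reproduces the table. A minimal sketch, with `fadd_issue_cycle` as a hypothetical name:

```python
from typing import List

def fadd_issue_cycle(load_cycles: List[int], retire_latency: int) -> int:
    """Earliest cycle at which the double precision fadd can read the
    concatenated operands d0 and d2 from register file 40, assuming they
    become available retire_latency cycles after the last 32-bit load."""
    return max(load_cycles) + retire_latency

print(fadd_issue_cycle([0, 1, 2, 3], retire_latency=4))  # 7
```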
It would therefore be desirable to implement double precision operations whose operands are bypassed from instructions that produce single precision operands, without having to wait for the retirement of older instructions.
SUMMARY
A method is disclosed which allows for implementation of double precision operations having operands bypassed from single precision instructions without having to wait for write-back of the operands to an architectural register file. In one embodiment of the present invention, where pairs of single precision operands are aliased to represent double precision operands, first and second single precision operands are loaded into first and second respective rows of a re-order buffer, and third and fourth single precision operands are loaded into third and fourth respective rows of the re-order buffer. A first merge instruction copies the first and second single
Paradice III William L
Sun Microsystems Inc.
Merging single precision floating point operands