Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1999-11-26
2004-01-20
Chan, Eddie (Department: 2183)
C712S218000, C712S215000
active
06681322
ABSTRACT:
I. FIELD
The present invention relates to digital computer systems, and more particularly, but not by way of limitation, to methods and apparatus for executing instructions in such systems.
II. BACKGROUND
The Streaming Single-Instruction Multiple-Data Extensions (SSEs) have been developed to enhance the instruction set of the latest generation of certain computer architectures (e.g., the IA-32 architecture). The SSEs include a new set of registers, new floating point data types, and new instructions. Specifically, the SSEs comprise eight 128-bit single-instruction multiple-data (SIMD) floating point registers (XMM0 through XMM7) that can be used to perform calculations and operations on floating point data. These XMM registers are shown in FIG. 1A. Each 128-bit floating point register can contain four packed 32-bit single precision (SP) floating point numbers. The structure of the packed 32-bit SP floating point numbers is illustrated in the example of FIG. 1B, where four 32-bit SP floating point numbers (numbered 0 through 3) are shown as if stored in the XMM2 SSE register. In architectures designed to support the SSEs (i.e., their native architecture), a single instruction in the SSE instruction set operates in parallel on all four 32-bit SP floating point numbers in a particular XMM register.
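As a rough illustration of the packed layout of FIG. 1B, the following Python sketch packs four 32-bit SP values into one 128-bit integer and applies a single operation to all four elements. It assumes element 0 occupies the least significant 32 bits (the IA-32 little-endian convention). The function names are hypothetical, and Python computes in double precision before rounding back to single precision when packing, so the sketch models the data layout rather than the hardware arithmetic.

```python
import struct


def pack_xmm(values):
    """Pack four 32-bit SP floats into one 128-bit integer.

    Assumed layout: element 0 in bits 31:0, element 3 in bits 127:96.
    """
    if len(values) != 4:
        raise ValueError("an XMM register holds exactly four SP values")
    raw = struct.pack("<4f", *values)  # 16 bytes, element 0 first
    return int.from_bytes(raw, "little")


def unpack_xmm(reg):
    """Split a 128-bit integer back into its four SP float elements."""
    return list(struct.unpack("<4f", reg.to_bytes(16, "little")))


def simd_add(a, b):
    """Apply one operation to all four element pairs, as one SSE add would."""
    return pack_xmm([x + y for x, y in zip(unpack_xmm(a), unpack_xmm(b))])
```

For example, `unpack_xmm(simd_add(x, x))` with `x = pack_xmm([1.0, 2.0, 3.0, 4.0])` yields `[2.0, 4.0, 6.0, 8.0]`.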
The SSEs also include a status and control register called the MXCSR register. The format of the MXCSR register is illustrated in the example of FIG. 1C. The MXCSR register may be used to selectively mask or unmask exceptions. Specifically, bits 7-12 of the MXCSR register may be used by a programmer to selectively mask or unmask a particular exception. Masked exceptions are those that the programmer wishes to have handled automatically by the processor, which may provide a default response. Unmasked exceptions, on the other hand, are those that the programmer wishes to have handled by invoking an interrupt or operating system handler. Invoking the handler transfers control to the operating system (e.g., Microsoft Windows), where the problem may be corrected or the program terminated.
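The mask bits can be modeled with simple bit arithmetic. In this sketch the bit positions follow the IA-32 MXCSR layout (invalid-operation mask at bit 7 through precision mask at bit 12), and 0x1F80, the MXCSR reset value with all six exceptions masked, is used as a starting point. The helper names are hypothetical.

```python
# MXCSR exception mask positions (bits 7-12); a set bit means "masked".
MASK_BITS = {"I": 7, "D": 8, "Z": 9, "O": 10, "U": 11, "P": 12}
MXCSR_DEFAULT = 0x1F80  # all six exceptions masked


def is_masked(mxcsr, exc):
    """True if the given exception type is masked in this MXCSR value."""
    return bool((mxcsr >> MASK_BITS[exc]) & 1)


def unmask(mxcsr, exc):
    """Clear one mask bit so that the exception invokes a handler instead."""
    return mxcsr & ~(1 << MASK_BITS[exc])


def handling(mxcsr, exc):
    """Describe how an exception of the given type would be handled."""
    if is_masked(mxcsr, exc):
        return "processor default response"
    return "interrupt/OS handler"
```

Unmasking divide-by-zero, for instance, turns 0x1F80 into 0x1D80 while leaving the other five exceptions masked.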
The MXCSR register may also be used to keep track of the status of exception flags. Bits 0-5 of the MXCSR register indicate whether any of six exceptions has occurred in the execution of an SSE instruction: invalid operation (I), divide-by-zero (Z), denormal operand (D), numeric overflow (O), numeric underflow (U), or inexact result (P). (Note that in the example of FIG. 1C, all exception flags have been raised for one reason or another, as indicated by “E”.) The status flags are “sticky,” meaning that once they are set, they are not cleared by any subsequent SSE instruction, even one performed without exception. The status flags can only be cleared by a special instruction usually issued from the operating system.
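A similarly minimal sketch of the sticky status flags, again assuming the IA-32 bit positions (invalid operation at bit 0, denormal at 1, divide-by-zero at 2, overflow at 3, underflow at 4, precision at 5): raising a flag only ever ORs a bit in, and only an explicit write, modeled here by a hypothetical `clear_flags` helper, removes flags.

```python
# MXCSR status flag positions (bits 0-5).
FLAG_BITS = {"I": 0, "D": 1, "Z": 2, "O": 3, "U": 4, "P": 5}


def raise_flags(mxcsr, exceptions):
    """Set the flag for each exception; flags already set are never cleared (sticky)."""
    for exc in exceptions:
        mxcsr |= 1 << FLAG_BITS[exc]
    return mxcsr


def raised(mxcsr):
    """Return the set of exception letters whose status flag is set."""
    return {exc for exc, bit in FLAG_BITS.items() if (mxcsr >> bit) & 1}


def clear_flags(mxcsr):
    """Hypothetical model of the explicit write that clears bits 0-5."""
    return mxcsr & ~0x3F
```

A later instruction that raises nothing (`raise_flags(m, [])`) leaves any earlier flag in place, which is what "sticky" means here.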
The exception flags of FIG. 1C are the result of a bitwise logical-OR operation over all four of the 32-bit SP floating point operations performed on a particular 128-bit XMM register (one operation on each of the four 32-bit SP floating point numbers). Thus, if an exception occurs for any one of the four 32-bit SP floating point numbers, the flag for that type of exception will be raised, indicating that some type of problem has occurred in the system. The invalid operation (I), divide-by-zero (Z), and denormal operand (D) exceptions are pre-computation exceptions, meaning that they are detected before any arithmetic or logical operation occurs (i.e., they can be detected without performing any computation). The other three exceptions, numeric overflow (O), numeric underflow (U), and inexact result (P), are post-computation exceptions, meaning that they are detected after the operation has been performed. An operation performed on a suboperand (i.e., one of the four operands in a 128-bit XMM register) may raise multiple flags.
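The OR-combination of per-suboperand results and the split into pre- and post-computation exceptions can be expressed as follows. Each lane's exceptions are encoded with the same bit positions as MXCSR bits 0-5; the encoding helper and mask names are assumptions made for illustration.

```python
import operator
from functools import reduce

# Bit positions follow MXCSR status bits 0-5.
FLAG_BITS = {"I": 0, "D": 1, "Z": 2, "O": 3, "U": 4, "P": 5}
PRE_MASK = sum(1 << FLAG_BITS[e] for e in "IZD")   # detected before computing
POST_MASK = sum(1 << FLAG_BITS[e] for e in "OUP")  # detected after computing


def to_bits(names):
    """Encode a collection of exception letters, e.g. "OP", as flag bits."""
    return sum(1 << FLAG_BITS[e] for e in set(names))


def merge_lane_flags(lane_bits):
    """Bitwise OR of the flags raised by each of the four suboperand operations."""
    return reduce(operator.or_, lane_bits, 0)
```

If only lane 1 overflows (raising O and P) and only lane 3 divides by zero, the register-level flags are Z, O, and P; masking with `PRE_MASK` and `POST_MASK` then separates Z from O and P.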
The native architecture of the SSEs has the following rules for exceptions:
1. When an unmasked exception occurs, the processor executing the instruction will not change the contents of the XMM register. In other words, results will not be committed or stored until it is known that no unmasked exceptions have occurred with respect to any of the four 32-bit SP floating point numbers.
2. If there is a masked exception, all exception flags are updated.
3. In the case of unmasked pre-computation exceptions, all flags relating to pre-computation exceptions, whether masked or unmasked, will be updated. However, no subsequent computation is permitted, meaning that no post-computation exceptions can occur. Consequently, no post-computation exception flags will change or be updated.
4. In the case of unmasked post-computation exceptions, all post-computation exception flags, whether masked or unmasked, will be updated, as will all pre-computation exception flags. Any pre-computation exceptions in this case must be masked, because if a pre-computation exception were unmasked, rule 3 above would have prevented any further computation.
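The four rules can be condensed into one decision function. This is a simplified model, not a description of any particular processor: `pre_flags` and `post_flags` are the register-level flags (already OR-ed over the four suboperands) that the pre-computation checks and the computation would raise, `masks` uses the same bit positions as the flags, and all names are hypothetical. The function returns whether the result is committed, which flags are recorded, and whether a handler is invoked.

```python
# Flag/mask bit values, using the MXCSR bit order for flags (bits 0-5).
I, D, Z, O, U, P = (1 << n for n in range(6))


def native_sse_step(pre_flags, post_flags, masks):
    """Apply the four native exception rules to one SSE instruction.

    Returns (commit_result, flags_recorded, handler_invoked).
    """
    if pre_flags & ~masks:
        # Rules 1 and 3: an unmasked pre-computation exception stops the
        # computation. All pre-computation flags (masked or not) are recorded,
        # no post-computation flags can arise, and nothing is committed.
        return False, pre_flags, True
    if post_flags & ~masks:
        # Rules 1 and 4: any pre-computation exceptions here were masked.
        # All pre- and post-computation flags are recorded; nothing is committed.
        return False, pre_flags | post_flags, True
    # Rule 2: only masked exceptions (or none); flags updated, result committed.
    return True, pre_flags | post_flags, False
```

Note that `post_flags` is ignored in the rule 3 branch, mirroring the statement that no post-computation exception can occur once the computation is suppressed.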
More information regarding Streaming SIMD Extensions may be found in the Intel Architecture Software Developer's Manual, Volumes 1-3, which are hereby incorporated by reference.
In many architectures, no provision has been made for the SSE instructions. In these non-native architectures, the eight 128-bit floating point XMM registers, each capable of containing four 32-bit SP floating point numbers, are not available. In some non-native architectures, the eight 128-bit XMM registers may be mapped onto sixteen floating point registers (e.g., IA-64 registers) that may be less than 128 bits but more than 64 bits wide. Specifically, some architectures use 82-bit registers to hold two 32-bit SP floating point numbers (the bits in excess of 64 may be used for a special encoding indicating that the register holds SIMD-type 32-bit SP floating point numbers). An example is shown in FIG. 1D. Note that the four 32-bit SP floating point numbers 0-3 stored in the XMM2 register of the SSE native environment (FIG. 1B) are now stored in two 82-bit registers, XMM2_Low and XMM2_High, containing the “low half” and the “high half” of the XMM2 register, respectively. This makes it difficult to execute an operation in parallel on all four 32-bit SP floating point numbers.
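The mapping of FIG. 1D can be sketched by splitting a 128-bit XMM value into two 64-bit halves of two SP values each. The extra bits of the 82-bit registers and their SIMD encoding are deliberately not modeled, element 0 is again assumed to sit in the least significant bits, and the helper names are hypothetical.

```python
import struct

LOW_MASK = (1 << 64) - 1


def split_xmm(reg):
    """Split a 128-bit XMM value into (low, high) 64-bit halves.

    Low holds SP elements 0-1 and high holds elements 2-3.
    """
    return reg & LOW_MASK, reg >> 64


def join_xmm(low, high):
    """Reassemble the 128-bit value from its two halves."""
    return (high << 64) | low


def unpack_half(half):
    """Decode the two SP floats packed in a 64-bit half."""
    return list(struct.unpack("<2f", half.to_bytes(8, "little")))
```

Splitting and rejoining is lossless, so the only difficulty the split introduces is in executing, and committing, the two halves separately.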
Thus, in this non-native environment, the SSE instructions must be executed by emulation. Specifically, an operation may first be performed on two of the four 32-bit SP floating point numbers (in parallel) and then on the remaining two (again, in parallel). (Alternatively, an operation may be performed on only one, or on at least three, of the 32-bit SP floating point numbers.) For example, an operation may be performed on the operands in the “low half,” XMM2_Low, and then on those in the “high half,” XMM2_High. However, given the SSE rules for handling exceptions and updating exception flags, problems arise when SSE instructions are emulated in this partially parallel, partially sequential manner. For example, consider the following instruction, emulated on the low and high halves of FIG. 1D:
XMM2:=OP(XMM3, XMM4)
emulated by
XMM2_Low:=OP(XMM3_Low, XMM4_Low)
XMM2_High:=OP(XMM3_High, XMM4_High)
Assume that the first instruction is executed on the operands in the low halves, XMM3_Low and XMM4_Low, without an unmasked exception. The result of this operation is then committed to XMM2_Low. Assume now that execution of the second instruction on the high halves results in an unmasked pre-computation exception. According to the SSE rules, because of that unmasked pre-computation exception, no operation is to be performed on any of the four 32-bit SP floating point numbers, and the destination register must be left unchanged. Here, however, the result of the operation on the low halves has already been committed to XMM2_Low in violation of the SSE rules. This corrupts the data in XMM2_Low and cannot be allowed to happen.
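The hazard just described can be reproduced with a toy model of half-by-half emulation that commits each half as soon as it is computed. Here OP is assumed to be element-wise addition, registers are plain Python lists keyed by names such as `XMM2_Low`, and the hypothetical `high_fails` argument stands in for an unmasked pre-computation exception detected on the high half. The sketch only demonstrates the problem; it does not show how the problem is solved.

```python
class UnmaskedException(Exception):
    """Signals an unmasked exception detected while emulating one half."""


def op_half(a_half, b_half, fails):
    """Stand-in for OP on one half: element-wise addition of two SP values.

    `fails` simulates an unmasked pre-computation exception on this half.
    """
    if fails:
        raise UnmaskedException("unmasked pre-computation exception")
    return [x + y for x, y in zip(a_half, b_half)]


def naive_emulate(regs, dst, src1, src2, high_fails=False):
    """Emulate dst := OP(src1, src2) low half first, committing each half at once.

    Returns True if an unmasked exception trapped (control would pass to the
    operating system handler at that point), False otherwise.
    """
    try:
        for half, fails in (("Low", False), ("High", high_fails)):
            result = op_half(regs[f"{src1}_{half}"], regs[f"{src2}_{half}"], fails)
            regs[f"{dst}_{half}"] = result  # committed before the other half is checked
    except UnmaskedException:
        return True
    return False
```

After a trapping call, `XMM2_Low` holds new data while `XMM2_High` is untouched, which is exactly the partial commit that rule 1 forbids.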
One way of succ
Knebel Patrick
Safford Kevin David
Chan Eddie
Harkness Charles
Hewlett-Packard Development Company, L.P.