Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Scoreboarding – reservation station – or aliasing
Reexamination Certificate
2000-04-27
2003-10-07
Chan, Eddie (Department: 2183)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Scoreboarding, reservation station, or aliasing
C712S228000, C712S216000
Reexamination Certificate
active
06631460
ABSTRACT:
THE FIELD OF THE INVENTION
The present invention generally relates to execution of instructions and performance optimization in computer systems, and more particularly to an advanced load address table (ALAT) for recording memory addresses and register targets for load instructions advanced out of order to achieve improved performance, where entries in the ALAT are invalidated based on register address wraparound.
BACKGROUND OF THE INVENTION
Computer systems include at least one processor and memory. The memory stores program instructions, data, and an operating system. The program instructions can include a compiler for compiling application programs. The operating system controls the processor and the memory for system operations and for executing the program instructions.
A “basic block” is a contiguous set of instructions bounded by branches and/or branch targets, containing no branches or branch targets. This implies that if any instruction in a basic block is executed, all instructions in the basic block will be executed, i.e., the instructions contained within any basic block are executed on an all-or-nothing basis. The instructions within a basic block are enabled for execution when control is passed to the basic block by an earlier branch targeting the basic block (“targeting” as used here includes both explicit targeting via a taken branch as well as implicit targeting via a not taken branch). The forgoing implies that if control is passed to a basic block, then all instructions in the basic block must be executed; if control is not passed to the basic block, then no instructions in the basic block are executed.
The act of executing, or specifying the execution of, an instruction before control has been passed to the instruction is called “speculation.” Speculation performed by the processor at program runtime is called “dynamic speculation” while speculation specified by the compiler is called “static speculation.” Dynamic speculation is known in the prior art. While the vast majority of the prior art is not based on, and does not refer to, static speculation, recently some references to static speculation have begun to surface.
Two instructions are called “independent” when one does not require the result of the other; when one instruction does require the result of the other the instructions are called “dependent.” Independent instructions may be executed in parallel while dependent instructions must be executed in serial fashion. Program performance is improved by identifying independent instructions and executing as many of them in parallel as possible. Experience indicates that more independent instructions can be found by searching across multiple basic blocks than can be found by searching only within individual basic blocks. However, simultaneously executing instructions from multiple basic blocks requires speculation.
Identifying and scheduling independent instructions, and thereby increasing performance, is one of the primary tasks of compilers and processors. The trend in compiler and processor design has been to increase the scope of the search for independent instructions in each successive generation. In prior art instruction sets, an instruction that may generate an exception cannot be speculated by the compiler since, if the instruction causes an exception, the program may exhibit erroneous behavior. This restricts the useful scope of the compiler's search for independent instructions and makes it necessary for speculation to be performed at program runtime by the processor via dynamic speculation. However, dynamic speculation entails a significant amount of hardware complexity that increases exponentially with the number of basic blocks over which dynamic speculation is applied—this places a practical limit on the scope of dynamic speculation. By contrast, the scope over which the compiler can search for independent instructions is much larger—potentially the entire program. Furthermore, once the compiler has been designed to perform static speculation across a single basic block boundary, very little additional complexity is incurred by statically speculating across several basic block boundaries.
There is a need for a mechanism to achieve higher performance in computer systems by enabling execution of as many independent instructions in parallel as possible. This is desirable even when there is a possibility that a second instruction, as well as a calculation dependent thereon, may operate upon data that can be dependent upon the execution of a first instruction.
Many computer systems implement software-controlled register renaming. When a caller procedure calls a callee procedure, local registers of the caller procedure are automatically saved. The caller procedure typically only provides the callee procedure with registers containing input parameters. The callee procedure allocates more registers if the callee procedure requires its own local registers. On a return back to the caller procedure, the local registers of the caller procedure are automatically restored.
The software-controlled register renaming is typically over a large pool of physical registers. On a call, a rename base pointer (i.e., bottom of frame) is incremented by a number of the caller procedure's local registers. On a return, the rename base pointer is decremented by the number of the caller procedure's local registers.
When a series of calls are performed and the number of available physical registers are exhausted, the software-controlled register renaming simply wraps around the bottom of the physical registers. When more physical registers are requested than are available, register values are spilled into memory. Thus, the software-visible frame of registers is mapped onto the physical registers. The physical registers are conceptually arranged in a circle, and as calls are performed, the software-visible frame of registers advances around the conceptual circle of physical registers.
There is a need for a mechanism in computer systems which maximizes the number of independent instructions executed in parallel even when there is a possibility that a second instruction, as well as a calculation dependent thereon, may operate upon data that can be dependent upon execution of a first instruction. In addition, it is desirable that such a mechanism be fully and efficiently compatible with the above-described software-controlled register renaming mechanism.
SUMMARY OF THE INVENTION
The present invention provides a method and a computer system including memory storing a compiled program. The complied program including a store instruction, a load instruction scheduled before the store instruction, and a check instruction. A processor executes the compiled program. Physical registers hold data for the compiled program. A portion of the physical registers form a register stack which wraps around when full. An N-bit current wraparound count state tracks physical register remapping events which cause the register stack to wraparound or unwrap. An advanced load address table (ALAT) has entries corresponding to load instructions. Each entry in the ALAT has at least one memory range field defining a range of memory locations accessed by a corresponding load instruction, a physical register number field corresponding to a physical register accessed in the corresponding load instruction, and an N-bit register wraparound field which corresponds to the N-bit current wraparound count state for the corresponding load instruction. The check instruction accesses the ALAT to determine whether the store instruction and the load instruction potentially accessed a common memory location.
In one embodiment, after the execution of the store instruction, an absence of an entry corresponding to the load instruction in the ALAT indicates that a common memory location may have been accessed by the store and load instructions. In one embodiment, the execution of the store instruction clears the entry in the ALAT corresponding to the load instruction if the store instruction and the load in
Bry William R.
Chen William
Karp Alan H.
Morris Dale C.
Chan Eddie
Harkness Charles
Institute for the Development of Emerging Architectures L.L.C.
LandOfFree
Advanced load address table entry invalidation based on... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Advanced load address table entry invalidation based on..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Advanced load address table entry invalidation based on... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3132988