Title: Computer system having an instruction for probing memory...
Classification: Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking, monitoring or... – Reducing an impact of a stall or pipeline bubble
Other classes: C712S228000, C712S234000, C712S217000, C709S241000
Type: Reexamination Certificate
Status: active
Patent number: 6,308,261
Filed: 1998-01-30
Issued: 2001-10-23
Examiner: Chan, Eddie (Department: 2783)
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to computer instructions that are executed in computer systems. More specifically, the present invention relates to a computer instruction that determines whether accessing the contents of a register will cause execution of a program on a computer system to stall, thereby providing an indication of memory latency.
DESCRIPTION OF THE RELATED ART
A typical computer system includes one or more processors, a memory system, and storage media. The memory system typically includes a main memory, and one or more cache memories coupled between the main memory and the processors. The storage media may comprise hard disk drives, floppy disk drives, CD-ROM drives, and the like.
A computer system performs a task by executing a computer program that is comprised of sequences of instructions. The performance of computer systems is greatly affected by the rate at which instructions are executed.
One common type of instruction is a load instruction that retrieves an operand from memory and stores the operand in a register of the processor. If the operand is stored in the cache memory, it will be retrieved quickly. This is known in the art as a “cache hit”. However, if the operand is only available in main memory, it may take considerably longer for the operand to be loaded into the register. This is known in the art as a “cache miss”. Many computer systems will continue to execute instructions after a cache miss until encountering an instruction that requires the operand being loaded into the register from memory. Upon encountering such an instruction, the processor will stall execution of that instruction and wait for the load to finish. This technique is known in the art as “stall on use”, and cache memories that support this strategy are known as “non-blocking caches”.
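As an illustration of this stall-on-use behavior, the following C sketch (the function and variable names are hypothetical, and an optimizing compiler or CPU is of course free to reorder the actual instructions) marks where a processor with a non-blocking cache would actually pause:

```c
#include <stddef.h>

/* Illustrative only: with a non-blocking, stall-on-use cache, the load of
 * table[miss_index] can miss without halting execution; the independent
 * loop below keeps running, and the processor stalls only at the first
 * instruction that actually reads the loaded value. */
long sum_with_independent_work(const long *table, size_t miss_index,
                               const long *scratch, size_t n)
{
    long loaded = table[miss_index];    /* load issued; may miss the cache */

    long acc = 0;
    for (size_t i = 0; i < n; i++)      /* independent work that does not  */
        acc += scratch[i];              /* depend on 'loaded'              */

    return acc + loaded;                /* first use: any stall happens here */
}
```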
The performance of processors is increasing faster than the performance of main memory systems. Current computer systems may require up to 100 processor cycles to retrieve an operand from main memory, and it is expected that future computer systems will require 1000 or more processor cycles. Accordingly, to achieve optimal system performance it is important to ensure that the processor has other instructions that can be executed while an operand is being retrieved from main memory. Ideally, the processor should never have to stall while waiting for a load to complete.
Some computer systems are capable of executing instructions out-of-order and are therefore able to defer stalling by executing other instructions while waiting for a load to complete. However, out-of-order execution requires complex hardware, and the window of instructions that are eligible to be executed out-of-order is relatively small (no more than a few hundred instructions). In addition, as more logic of a CPU is devoted to supporting out-of-order execution, less logic is available to perform the functions specified by the instructions.
Other techniques allow the processor to switch contexts when it is determined that the current stream of instructions cannot proceed because an operand is not available. For example, it is possible to maintain multiple contexts within the CPU, and simply switch to another context when a stall occurs. However, this approach requires extra hardware for maintaining the multiple contexts and, like out-of-order execution, reduces the hardware available to perform the functions specified by the instructions. It is also possible to generate an interrupt upon stalling and have software switch to some other thread or process. While this approach works well for longer latencies, such as retrieving data from a disk, the overhead incurred makes it impractical for shorter latencies, such as accessing data in main memory.
Another technique is to model the latencies of various operations involved in executing instructions when a computer program is compiled. By scheduling instructions based on latency models, a compiler can significantly reduce stalling. For example, if a latency model indicates that a cache miss will require 30 CPU cycles, and the compiler can determine that a particular load instruction will generate a cache miss, the compiler can schedule a memory pre-fetch operation to retrieve the operand at least 30 cycles before the operand is needed.
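As a rough sketch of this idea, a compiler (or a programmer hand-tuning a loop) that trusts such a latency model might emit prefetches a fixed distance ahead of each use. The example below uses the GCC/Clang __builtin_prefetch extension; the distance constant is a hypothetical stand-in for the “30 cycles” of the model.

```c
#include <stddef.h>

/* Prefetch this many elements ahead of the current use; in practice the
 * value would be derived from the latency model and the loop body's cost. */
#define PREFETCH_DISTANCE 8

long sum_array(const long *a, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0 /* read */, 1);
        acc += a[i];   /* with luck, a[i] was prefetched several iterations ago */
    }
    return acc;
}
```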
Unfortunately, latencies are often difficult to predict at compile time. A compiler may not be able to determine whether a particular load will produce a cache miss or a cache hit. In addition, a single version of “shrink wrap” software is often marketed for a particular instruction set. However, such software may be executed on a wide variety of computer systems having substantial differences in performance and architecture. For example, a program written for the Intel x86 instruction set may be executed on computer systems ranging from a laptop computer having a Pentium® CPU, no external L2 cache, and fast page mode DRAM memory, to a computer workstation having multiple Pentium® II CPUs (with each CPU having an L1 and L2 cache) and synchronous DRAM memory.
Dynamic variable latencies were addressed by Mark Horowitz, Margaret Martonosi, Todd Mowry, and Michael Smith in a first paper entitled “Informing Loads: Enabling Software to Observe and React to Memory Behavior”, which was published in July of 1995, and in a second paper entitled “Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors”. Both of these papers are hereby incorporated by reference. These papers propose a class of instructions called “informing load instructions”. An informing load instruction causes the instruction immediately following it to be skipped if the load operation produces a cache hit, and to be executed if the load operation produces a cache miss. Informing loads are non-blocking loads that do not stall execution. By scheduling a branch instruction immediately after an informing load operation, a compiler can arrange for alternate action to be taken while a cache miss caused by the informing load is serviced.
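To make the idiom concrete, here is a small, purely software model of the pattern the papers describe: a branch is placed in the slot right after the informing load, so alternate work runs only on a miss. The informing_load() function is a hypothetical stand-in for the ISA feature, with the hit/miss outcome supplied by hand.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an informing load: the load always completes,
 * and the return value stands in for the hit/miss signal that the real
 * instruction would expose to the following instruction slot. */
static bool informing_load(long *dest, const long *addr, bool simulate_hit)
{
    *dest = *addr;
    return simulate_hit;
}

static void do_alternate_work(void)
{
    puts("cache miss: doing other useful work while the line is fetched");
}

int main(void)
{
    long x = 42, value;

    /* The branch scheduled immediately after the informing load is taken
     * only when the load misses; on a hit it is skipped. */
    if (!informing_load(&value, &x, /* simulate_hit = */ false))
        do_alternate_work();

    printf("value = %ld\n", value);
    return 0;
}
```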
While an informing load operation indicates whether a particular operand is in the first level cache, it does not provide an indication of the magnitude of the latency. The operand may be in an L2 cache, it may be in main memory, or it may be present as a dirty line in the cache of another CPU. Accordingly, informing load operations do not provide compilers with a method of scheduling alternate code threads based on the magnitude of the latency. Nor do informing loads provide a mechanism for a program to confirm that the contents of a register are available after the informing load itself has been executed.
SUMMARY OF THE INVENTION
The present invention is a computer system that includes a data structure that maintains availability status for registers of a processor of the computer system, wherein the availability status indicates whether an instruction attempting to read a particular register will stall. The computer system also includes instruction decode and execution circuitry that is capable of decoding and executing one or more instructions that alter a path of program execution based on the availability status of one or more of the registers.
In one embodiment, a latency probe instruction retrieves the availability status of a register from the data structure and stores the availability status in a second register. Thereafter, a conditional branch instruction determines the path of program execution based on the availability status stored in the second register. In another embodiment, a conditional branch instruction queries the data structure directly to determine the availability status of a register, and determines the execution path based on that availability status.
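The following C sketch is a minimal software model of the mechanism summarized in the two preceding paragraphs: a per-register availability bitmap stands in for the data structure, a probe routine stands in for the latency probe instruction, and an ordinary branch acts on its result. All of the names (reg_scoreboard, probe_available, and so on) are hypothetical; in the invention this state and these operations live in hardware.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGS 32u

/* Bit i set => register i can be read without stalling. */
static uint32_t reg_scoreboard = 0xFFFFFFFFu;

/* Models the latency probe instruction: fetch the availability status of
 * register r from the scoreboard. */
static bool probe_available(unsigned r)
{
    return (reg_scoreboard >> (r % NUM_REGS)) & 1u;
}

/* Called when a load targeting register r is issued or completes. */
static void mark_pending(unsigned r)  { reg_scoreboard &= ~(1u << (r % NUM_REGS)); }
static void mark_complete(unsigned r) { reg_scoreboard |=  (1u << (r % NUM_REGS)); }

int main(void)
{
    unsigned r5 = 5;

    mark_pending(r5);             /* a load into register 5 is outstanding */

    /* Probe followed by a conditional branch: choose an alternate path of
     * execution instead of stalling on the not-yet-available register. */
    if (!probe_available(r5))
        puts("r5 not ready: branch to an alternate thread of execution");

    mark_complete(r5);            /* the load eventually finishes */
    if (probe_available(r5))
        puts("r5 ready: safe to read without stalling");

    return 0;
}
```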
The present invention exposes the latency of memory operations to programs, thereby allowing a compiler to schedule alternate threads of execution based on varying latencies. By scheduling alternate threads of execution …
Inventors: Hunt, Douglas B.; Morris, Dale C.
Assignee: Hewlett-Packard Company
Patel, Gautam R.
Plettner, David A.