Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2000-09-29
2004-07-06
Kim, Hong (Department: 2186)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S213000, C711S133000, C712S207000, C712S237000
Reexamination Certificate
active
06760816
ABSTRACT:
FIELD OF THE INVENTION
This invention pertains to prefetching data into a cache on a computer, and more particularly to prefetching data that is identified as critical.
BACKGROUND OF THE INVENTION
When computers first became available, they ran programs by executing instructions using in-order execution. Before instruction number two could be executed, instruction number one had to complete. Since clock speeds were relatively slow, this was not a significant issue. The processor could not execute complicated instructions much faster than any other part of the computer could support the instruction. But modern processors are much more efficient than their ancestors were. Modern computers are capable of running at very high clock rates and may perform complicated instructions in very few clock cycles.
But while processor clock speeds have increased dramatically, improvements in other parts of the computer have been less significant. Specifically, at the high clock rates in modem processors, it may take thousands of clock cycles to access data from memory. In an in-order instruction processor, the processor must wait for a memory access to complete before it may continue with the next instruction. This may cause significant delay in program execution. To deal with this problem, processors began to run programs using out-of-order execution. While one complicated instruction is delayed (for example, due to a memory access), other instructions that do not depend on the delayed instruction may be executed. For out-of-order execution to work, the processor needs to be able to do several things. First, the processor determines whether a later instruction is dependent on the delayed instruction. For example, consider the situation where a value is loaded from memory into a register in the processor. If a later instruction adds the value in that register to another value in another register, this later instruction is dependent on the delayed instruction: it may not execute until after the load instruction completes. On the other hand, an add instruction that adds two registers that are totally unrelated to the load instruction may be executed while the value is loaded from memory, even though the exact instruction order suggests that this add instruction should not execute yet.
Second, the processor buffers any dependent instructions for later execution. If the processor detects that a later instruction is dependent on a delayed load instruction, the later instruction may not be executed out-of-order, and is buffered until after the load instruction completes.
Third, the processor renames registers. A register may be renamed when a later instruction that is not dependent on the delayed load instruction uses a register that is used by the delayed load instruction. In this case, the processor needs to be able to rename the register used by the later instruction so that the “parallel execution” of the load instruction and the later instruction does not create a conflict.
FIG. 1
 shows how a processor in the prior art operates. Processor 
105
 receives instruction sequence 
110
. While a load instruction is pending, processor 
105
 examines later instructions to see if they are dependent on the delayed load instruction. If the later instruction is dependent on the delayed load instruction, the later instruction is buffered in buffer 
115
. Otherwise, the later instruction may be executed out-of-order, and joins executed instructions 
120
.
Two concerns may arise that limit the effectiveness of out-of-order execution. First, processor 
105
 may fill buffer 
115
 with dependent instructions. Once the buffer is full, processor 
105
 may not add any more instructions to buffer 
115
, and all later instructions have to wait until the delayed load instruction completes. Second, the program may include a branch instruction after the load instruction. Even with branch prediction, processor 
105
 may not execute the instructions without some way to reverse the process in case the branch prediction was incorrect. Typically, processor 
105
 will simply buffer the instructions rather than execute and risk having to rewind the program execution.
The problems with out-of-order execution are exacerbated by the possibility of multiple load instructions within a relatively small block of instructions. With multiple independent load instructions, if the load instructions are executed in their original order, the processor may be more inefficient than it needs to be.
Other problems related to load instruction delays have to do with caching. Cache lines containing data requested by load instructions may be removed from the cache even though other nearby data will be requested shortly. And cache lines containing data that may be loaded shortly may not be fetched into the cache in advance of their need.
The present invention addresses these and other problems associated with the prior art.
REFERENCES:
patent: 4888679 (1989-12-01), Fossum et al.
patent: 5109498 (1992-04-01), Kamiya et al.
patent: 5146578 (1992-09-01), Zangenehpour
patent: 5214766 (1993-05-01), Liu
patent: 5261053 (1993-11-01), Valencia
patent: 5317718 (1994-05-01), Jouppi
patent: 5377336 (1994-12-01), Eickemeyer et al.
patent: 5497499 (1996-03-01), Garg et al.
patent: 5664147 (1997-09-01), Mayfield
patent: 5737624 (1998-04-01), Garg et al.
patent: 5758119 (1998-05-01), Mayfield et al.
patent: 5761706 (1998-06-01), Kessler et al.
patent: 5822764 (1998-10-01), Hardage, Jr. et al.
patent: 5838945 (1998-11-01), Emberson
patent: 5974526 (1999-10-01), Garg et al.
patent: 6000007 (1999-12-01), Leung et al.
patent: 6073215 (2000-06-01), Snyder
patent: 6085289 (2000-07-01), Thatcher et al.
patent: 6131145 (2000-10-01), Matsubara et al.
patent: 6134643 (2000-10-01), Kedem et al.
patent: 6138212 (2000-10-01), Chiacchia et al.
patent: 6167509 (2000-12-01), Sites et al.
patent: 6195735 (2001-02-01), Krueger et al.
patent: 6223256 (2001-04-01), Gaither
patent: 6263404 (2001-07-01), Borkenhagen et al.
patent: 6289433 (2001-09-01), Garg et al.
patent: 6360297 (2002-03-01), Arimilli et al.
patent: 6381679 (2002-04-01), Matsubara et al.
patent: 2001/0027515 (2001-10-01), Ukai et al.
patent: 2002/0087794 (2002-07-01), Jouppi et al.
patent: 2002/0156962 (2002-10-01), Chopra et al.
IBM Technical Disclosure Bulletin, Prefetching With Invalid Cache Entries, Aug. 1, 1990, VOL: 33, Issue: 3B, Page: 46.*
Kumar, Sanjeev and Wilkerson, Christopher; “Exploiting Spatial Locality in Data Caches using Spatial Footprints”;Proceedings of 25thAnnual ACM/IEEE International Symposium on Computer Architecture; (ISCA '98); pp. 1-12.
Ju Dz-ching
Srinivasan Srikanth T.
Wilkerson Christopher B.
Intel Corporation
Kim Hong
Marger Johnson & McCollom
LandOfFree
Critical loads guided data prefetching does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Critical loads guided data prefetching, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Critical loads guided data prefetching will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3207226