Critical loads guided data prefetching

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

C711S213000, C711S133000, C712S207000, C712S237000


active

06760816

ABSTRACT:

FIELD OF THE INVENTION
This invention pertains to prefetching data into a cache on a computer, and more particularly to prefetching data that is identified as critical.
BACKGROUND OF THE INVENTION
When computers first became available, they ran programs by executing instructions using in-order execution. Before instruction number two could be executed, instruction number one had to complete. Since clock speeds were relatively slow, this was not a significant issue. The processor could not execute complicated instructions much faster than any other part of the computer could support the instruction. But modern processors are much more efficient than their ancestors were. Modern computers are capable of running at very high clock rates and may perform complicated instructions in very few clock cycles.
But while processor clock speeds have increased dramatically, improvements in other parts of the computer have been less significant. Specifically, at the high clock rates in modern processors, it may take thousands of clock cycles to access data from memory. In an in-order instruction processor, the processor must wait for a memory access to complete before it may continue with the next instruction. This may cause significant delay in program execution. To deal with this problem, processors began to run programs using out-of-order execution. While one complicated instruction is delayed (for example, due to a memory access), other instructions that do not depend on the delayed instruction may be executed. For out-of-order execution to work, the processor needs to be able to do several things. First, the processor determines whether a later instruction is dependent on the delayed instruction. For example, consider the situation where a value is loaded from memory into a register in the processor. If a later instruction adds the value in that register to another value in another register, this later instruction is dependent on the delayed instruction: it may not execute until after the load instruction completes. On the other hand, an add instruction that adds two registers that are totally unrelated to the load instruction may be executed while the value is loaded from memory, even though the exact instruction order suggests that this add instruction should not execute yet.
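The dependence check described above can be sketched in a few lines. This is a minimal illustration using a toy instruction model (the `Instr` class and register names are assumptions for illustration, not part of the patent): a later instruction depends on an earlier one when it reads the register the earlier instruction writes.

```python
# Toy model of register-dependence detection between instructions.

class Instr:
    def __init__(self, op, dest, srcs):
        self.op = op        # mnemonic, e.g. "load" or "add"
        self.dest = dest    # destination register name
        self.srcs = srcs    # list of source register names

def depends_on(later, earlier):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest in later.srcs

load = Instr("load", "r1", ["r2"])        # r1 <- mem[r2]
add1 = Instr("add",  "r3", ["r1", "r4"])  # reads r1: depends on the load
add2 = Instr("add",  "r5", ["r6", "r7"])  # unrelated registers

assert depends_on(add1, load)      # must wait for the load to complete
assert not depends_on(add2, load)  # may execute out of order
```

Real processors track such read-after-write dependences in hardware across all in-flight instructions, but the core test is the same register-overlap check shown here.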
Second, the processor buffers any dependent instructions for later execution. If the processor detects that a later instruction is dependent on a delayed load instruction, the later instruction may not be executed out-of-order, and is buffered until after the load instruction completes.
Third, the processor renames registers. A register may be renamed when a later instruction that is not dependent on the delayed load instruction uses a register that is used by the delayed load instruction. In this case, the processor needs to be able to rename the register used by the later instruction so that the “parallel execution” of the load instruction and the later instruction does not create a conflict.
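Register renaming can likewise be sketched as a mapping from architectural register names to fresh physical registers. This is an illustrative simplification (the tuple-based instruction format and `p0, p1, ...` physical names are assumptions): because each write receives a fresh physical register, a later independent instruction that reuses the delayed load's destination name no longer conflicts with it.

```python
# Minimal register-renaming sketch: every destination write is assigned
# a fresh physical register, and source operands read the current
# mapping established by earlier writes.

def rename(instrs):
    mapping = {}         # architectural name -> current physical register
    next_phys = 0
    renamed = []
    for op, dest, srcs in instrs:
        srcs = [mapping.get(r, r) for r in srcs]   # resolve sources first
        phys = f"p{next_phys}"                     # fresh destination
        next_phys += 1
        mapping[dest] = phys
        renamed.append((op, phys, srcs))
    return renamed

prog = [
    ("load", "r1", ["r2"]),        # delayed load writes r1
    ("add",  "r1", ["r3", "r4"]),  # independent, but reuses the name r1
]
out = rename(prog)
# The two writes to r1 now target different physical registers, so the
# add need not wait for the load.
assert out[0][1] != out[1][1]
```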
FIG. 1 shows how a processor in the prior art operates. Processor 105 receives instruction sequence 110. While a load instruction is pending, processor 105 examines later instructions to see if they are dependent on the delayed load instruction. If the later instruction is dependent on the delayed load instruction, the later instruction is buffered in buffer 115. Otherwise, the later instruction may be executed out-of-order, and joins executed instructions 120.
Two concerns may arise that limit the effectiveness of out-of-order execution. First, processor 105 may fill buffer 115 with dependent instructions. Once the buffer is full, processor 105 may not add any more instructions to buffer 115, and all later instructions must wait until the delayed load instruction completes. Second, the program may include a branch instruction after the load instruction. Even with branch prediction, processor 105 may not execute the instructions without some way to reverse the process in case the branch prediction was incorrect. Typically, processor 105 will simply buffer the instructions rather than execute them and risk having to rewind the program execution.
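The dispatch policy attributed to the prior art above can be sketched as follows. This is an assumed simplification (the function name, tuple format, and stall handling are illustrative, not the patent's mechanism): while a load is pending, dependent instructions go into a fixed-size buffer, independent ones execute out of order, and once the buffer fills, everything behind it must wait.

```python
# Sketch of the prior-art dispatch policy: dependent instructions are
# buffered; independent ones execute out of order; a full buffer
# stalls all remaining instructions.

def dispatch(instrs, load_dest, buffer_size):
    buffered, executed, stalled = [], [], []
    for op, dest, srcs in instrs:
        if load_dest in srcs:                    # depends on pending load
            if len(buffered) < buffer_size:
                buffered.append((op, dest, srcs))
            else:
                stalled.append((op, dest, srcs)) # buffer full: stall
        elif stalled:
            stalled.append((op, dest, srcs))     # behind a stall: wait
        else:
            executed.append((op, dest, srcs))    # execute out of order
    return buffered, executed, stalled

b, e, s = dispatch(
    [("add", "r3", ["r1", "r4"]),   # depends on the load's r1
     ("add", "r5", ["r6", "r7"])],  # independent
    load_dest="r1", buffer_size=4)
assert b == [("add", "r3", ["r1", "r4"])]
assert e == [("add", "r5", ["r6", "r7"])]
```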
The problems with out-of-order execution are exacerbated by the possibility of multiple load instructions within a relatively small block of instructions. With multiple independent load instructions, executing the loads in their original program order may leave the processor less efficient than it could be.
Other problems related to load instruction delays have to do with caching. Cache lines containing data requested by load instructions may be removed from the cache even though other nearby data will be requested shortly. And cache lines containing data that may be loaded shortly may not be fetched into the cache in advance of their need.
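One common way to exploit the spatial locality described above is a next-line prefetch, shown here as an illustrative sketch only (not the patent's mechanism; the 64-byte line size and set-based cache model are assumptions): on each access, the cache also fetches the adjacent line so nearby data is present before it is demanded.

```python
# Illustrative next-line prefetch: each access also brings in the
# following cache line, so a subsequent access to nearby data hits.

LINE = 64  # assumed cache-line size in bytes

def access(cache, addr):
    line = addr // LINE
    hit = line in cache
    cache.add(line)        # demand fetch
    cache.add(line + 1)    # prefetch the adjacent line
    return hit

cache = set()
access(cache, 0)           # miss; lines 0 and 1 are fetched
assert access(cache, 64)   # hit: line 1 was prefetched
```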
The present invention addresses these and other problems associated with the prior art.


