Speculative pre-flush of data in an out-of-order execution...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories


Details

C711S146000, C711S141000, C711S143000

Reexamination Certificate

active

06408363

ABSTRACT:

TECHNICAL FIELD
The present invention generally relates to computer processor operations and architectures. More particularly, the present invention relates to performance optimization by speculatively pre-fetching and pre-flushing data in a processor system in which instructions may be executed out of order.
BACKGROUND ART
A high performance processor, e.g., a super-scalar processor in which two or more scalar operations are performed in parallel, may be designed to execute instructions out of order, i.e., in an order different from the one defined by the program running on the processor. That is, in such a high performance processor system, instructions are executed when they are ready to execute rather than in the sequence in which they appear in the program. Typically, after the out-of-order execution of instructions, the results are reordered to correspond with the proper instruction order before being passed back to the program running on the processor.
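For illustration only (not part of the patent disclosure), the following C sketch models the behavior just described: instructions complete out of program order, but their results are retired strictly in program order. The four-instruction completion order is a hypothetical example.

```c
/* Minimal sketch of out-of-order completion with in-order retirement.
 * The completion order below is a hypothetical example. */
#include <stdio.h>
#include <stdbool.h>

#define N 4

int main(void) {
    bool completed[N] = { false };     /* completed[i]: instruction i finished executing */
    int completion_order[N] = { 2, 0, 3, 1 };  /* assumed out-of-order completions */
    int retire_ptr = 0;                /* next instruction to retire, in program order */

    for (int t = 0; t < N; t++) {
        int done = completion_order[t];
        completed[done] = true;
        printf("cycle %d: instruction %d completes\n", t, done);

        /* Retire in order: commit only while the oldest instruction is done. */
        while (retire_ptr < N && completed[retire_ptr]) {
            printf("          instruction %d retires\n", retire_ptr);
            retire_ptr++;
        }
    }
    return 0;
}
```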
Examples of processor architectures that execute instructions out of order are described in U.S. Pat. No. 5,758,178 (issued May 26, 1998, and entitled “Miss Tracking System and Method”), U.S. Pat. No. 5,761,713 (issued Jun. 2, 1998, and entitled “Address Aggregation System and Method for Increasing Throughput to a Multi-Banked Data Cache From a Processor by Concurrently Forwarding an Address to Each Bank”), U.S. Pat. No. 5,838,942 (issued Nov. 17, 1998, and entitled “Panic Trap System and Method”), U.S. Pat. No. 5,809,275 (issued Sep. 15, 1998, and entitled “Store-to-Load Hazard Resolution System and Method for a Processor that Executes Instructions Out of Order”), and U.S. Pat. No. 5,799,167 (issued Aug. 25, 1998, and entitled “Instruction Nullification System and Method for a Processor that Executes Instructions Out of Order”), all to Gregg Lesartre, who is one of the present inventors, all assigned to the present assignee, and all expressly incorporated herein by reference in their entireties.
As described in more detail in, e.g., U.S. Pat. No. 5,758,178 (the '178 patent), an out-of-order execution processor system may include one or more processors, each having a memory queue (MQUEUE) for receiving and executing instructions that are directed to memory accesses to the cache memory (DCACHE) or the memory hierarchy. The MQUEUE includes a plurality of instruction processing mechanisms for receiving and executing respective memory instructions out of order. Each instruction processing mechanism includes an instruction register for storing an instruction and an address reorder buffer slot (ARBSLOT) for storing the data address of the instruction execution results. Significantly, dependent-on-miss (DM) indicator logic in each ARBSLOT prevents a request from its respective ARBSLOT to the memory hierarchy for miss data that is absent from the DCACHE when another ARBSLOT has already requested the miss data from the memory hierarchy.
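Purely as an illustrative sketch, the per-slot state described above might be modeled as follows; the field names are assumptions, not the patent's actual register layout.

```c
/* Illustrative model of one ARBSLOT's state; field names are assumed. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    uint32_t instruction;       /* the memory instruction held by the slot */
    uint64_t result_addr;       /* data address of the execution result */
    bool     done;              /* set once the access completes */
    bool     dependent_on_miss; /* DM indicator: another slot already requested
                                   this miss line, so no second request */
} arbslot_t;

int main(void) {
    arbslot_t slot = { .instruction = 0, .result_addr = 0x2000,
                       .done = false, .dependent_on_miss = true };
    if (slot.dependent_on_miss)
        printf("slot waits; no redundant miss request issued\n");
    return 0;
}
```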
In particular, for example, FIG. 1 shows a block diagram of the relevant portions of the computer system for illustrating the operation of the instruction processing mechanism 39b portion of the MQUEUE. The MQUEUE includes one or more ARBSLOTs 48 (only one of which is shown). When an ARBSLOT 48 requests a cache line from the DCACHE 24, the ARBSLOT 48 asserts signal ACCESS_REQ 115 accompanied with an address ACCESS_ADDR 114. In the event that there is a potential hit in the DCACHE 24, the status indicator 82 (or status indicators if the cache is associative) will reflect a valid cache line or lines. Further, the tag compare mechanism 108 reads the tag DCACHE_TAG(s) 81 and compares it to the tag ACCESS_TAG 116 associated with the access address ACCESS_ADDR 114. When there is a match, the tag compare mechanism 108 concludes that there is a hit and deasserts the signal ~HIT 118 to indicate a hit, which causes the ARBSLOT 48 to mark itself done. The result of the operation is held in a rename register (not shown) until the instruction retires, when it is moved to an architectural register (not shown).
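A minimal sketch of the hit check just described, assuming (for illustration only) a direct-mapped cache with 64-byte lines and 1024 sets: the tag of the stored line is compared against the tag bits of the access address.

```c
/* Hedged sketch of the tag-compare hit check; the index/tag split and
 * cache geometry are assumptions, not the patent's parameters. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define LINE_BITS  6                  /* 64-byte cache lines (assumed) */
#define INDEX_BITS 10                 /* 1024 sets (assumed) */
#define NUM_SETS   (1u << INDEX_BITS)

typedef struct {
    bool     valid;                   /* status indicator: valid cache line */
    uint64_t tag;                     /* stored DCACHE tag */
} cache_line_t;

static cache_line_t dcache[NUM_SETS];

/* Returns true on a hit (the patent's ~HIT signal would be deasserted). */
static bool tag_compare(uint64_t access_addr) {
    uint64_t index = (access_addr >> LINE_BITS) & (NUM_SETS - 1);
    uint64_t access_tag = access_addr >> (LINE_BITS + INDEX_BITS);
    return dcache[index].valid && dcache[index].tag == access_tag;
}

int main(void) {
    uint64_t addr = 0x12345678;
    /* Install the line, then probe it. */
    dcache[(addr >> LINE_BITS) & (NUM_SETS - 1)] =
        (cache_line_t){ .valid = true, .tag = addr >> (LINE_BITS + INDEX_BITS) };
    printf("hit: %d\n", tag_compare(addr));
    return 0;
}
```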
When the cache access results in a cache miss, e.g., based upon a status indicator 82 indicating an invalid cache line (or lines), or alternatively, when the tag DCACHE_TAG(s) 81 does not match the tag ACCESS_TAG 116, then the tag compare mechanism 108 asserts the ~HIT signal 118 to indicate a miss to the ARBSLOT 48. Assuming that this is the first ARBSLOT 48 to attempt to access this miss data line, the DM indicator logic 135 causes the miss request signal MISS_REQUEST 111 to be issued to the miss arbitrator 107. The miss arbitrator 107 arbitrates by prioritizing the various miss requests that can be generated by the various ARBSLOTs 48. Eventually, the miss arbitrator 107 issues a signal MISS_GRANTED 112 to grant the miss request. This signal is sent to the ARBSLOT 48, which in turn asserts the miss control signal MISS_CAV 101 to the system interface control 102. The system interface control 102 in turn makes a memory request to the memory hierarchy (not shown) for the data line based upon the address MISS/COPY_IN ADDR 104 that is forwarded from the ARBSLOT 48 to the system interface control 102.
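The arbitration step can be sketched as a simple priority select. Treating the lowest-numbered requesting slot as the winner is an illustrative assumption; the patent does not specify the prioritization scheme here.

```c
/* Illustrative sketch of miss arbitration: among the slots asserting
 * MISS_REQUEST, grant one. Lowest index first is an assumed policy. */
#include <stdio.h>
#include <stdbool.h>

#define NUM_ARBSLOTS 8

/* Returns the granted slot index, or -1 if no slot is requesting. */
static int miss_arbitrate(const bool miss_request[NUM_ARBSLOTS]) {
    for (int i = 0; i < NUM_ARBSLOTS; i++)
        if (miss_request[i])
            return i;   /* MISS_GRANTED goes to this slot */
    return -1;
}

int main(void) {
    bool requests[NUM_ARBSLOTS] = { false, false, true, false, true };
    printf("granted slot: %d\n", miss_arbitrate(requests));  /* prints 2 */
    return 0;
}
```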
Once the data line is transferred from the memory hierarchy to the system interface control 102, the system interface control 102 passes the data line to the DCACHE 24, as indicated by reference arrow 105, asserts the control signal COPY_IN to the DCACHE 24, and issues the status bits to the DCACHE 24. Simultaneously, the system interface control 102 asserts the control signal COPY_IN 103 to the ARBSLOTs 48 and places the associated address on MISS/COPY_IN ADDR 104 to the ARBSLOTs 48.
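A sketch of the copy-in broadcast, under the assumption (for illustration) that each waiting slot records the line address of its outstanding miss and resumes when a matching fill arrives.

```c
/* Sketch of the fill ("copy-in") step: the fill address is broadcast so
 * every slot waiting on that line can complete. Names are illustrative. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_ARBSLOTS 8

typedef struct {
    bool     waiting;    /* slot is stalled on a miss */
    uint64_t line_addr;  /* line the slot is waiting for */
} slot_t;

static slot_t slots[NUM_ARBSLOTS];

/* Broadcast COPY_IN: wake every slot whose miss address matches the fill.
 * (The data itself would also be written into the DCACHE here.) */
static void copy_in(uint64_t fill_addr) {
    for (int i = 0; i < NUM_ARBSLOTS; i++) {
        if (slots[i].waiting && slots[i].line_addr == fill_addr) {
            slots[i].waiting = false;
            printf("slot %d resumes on fill of line 0x%llx\n",
                   i, (unsigned long long)fill_addr);
        }
    }
}

int main(void) {
    slots[3] = (slot_t){ .waiting = true, .line_addr = 0x1000 };
    slots[5] = (slot_t){ .waiting = true, .line_addr = 0x1000 };
    copy_in(0x1000);
    return 0;
}
```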
If another ARBSLOT 48 attempts to access the DCACHE 24 for a miss data line that is currently being requested from the memory hierarchy, then that particular ARBSLOT 48 will be advised by the status indicator 82, as the status indicator 82 will indicate a miss pending status, i.e., that the cache line is already being requested by another ARBSLOT 48. Thus, a redundant memory request for a data line that has already been requested is avoided. A more detailed description of the memory queue (MQUEUE) and the DM indicator 135 may be found in the above-listed U.S. patents, e.g., the '178 patent.
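The duplicate-request filter can be sketched with a per-line "miss pending" bit; the flat line table below is an illustrative simplification of the status indicator 82.

```c
/* Sketch of duplicate miss suppression: the first requester sets a
 * per-line "miss pending" bit; later requesters that see it wait instead
 * of issuing a second memory request. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_LINES 16

static bool miss_pending[NUM_LINES];

/* Returns true if this caller should issue the memory request itself. */
static bool request_line(int line) {
    if (miss_pending[line]) {
        printf("line %d already requested; waiting for copy-in\n", line);
        return false;           /* dependent on an outstanding miss */
    }
    miss_pending[line] = true;  /* first requester: mark and go to memory */
    printf("line %d: issuing memory request\n", line);
    return true;
}

int main(void) {
    request_line(7);   /* first ARBSLOT: issues the request */
    request_line(7);   /* second ARBSLOT: suppressed as redundant */
    return 0;
}
```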
While modern day high performance processors, e.g., the super-scalar processor described above, have improved greatly in instruction execution time, slow memory access time remains a significant impediment to a processor running at its full speed. If requests for data can be fulfilled from the cache memory, the delay associated with an access to the slower memory hierarchy (usually referred to as the cache miss latency) can be avoided. Thus, reducing the number of cache misses is a goal in high performance processor designs.
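As a back-of-the-envelope illustration of why this matters (all numbers below are assumed, not from the patent), average memory access time can be expressed as the hit time plus the miss rate times the miss penalty:

```c
/* Illustration with assumed numbers: AMAT = hit_time + miss_rate * miss_penalty.
 * Halving the miss rate halves the stall component. */
#include <stdio.h>

int main(void) {
    double hit_time = 1.0;        /* cycles (assumed) */
    double miss_penalty = 100.0;  /* cycles (assumed) */
    double miss_rates[] = { 0.04, 0.02 };

    for (int i = 0; i < 2; i++) {
        double amat = hit_time + miss_rates[i] * miss_penalty;
        printf("miss rate %.0f%% -> AMAT %.1f cycles\n",
               miss_rates[i] * 100.0, amat);
    }
    return 0;
}
```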
Moreover, in a multi-processor system, whenever a processor requests a data line, a coherency check is required to determine whether the respective caches of the other processors contain the requested data line, and/or whether the data line must be written back (flushed) to the memory hierarchy, e.g., when the data line was modified by the particular processor that owns it. The coherency check adds delay to memory accesses, referred to herein as the coherency check latency.
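The coherency check can be sketched as follows; the three-state scheme is an illustrative simplification of real protocols such as MESI, which the patent does not mandate.

```c
/* Sketch of a coherency check: before a requester gets a line, the other
 * processors' caches are probed, and a cache holding the line in a
 * modified state must write it back (flush) first. */
#include <stdio.h>

#define NUM_CPUS 4

typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

static line_state_t state[NUM_CPUS];  /* each CPU's state for one line */

/* Coherency check on behalf of `requester`; returns 1 if a write-back
 * (flush) to the memory hierarchy was required. */
static int coherency_check(int requester) {
    int flushed = 0;
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        if (cpu == requester) continue;
        if (state[cpu] == MODIFIED) {
            printf("cpu %d flushes modified line to memory\n", cpu);
            state[cpu] = SHARED;   /* or INVALID, depending on the request */
            flushed = 1;
        }
    }
    return flushed;
}

int main(void) {
    state[2] = MODIFIED;     /* cpu 2 owns a dirty copy */
    coherency_check(0);      /* cpu 0 requests the line */
    return 0;
}
```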
Speculative pre-fetching and pre-flushing are based on a well-known principle of locality, called spatial locality, which observes that when information is accessed by the processor, information at nearby addresses tends to be accessed as well. This is particularly true when the load or store operation that caused the cache miss is part of an instruction code sequence that accesses a record longer than a cache line, i.e., when the instruction code sequence references data that spans multiple data lines. In a system utilizing pre-fetching and/or pre-flushing, rather than fetching (and/or flushing) only the currently accessed data into (or from) the cache memory, a block of data lines around the accessed data is fetched (and/or flushed).
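A sketch of the spatial-locality heuristic just described: on a demand access, a few sequentially following lines are fetched speculatively as well. The line size and prefetch depth are illustrative assumptions.

```c
/* Sketch of block pre-fetching driven by spatial locality: a demand access
 * to one line also triggers speculative fetches of the next few lines. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE      64   /* bytes (assumed) */
#define PREFETCH_DEPTH 3    /* extra lines to fetch speculatively (assumed) */

static void fetch_line(uint64_t line_addr, const char *why) {
    printf("%s fetch of line 0x%llx\n", why, (unsigned long long)line_addr);
}

static void access_with_prefetch(uint64_t addr) {
    uint64_t line = addr & ~(uint64_t)(LINE_SIZE - 1);
    fetch_line(line, "demand");
    /* Speculatively bring in the following lines as well. */
    for (int i = 1; i <= PREFETCH_DEPTH; i++)
        fetch_line(line + (uint64_t)i * LINE_SIZE, "speculative");
}

int main(void) {
    access_with_prefetch(0x12345);
    return 0;
}
```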
