Pending access queue for providing data to a target register...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


C711S108000, C711S119000, C711S137000, C711S144000, C711S169000, C711S146000, C712S023000, C708S233000


active

06185660


FIELD OF INVENTION
This invention relates generally to computer systems, more specifically to random access memory systems for computers, and still more specifically to reducing the time required to provide requested items to a processor after a cache miss.
BACKGROUND OF THE INVENTION
Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy. The goal of a memory hierarchy is to reduce the average memory access time. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. A memory hierarchy is cost effective only if a high percentage of items requested from memory are present in the highest levels of the hierarchy (the levels with the shortest latency) when requested. If a processor requests an item and the item is present in the cache, the event is called a cache hit. If a processor requests an item and the item is not present in the cache, the event is called a cache miss. In the event of a cache miss, the requested item is retrieved from a lower level (a level with longer latency) of the memory hierarchy. This may have a significant impact on performance. In general, processor speed is increasing faster than memory speed, so that the relative performance penalty for a cache miss is increasing over time.
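The goal stated above, reducing the average memory access time, is conventionally expressed as the hit time plus the miss rate times the miss penalty. The sketch below illustrates this relation; the numeric figures are illustrative assumptions, not values from the patent.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time,
    and the fraction that misses also pays the extra miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative figures: 1-cycle cache hit, 5% miss rate,
# 50 extra cycles to retrieve from a lower level of the hierarchy.
print(average_access_time(1.0, 0.05, 50.0))  # 3.5 cycles on average
```

The formula also shows why the patent targets the miss penalty: as processor speed outpaces memory speed, the penalty term grows and dominates the average.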
The average memory access time may be reduced by improving the cache hit/miss ratio, reducing the time penalty for a miss, and reducing the time required for a hit. The present patent document is primarily concerned with reducing the time penalty for a cache miss.
In many computer systems, multiple instructions overlap in execution in a technique called pipelining. In pipelining, instruction execution is broken down into small parts, called stages or phases, each of which takes a fraction of the overall time required to complete an entire instruction. After a first instruction has completed a first stage or phase and has entered the second stage or phase, a second instruction starts the first stage. At any given time, there may be many instructions overlapped in execution, each in a different phase of completion. The effective instruction rate then becomes the rate at which instructions exit the pipeline. Alternatively, computer systems may issue multiple instructions simultaneously. These systems are called superscalar machines. A variation is very long instruction word machines in which a single instruction includes multiple operations. Finally, there are systems with multiple processors that may share memory. Of course, there are combinations of all these and in particular there are superscalar pipelined machines. Simultaneous execution and overlapping execution assume independent instructions or operations. In contrast, if one operation requires a computational result from another operation, the two operations must be executed sequentially. Typically, the burden is placed on the compiler for presenting independent operations to the hardware. In an environment of simultaneous and overlapping instruction execution, a cache miss can create a substantial problem, possibly stalling many instructions.
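The throughput effect of pipelining described above can be made concrete with a standard cycle count: once the pipeline fills, one instruction exits per cycle, so independent instructions nearly hide the per-stage latency. This is a minimal sketch of that relation, not a model of any particular machine.

```python
def pipelined_cycles(n_instructions, n_stages):
    # With one independent instruction entering per cycle, the first
    # instruction needs n_stages cycles to drain the pipeline, and each
    # subsequent instruction exits exactly one cycle later.
    return n_stages + n_instructions - 1

# 100 independent instructions on a 5-stage pipeline take 104 cycles,
# versus 500 cycles executed strictly one at a time.
print(pipelined_cycles(100, 5))
```

The count assumes fully independent instructions; a dependent operation, or a load stalled on a cache miss, inserts bubbles and breaks the one-exit-per-cycle rate, which is exactly the problem the pending access queue addresses.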
The minimum amount of memory that can be transferred into or out of a cache is called a line, or sometimes a block. Typically, memory is organized into words (for example, 32 bits per word) and a line is typically multiple words (for example, 16 words per line). Memory may also be divided into pages, with many lines per page.
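Using the example figures above (32-bit words, 16 words per line, hence 64-byte lines), a byte address splits into a line address and an offset within the line. This sketch assumes those example sizes; real designs vary.

```python
WORD_BYTES = 4                          # 32-bit words, per the example above
WORDS_PER_LINE = 16                     # 16 words per line
LINE_BYTES = WORD_BYTES * WORDS_PER_LINE  # 64-byte lines

def split_address(byte_addr):
    # The high-order bits select the line; the low-order bits select
    # the byte within that line.
    return byte_addr // LINE_BYTES, byte_addr % LINE_BYTES

# Two adjacent words typically fall in the same line, which is why a
# line fetch serves several sequential accesses at once:
print(split_address(0x1040))  # (0x41, 0)
print(split_address(0x1044))  # (0x41, 4)
```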
Various strategies may be employed to minimize the effects of cache misses. For example, buffers are sometimes placed between a cache and other lower level memory. These buffers typically fetch a block or line of sequential addresses including the miss address, assuming that addresses immediately following the miss address will also be needed. In U.S. Pat. No. 5,317,718 (Jouppi), a buffer called a stream buffer is placed between a cache and lower level memory. In Jouppi, items are stored in the buffer until another cache miss (if ever), and items then go from the buffer into the cache, not directly to the processor. The stream buffer described in Jouppi reduces the impact of a cache miss by rapidly loading a block of items that are likely to be needed by the processor in addition to the specific request resulting in a cache miss. Effectively, the stream buffer increases the block size. For interleaved processes, Jouppi proposes multiple stream buffers, each with a different starting address, replaced on a least-recently-used basis. In U.S. Pat. No. 5,423,016 (Tsuchiya et al.), a buffer is provided that holds a single block of data. In Tsuchiya, items in the single block in the buffer are available to the processor directly from the buffer, without having to be placed into the cache. If the block of data is accessed again before being transferred to the cache, the access request is serviced directly from the block buffer. For one block, the buffer described in Tsuchiya et al. enhances performance relative to Jouppi by making items in the buffer directly available to the processor without having to first place them in the cache.
There is a need for further cache miss penalty reduction, particularly for multiple misses with out of order execution and multiple misses to the same line.
SUMMARY OF THE INVENTION
A content addressable memory, called a Pending Access Queue (PAQ), is used to hold multiple processor-initiated accesses with a register destination (register load instructions) that miss a cache. The PAQ captures data arriving from lower levels as soon as the data is available.
Each PAQ entry holds enough data for the largest data item that needs to be supplied for a register destination (a double word). In addition, each PAQ entry holds all the relevant information required to support returns to the requesting unit and to support request ordering constraints. In particular, in addition to address, each PAQ entry holds the identification of the destination register and the data field of interest within a line. If there is more than one access to a single line, each separate access is given a separate entry in the queue, and all such entries for a single line are simultaneously provided with the appropriate part of the line as soon as the line is available. Finally, the pending access queue provides arbitration of the return path for multiple entries having data ready to be processed.
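The entry fields and the fill behavior described above can be sketched as follows. This is a hypothetical software model illustrating the mechanism, not the patented hardware: the class and field names are assumptions, and the arbitration of the return path is reduced to a simple ready list.

```python
from dataclasses import dataclass

@dataclass
class PAQEntry:
    line_addr: int      # address of the missed cache line
    offset: int         # location of the requested item within the line
    dest_reg: int       # identification of the destination register
    data: bytes = None  # filled in when the line returns from lower memory

class PendingAccessQueue:
    def __init__(self):
        self.entries = []

    def add_miss(self, line_addr, offset, dest_reg):
        # Each separate access gets its own entry, even when several
        # accesses target the same line.
        self.entries.append(PAQEntry(line_addr, offset, dest_reg))

    def line_arrived(self, line_addr, line_data):
        # Content-addressable match: every pending entry on this line is
        # supplied its part of the line (here, a double word of 8 bytes)
        # simultaneously, as soon as the line is available.
        for e in self.entries:
            if e.line_addr == line_addr and e.data is None:
                e.data = line_data[e.offset:e.offset + 8]

    def ready(self):
        # Entries holding data may return to the requesting unit in any
        # order; a real design arbitrates the return path among them.
        return [e for e in self.entries if e.data is not None]
```

A usage example matching the procedure-entry scenario below: two register loads miss on the same line at sequential offsets, and one line arrival satisfies both entries at once.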
Providing entries for multiple misses to a single memory line is particularly useful for instructions accessing sequential areas of memory, such as procedure entry and exit boundaries. At these events, several registers are typically sequentially saved or restored and the memory addresses are sequential.
The PAQ provides the following benefits:
(a) Data is available directly from the queue without having to first place the data into the cache.
(b) Each separate request to a single line is provided a separate entry, and each entry is provided with its appropriate part of the line as soon as the line is available.
(c) The queue provides results to the requesting unit in any order needed, supporting out-of-order cache returns, and provides for arbitration when multiple sources have data ready to be processed.


REFERENCES:
patent: 3938097 (1976-02-01), Niguette, III
patent: 4851993 (1989-07-01), Chen et al.
patent: 4942518 (1990-07-01), Weatherford et al.
patent: 5233702 (1993-08-01), Emma et al.
patent: 5317718 (1994-05-01), Jouppi
patent: 5404484 (1995-04-01), Schlansker et al.
patent: 5423016 (1995-06-01), Tsuchiya et al.
patent: 5454093 (1995-09-01), Abdulhafiz et al.
patent: 5471598 (1995-11-01), Quattromani et al.
patent: 5590310 (1996-12-01), Willenz et al.
patent: 5826052 (1998-10-01), Stiles et al.
patent: 5900011 (1999-05-01), Saulsbury et al.
“Pentium Processors and Related Products,” Intel Corp., pp. 2-82 through 2-85, 1995.