Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1997-06-23
2001-01-16
Cabeca, John W. (Department: 2752)
C711S140000, C711S204000, C711S205000, C711S207000, C711S213000, C712S207000
active
06175898
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates generally to the field of data cache memories, and more specifically to an apparatus and method for prefetching data into a cache memory.
The recent trend in computer systems has been toward faster and more efficient microprocessors. However, the speed with which the processors are able to access their related memory devices has not increased at the same rate as the processors' execution speed. Consequently, memory access delays have become a bottleneck to increasing overall system performance.
Generally, the faster data can be retrieved from a memory device, the more expensive the device is per unit of storage. Due to this cost, it is not feasible to have enough register (i.e., fast memory device) capacity in the microprocessor's on-chip main memory to hold all of the program instructions and data needed for many applications. Consequently, most of the data and instructions are kept on large, relatively slow storage devices. Only the instructions and data that are currently needed are brought into registers.
To reduce the time it takes to retrieve data from the slower bulk storage memories, specialized memories are placed between the registers and the bulk storage devices. These memories are known as cache memories in the industry. Cache memories exploit the “principle of locality,” which holds that all programs favor a particular segment of their address space at any instant in time. This hypothesis has two dimensions. First, locality can be viewed in time (temporal locality), meaning that if an item is referenced, it will tend to be referenced again soon. Second, locality can be viewed in space (spatial locality), meaning that if an item is referenced, nearby items will also tend to be referenced. By bringing a block or subblock of data into the cache when it is referenced, the system can take advantage of both of these principles to reduce the time it takes to access the data the next time it is referenced.
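The effect of both kinds of locality can be illustrated with a toy simulation. The following is a minimal sketch, assuming a direct-mapped cache; the function name `simulate`, the line count, and the block size are illustrative choices, not values from the text.

```python
def simulate(addresses, num_lines=4, block_size=8):
    """Return (hits, misses) for a toy direct-mapped cache.

    num_lines and block_size are illustrative assumptions.
    """
    lines = [None] * num_lines  # each slot remembers which block it holds
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size   # block containing this address
        index = block % num_lines    # slot the block maps to
        if lines[index] == block:
            hits += 1
        else:
            misses += 1
            lines[index] = block     # bring the whole block in on a miss
    return hits, misses


# Spatial locality: 32 consecutive addresses fall into 4 blocks of 8.
spatial_hits, spatial_misses = simulate(list(range(32)))

# Temporal locality: the same address is referenced 10 times in a row.
temporal_hits, temporal_misses = simulate([100] * 10)

print(spatial_hits, spatial_misses)    # 28 4
print(temporal_hits, temporal_misses)  # 9 1
```

In both runs only the first reference to a block misses; later references to the same or neighboring addresses are served from the cache.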
Data may be brought into the cache as it is requested, or sometimes before it is requested. If data is brought into the cache memory before it is requested, it is said to be “prefetched.” Prefetching may be initiated by software or hardware. In software prefetching, the compiler inserts specific prefetch instructions at compile time. The memory system retrieves the requested data into the cache memory when it receives a software prefetch instruction, just as it would for a normal memory request. However, nothing is done with the data beyond that point until another software instruction references the data.
Hardware prefetching dynamically decides during operation of the software which data will most likely be needed in the future, and prefetches it without software intervention. If it makes the correct decision on what data to prefetch, the data is ready when the software requests it. Decisions on when to prefetch are often made with the assistance of a history buffer. A history buffer retains information related to individual software instructions. It maintains a set of entries cataloguing what has taken place in previous iterations of the instructions.
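One way a history buffer could drive prefetch decisions is sketched below. This is a hedged illustration only: it assumes a simple stride predictor keyed by instruction address, which the text does not specify, and the names `HistoryBuffer` and `observe` are hypothetical.

```python
class HistoryBuffer:
    """Toy per-instruction history; the stride policy is an assumption."""

    def __init__(self):
        # instruction address -> (last data address, last observed stride)
        self.entries = {}

    def observe(self, pc, addr):
        """Record a load and return an address to prefetch, or None."""
        prediction = None
        if pc in self.entries:
            last_addr, last_stride = self.entries[pc]
            stride = addr - last_addr
            # Predict only after the same nonzero stride is seen twice in a row.
            if stride != 0 and stride == last_stride:
                prediction = addr + stride
            self.entries[pc] = (addr, stride)
        else:
            self.entries[pc] = (addr, 0)
        return prediction


history = HistoryBuffer()
# One load instruction (hypothetical address 0x40) walking an array in steps of 8.
predictions = [history.observe(pc=0x40, addr=a) for a in (1000, 1008, 1016, 1024)]
print(predictions)  # [None, None, 1024, 1032]
```

The first two iterations only train the entry; from the third iteration on, the buffer suggests the next address before the software asks for it.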
Each method has its advantages and disadvantages. Software is often more efficient in deciding when to prefetch data. However, extra instruction cycles are required to execute the prefetch instructions. On the other hand, hardware may make more mistakes in deciding when to prefetch, but does not require the extra instruction cycles. Hardware prefetching is also often advantageous for speeding up older code and binaries that were not compiled with software prefetching.
Another architectural feature implemented in some of today's microprocessor architectures is the use of multiple caches.
FIG. 1 is a diagram showing some previously known uses of multiple caches in a memory system 105. A processor 110 is connected to registers within a main memory 120. Processor 110 has direct access to the registers. If an instruction or data is needed by processor 110, it is loaded into the registers from a storage device 125.
Multiple caches may be placed between storage device 125 and main memory 120 in a variety of ways. For example, two caches may be placed hierarchically. In modern processors, it is common to have a first level of cache, L1 cache 140, on the same integrated circuit as the processor and main memory 120. A second level of cache, L2 cache 150, is commonly located between L1 cache 140 and storage device 125. Generally, L1 cache 140 is more quickly accessible than L2 cache 150 because it resides on the same integrated circuit as the processor.
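The hierarchical arrangement can be sketched as a lookup that tries each level in turn. This is a simplified illustration: the latencies are hypothetical cycle counts, the caches are plain dictionaries with no capacity limit or eviction, and the function name `load` is an assumption.

```python
# Hypothetical access costs in cycles; not taken from the text.
L1_LATENCY, L2_LATENCY, STORAGE_LATENCY = 1, 10, 100


def load(addr, l1, l2, storage):
    """Return (value, cycles spent) for a two-level cache lookup."""
    if addr in l1:
        return l1[addr], L1_LATENCY
    if addr in l2:
        l1[addr] = l2[addr]  # fill L1 from L2
        return l1[addr], L1_LATENCY + L2_LATENCY
    value = storage[addr]    # miss in both levels
    l2[addr] = value
    l1[addr] = value
    return value, L1_LATENCY + L2_LATENCY + STORAGE_LATENCY


storage = {0x10: "a", 0x20: "b"}
l1, l2 = {}, {0x20: "b"}

first = load(0x10, l1, l2, storage)   # misses both levels
second = load(0x10, l1, l2, storage)  # now an L1 hit
third = load(0x20, l1, l2, storage)   # L1 miss, L2 hit
print(first, second, third)  # ('a', 111) ('a', 1) ('b', 11)
```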
Another way that multiple cache systems are implemented is with parallel caches. This allows multiple memory operations to be done simultaneously. A second cache, L1 cache 142, is located in parallel with L1 cache 140 at the first level. In some applications, L1 cache 142 is a specialized cache for fetching a certain type of data. For example, first L1 cache 140 may be used to fetch data, and second L1 cache 142 may be used to fetch instructions. Alternatively, second L1 cache 142 may be used for data that is referenced by certain instructions that commonly reuse the same data repeatedly throughout a calculation. This often occurs with floating point or graphics operations.
Another approach for using parallel caches is taught in commonly assigned U.S. Pat. No. 5,898,852, issued Apr. 27, 1999, entitled “Load Steering for Dual Data Cache”, which is incorporated herein by reference for all purposes. It teaches the use of first L1 cache 140 as a standard data cache and second L1 cache 142 as a prefetch cache for prefetching data as described above.
Additional hardware features may also be included in a cache system to increase the performance of the system. A translation lookaside buffer (TLB) 160 may be added to speed up the access to storage device 125 in the case of a cache miss. Generally, processor 110 references an item of data by a virtual address. A line of data in the cache may be referenced by a tag that is related to the virtual address. However, the data is stored on storage device 125 according to a physical address. If a cache miss occurs, a translation must be done by cache miss handling logic (not shown) to calculate the physical address from the virtual address. This translation may take several clock cycles and cause a performance penalty. TLB 160 is used to hold a list of virtual to physical translations, and if the translation is found in the TLB, time is saved in subsequent accesses to the same data.
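The role of the TLB as a cache of translations can be sketched as follows. This is a minimal illustration under stated assumptions: the 4096-byte page size, the dictionary-based page table, and the names `TLB` and `translate` are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, not stated in the text


class TLB:
    """Toy translation cache in front of a (slow) page table."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame
        self.entries = {}             # translations cached so far
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # Stands in for the multi-cycle translation done on a miss.
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = TLB(page_table={0: 7, 1: 3})
pa1 = tlb.translate(0x0010)  # first touch of page 0: TLB miss
pa2 = tlb.translate(0x0020)  # same page: TLB hit
pa3 = tlb.translate(0x1004)  # page 1: TLB miss
print(pa1, pa2, pa3, tlb.hits, tlb.misses)  # 28688 28704 12292 1 2
```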
A limitation of currently available devices is that an instruction directed toward a parallel cache, such as L1 cache 142, that causes a cache miss causes significant delays. These delays occur because the instruction must be recycled to the main cache system for determining the physical address.
Consequently, it is desirable to provide an improved apparatus and method for implementing parallel caches that reduces the instances in which instruction recycling must occur, and for deciding when and what type of instructions to send to the parallel cache. Further, it is desirable to provide an improved architecture and method for prefetching data into a cache memory.
SUMMARY OF THE INVENTION
A method for retrieving data requested by a processor from a memory system is disclosed. The method includes the steps of (1) selectively storing data in a prefetch cache before the data is referenced by a memory instruction, (2) translating a virtual address of data anticipated to be referenced by the memory instruction to a physical address producing an address translation, (3) storing the address translation in a translation lookaside buffer in anticipation of it being referenced by the memory instruction, and (4) selectively executing the memory instruction in a prefetch pipeline by accessing the translation lookaside buffer and the prefetch cache.
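The four steps can be pictured with a small software model. This is only a sketch under stated assumptions, not the claimed implementation: the page size, the dictionary structures, and the function names `prefetch` and `execute_in_prefetch_pipeline` are hypothetical, and returning `None` is this sketch's way of signalling that the instruction cannot be served by the prefetch pipeline.

```python
PAGE_SIZE = 4096  # assumed page size


def prefetch(vaddr, page_table, micro_tlb, prefetch_cache, memory):
    """Steps 1-3: translate ahead of use, then store translation and data."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    paddr = page_table[vpn] * PAGE_SIZE + offset  # step 2: translate
    micro_tlb[vpn] = page_table[vpn]              # step 3: keep the translation
    prefetch_cache[paddr] = memory[paddr]         # step 1: keep the data


def execute_in_prefetch_pipeline(vaddr, micro_tlb, prefetch_cache):
    """Step 4: serve the load only if the TLB and the prefetch cache both hit."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in micro_tlb:
        return None  # no translation available in this pipeline
    paddr = micro_tlb[vpn] * PAGE_SIZE + offset
    return prefetch_cache.get(paddr)  # None on a prefetch cache miss


page_table = {2: 5}
memory = {5 * PAGE_SIZE + 8: "payload"}
micro_tlb, prefetch_cache = {}, {}
vaddr = 2 * PAGE_SIZE + 8

before = execute_in_prefetch_pipeline(vaddr, micro_tlb, prefetch_cache)
prefetch(vaddr, page_table, micro_tlb, prefetch_cache, memory)
after = execute_in_prefetch_pipeline(vaddr, micro_tlb, prefetch_cache)
print(before, after)  # None payload
```

Because the translation is stored alongside the prefetched data, the later memory instruction finds both without a separate address translation.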
The method further comprises storing information about the memory instruction in a history file. The information may includ
Ahmed Sultan
Chamdani Joseph
Cabeca John W.
Sun Microsystems Inc.
Townsend and Townsend / and Crew LLP
Tran Denise
Method for prefetching data using a micro-TLB