Data cache miss lookaside buffer and method thereof

Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
US Class: C711S119000
Type: Reexamination Certificate
Status: active
Filed: 1999-01-19
Issued: 2002-11-26
Examiner: Kim, Matthew (Department: 2186)
Patent number: 06487639
FIELD OF THE INVENTION
The invention is generally related to data processing systems and processors therefor, and in particular to retrieval of data from a data cache in a multi-level memory architecture.
BACKGROUND OF THE INVENTION
Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors—the “brains” of a computer—and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from a memory can often become a significant bottleneck on performance. To reduce this bottleneck, it is desirable to use the fastest memory devices available. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity with cost.
A predominant manner of obtaining such a balance is to use multiple “levels” of memories in a memory architecture to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device, an intermediate main memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices, and one or more high speed, limited capacity cache memories, or caches, implemented with static random access memory devices (SRAMs) or the like. In some instances, instructions and data are stored in separate instruction and data cache memories to permit instructions and data to be accessed in parallel. One or more memory controllers are then used to swap the information from segments of memory addresses, often known as “cache lines”, between the various memory levels to attempt to maximize the frequency with which requested memory addresses are stored in the fastest cache memory accessible by the microprocessor. Whenever a memory access request attempts to access a memory address that is not cached in a cache memory, a “cache miss” occurs. As a result of a cache miss, the cache line for a memory address typically must be retrieved from a relatively slow, lower level memory, often with a significant performance penalty.
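To make the cache-line mechanics above concrete, the following is a minimal sketch of a direct-mapped cache lookup with a fill from the next memory level on a miss. The line size, line count, and function names are illustrative assumptions, not details taken from the patent.

// Hypothetical direct-mapped cache lookup; sizes and names are illustrative only.
#include <cstdint>
#include <cstring>
#include <iostream>

constexpr int LINE_SIZE = 64;      // bytes per cache line (assumed)
constexpr int NUM_LINES = 256;     // lines in this cache level (assumed)

struct CacheLine {
    bool     valid = false;
    uint64_t tag   = 0;
    uint8_t  data[LINE_SIZE] = {};
};

CacheLine cache[NUM_LINES];

// Stand-in for the slower, lower memory level (e.g., main memory).
void fill_from_lower_level(uint64_t line_addr, uint8_t* out) {
    std::memset(out, 0, LINE_SIZE);   // placeholder fill
}

// Returns true on a cache hit; on a miss, swaps the requested cache line
// in from the lower level -- the expensive step described above.
bool access(uint64_t addr) {
    uint64_t line_addr = addr / LINE_SIZE;
    uint64_t index     = line_addr % NUM_LINES;
    uint64_t tag       = line_addr / NUM_LINES;

    CacheLine& line = cache[index];
    if (line.valid && line.tag == tag) {
        return true;                  // cache hit
    }
    fill_from_lower_level(line_addr, line.data);  // cache miss
    line.valid = true;
    line.tag   = tag;
    return false;
}

int main() {
    std::cout << (access(0x1000) ? "hit" : "miss") << "\n";  // cold miss
    std::cout << (access(0x1000) ? "hit" : "miss") << "\n";  // subsequent hit
}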
Data cache misses in particular have been found to significantly limit processor performance. In some designs, for example, it has been found that over 25% of a microprocessor's time is spent waiting for data cache misses to complete. Therefore, any mechanism that can reduce the frequency and/or latency of data cache misses can have a significant impact on overall performance.
One conventional approach for reducing the impact of data cache misses is to increase the size of the data cache to in effect reduce the frequency of misses. However, increasing the size of a data cache can add significant cost. Furthermore, oftentimes the size of the data cache is limited by the amount of space available on an integrated circuit device. Particularly when the data cache is integrated onto the same integrated circuit device as a microprocessor to improve performance, the amount of space available for the data cache is significantly restricted.
Other conventional approaches include decreasing the miss rate by increasing the associativity of a cache, and/or using cache indexing to reduce conflicts. While each approach can reduce the frequency of data cache misses, each still incurs an often substantial performance penalty whenever a data cache miss does occur.
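As a brief illustration of the conflict problem that associativity addresses, the snippet below computes the set index for two addresses that collide in the same set; the line size and set count are assumed values, not parameters from the patent. In a direct-mapped cache only one of the two lines can reside at a time, whereas a set-associative cache with more ways per set can hold both, reducing conflict misses.

// Illustrative set-index calculation; parameters are assumptions.
#include <cstdint>
#include <iostream>

int main() {
    constexpr uint64_t line_size = 64;   // bytes per cache line
    constexpr uint64_t num_sets  = 128;  // number of sets

    for (uint64_t addr : {0x10000ULL, 0x10000ULL + line_size * num_sets}) {
        uint64_t set = (addr / line_size) % num_sets;
        std::cout << std::hex << addr << " -> set " << std::dec << set << "\n";
    }
    // Both addresses map to the same set; higher associativity lets them coexist.
}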
Yet another conventional approach for reducing the impact of data cache misses incorporates value prediction to attempt to predict what data will be returned in response to a data cache miss prior to actual receipt of such data. In particular, it has been found that the result of practically any instruction can be predicted approximately 50% of the time based upon the result of the last execution of the instruction.
To implement value prediction, it has been proposed to store the result of each instruction in a lookup table after the instruction is executed. The result would be indexed by the memory address of the instruction. Subsequently, whenever the same instruction was executed again, the lookup table would be accessed to attempt to locate the result at the same time that the data cache was accessed. If the data cache access missed, the predicted result would be used, and subsequent instructions would be executed speculatively using the predicted result while the data cache miss was processed. Then, when the data in the data cache was returned, it would be compared to the predicted result to verify the prediction. If the prediction was correct, a performance benefit would be obtained since the subsequent instructions were executed sooner than would otherwise occur if the processor waited for the data from the data cache to be returned. On the other hand, if the prediction was incorrect, the processor would need to be “rolled back” to essentially undo the results of the speculatively-executed instructions. Assuming a relatively reliable prediction, however, the benefits of prediction would exceed the penalties of misprediction, resulting in an overall performance improvement.
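The following is a minimal sketch of the value-prediction flow described above: a table indexed by instruction address is consulted in parallel with the data cache, a predicted value is used speculatively on a miss, and the prediction is verified (or rolled back) when the real data returns. The hash-indexed table, field names, and rollback message are assumptions for illustration, not the proposed implementation itself.

// Sketch of value prediction keyed by instruction address (illustrative).
#include <cstdint>
#include <unordered_map>
#include <iostream>

struct PredictionEntry {
    uint64_t last_value = 0;   // result of the last execution of the instruction
};

std::unordered_map<uint64_t, PredictionEntry> value_table;  // keyed by instruction address

// Consulted when a load misses the data cache; if a prediction exists,
// dependent instructions may be executed speculatively with it.
bool predict(uint64_t inst_addr, uint64_t& predicted) {
    auto it = value_table.find(inst_addr);
    if (it == value_table.end()) return false;
    predicted = it->second.last_value;
    return true;
}

// Called when the actual data arrives from the lower memory level.
// A correct prediction keeps the speculative work; a misprediction forces
// the speculatively-executed instructions to be rolled back.
void verify(uint64_t inst_addr, uint64_t predicted, uint64_t actual,
            bool had_prediction) {
    if (had_prediction && predicted != actual) {
        std::cout << "misprediction: roll back speculative instructions\n";
    } else if (had_prediction) {
        std::cout << "prediction correct: speculative work retained\n";
    }
    value_table[inst_addr].last_value = actual;  // train on the observed result
}

int main() {
    uint64_t pred = 0;
    bool have = predict(0x400100, pred);   // first miss: no prediction yet
    verify(0x400100, pred, 42, have);      // table learns the value 42
    have = predict(0x400100, pred);        // later miss: predicts 42
    verify(0x400100, pred, 42, have);      // prediction verified
}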
One problem associated with proposed value prediction implementations is that a relatively large lookup table would be required to achieve a significant performance improvement. Specifically, with proposed implementations, predicted values are stored for either all static instructions or all static load instructions. However, it has been found that most commercial workloads have relatively large instruction working sets—that is, a relatively large number of instructions are typically executed before any particular instruction is repeated. Since value prediction relies on the results from previous executions of instructions, a lookup table would need to be relatively large to ensure that predicted data was available on a relatively frequent basis.
However, given that space on a processor integrated circuit device is often at a premium, it is often desirable to minimize the space occupied by all components on the device, including any lookup tables. Consequently, the size of a value prediction lookup table is often constrained, which by necessity limits its effectiveness. Increasing the size of a lookup table often increases costs and/or requires other compromises to be made in other areas of a processor design. Therefore, a need still exists in the art for improving the effectiveness of value prediction in a more compact and cost effective manner.
SUMMARY OF THE INVENTION
The invention addresses these and other problems associated with the prior art by providing a data processing system, circuit arrangement, integrated circuit device, program product, and method that implement value prediction in a data cache miss lookaside buffer that maintains predicted values only for load instructions that miss the data cache. It has been found that a large proportion of data cache misses, e.g., as many as 80-90% or more, are caused by a relatively small number of instructions. Moreover, it has been found that the predictability of load instructions that miss a data cache is often greater than that of other instructions. As a result of both of these factors, limiting value prediction to load instructions that miss the data cache enables effective value prediction to be provided with a significantly smaller lookup table, in a more compact and cost effective manner.
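The sketch below illustrates the key idea of a data cache miss lookaside buffer as described in this summary: a small table that is consulted and trained only for load instructions that actually miss the data cache, so miss-causing loads are the only ones that ever allocate entries. The table size, indexing hash, field names, and replacement behavior are assumptions for illustration, not details taken from the claims.

// Hedged sketch of a data cache miss lookaside buffer (illustrative).
#include <cstdint>
#include <array>
#include <optional>

struct LookasideEntry {
    bool     valid      = false;
    uint64_t inst_addr  = 0;   // address of the load instruction
    uint64_t last_value = 0;   // value returned by its last cache miss
};

class MissLookasideBuffer {
    // Far smaller than a table covering all static load instructions,
    // because only loads that miss the data cache allocate entries.
    std::array<LookasideEntry, 64> entries_{};

    std::size_t index(uint64_t inst_addr) const {
        return (inst_addr >> 2) % entries_.size();   // simple hash, assumed
    }

public:
    // Look up a predicted value when a load misses the data cache.
    std::optional<uint64_t> predict(uint64_t inst_addr) const {
        const auto& e = entries_[index(inst_addr)];
        if (e.valid && e.inst_addr == inst_addr) return e.last_value;
        return std::nullopt;
    }

    // Train the buffer only on loads that missed; cache hits never touch it.
    void record_miss(uint64_t inst_addr, uint64_t value) {
        entries_[index(inst_addr)] = {true, inst_addr, value};
    }
};

int main() {
    MissLookasideBuffer dcmlb;
    dcmlb.record_miss(0x400200, 7);       // a load misses and returns 7
    auto p = dcmlb.predict(0x400200);     // on its next miss, predict 7
    return (p.has_value() && *p == 7) ? 0 : 1;
}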
Inventor: Anderson, Matthew D.
Assignee: International Business Machines Corporation
Examiner: Kim, Matthew
Attorney/Agent: Wood Herron & Evans