Electrical computers and digital processing systems: processing – Instruction fetching
Reexamination Certificate
1998-06-19
2001-03-06
Kim, Kenneth S. (Department: 2183)
C711S122000, C711S213000, C712S207000
active
06199154
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of processors and, more particularly, to instruction fetch mechanisms within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors with wider issue rates. A "wide issue" superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a "narrow issue" superscalar processor. During clock cycles in which the number of dispatchable instructions exceeds what the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
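The averaging argument above can be illustrated with a small calculation. This is a minimal sketch under stated assumptions. The per-cycle counts of dispatchable instructions are hypothetical. Each cycle dispatches min(count, issue width). Carry-over of undispatched instructions, dependencies and fetch limits are ignored. The function name `average_dispatch_rate` is illustrative only.

```python
from typing import Sequence


def average_dispatch_rate(dispatchable: Sequence[int], issue_width: int) -> float:
    """Average instructions dispatched per cycle, capped at issue_width per cycle.

    Simplifying assumption: each cycle's dispatchable count is independent of the
    issue width (no carry-over of instructions left undispatched).
    """
    dispatched = [min(n, issue_width) for n in dispatchable]
    return sum(dispatched) / len(dispatchable)


# Hypothetical per-cycle counts of dispatchable instructions.
trace = [2, 5, 1, 6, 3, 4]
narrow = average_dispatch_rate(trace, issue_width=3)
wide = average_dispatch_rate(trace, issue_width=6)
print(f"narrow={narrow:.2f} wide={wide:.2f}")  # narrow=2.50 wide=3.50
```

The gain comes entirely from the cycles with more than three dispatchable instructions (5, 6 and 4 here). Those are exactly the cycles the paragraph above describes.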
In order to support wide issue rates, it is desirable for the superscalar processor to be capable of fetching a large number of instructions per clock cycle (on the average). For brevity, a processor capable of fetching a large number of instructions per clock cycle (on the average) will be referred to herein as having a “high fetch bandwidth”. If the superscalar processor is unable to achieve a high fetch bandwidth, then the processor may be unable to take advantage of the wide issue hardware due to a lack of instructions being available for issue.
Several factors may impact the ability of a particular processor to achieve a high fetch bandwidth. For example, many code sequences have a high frequency of branch instructions, which may redirect the fetching of subsequent instructions within that code sequence to a branch target address specified by the branch instruction. Accordingly, the processor may identify the branch target address upon fetching the branch instruction and then fetch the next instructions within the code sequence using that address. Processors attempt to minimize the impact of branch instructions on the fetch bandwidth by employing highly accurate branch prediction mechanisms and by generating the subsequent fetch address (either branch target or sequential) as rapidly as possible.
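The choice between a branch target and a sequential fetch address can be sketched as follows. This is an illustrative model, not the patent's circuitry. It assumes a fetch block of `block_bytes` bytes and a list of branches found in that block, each already carrying a predicted direction. The names `Branch` and `next_fetch_address` are hypothetical.

```python
from typing import NamedTuple, Sequence


class Branch(NamedTuple):
    addr: int              # address of the branch instruction
    target: int            # branch target address identified at fetch time
    predicted_taken: bool  # direction supplied by a (hypothetical) branch predictor


def next_fetch_address(fetch_addr: int, block_bytes: int,
                       branches: Sequence[Branch]) -> int:
    """Choose the next fetch address after fetching one block of instructions."""
    block_end = fetch_addr + block_bytes
    # Scan branches in program order; the first predicted-taken branch
    # inside the block redirects fetch to its target.
    for br in sorted(branches, key=lambda b: b.addr):
        if fetch_addr <= br.addr < block_end and br.predicted_taken:
            return br.target
    # No predicted-taken branch in the block: continue sequentially.
    return block_end


block = [Branch(0x104, 0x400, False), Branch(0x108, 0x200, True)]
print(hex(next_fetch_address(0x100, 16, block)))  # 0x200
print(hex(next_fetch_address(0x100, 16, [])))     # 0x110
```

The sooner this address is available, the sooner the next fetch can begin. This is why the paragraph above stresses generating it rapidly.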
Another factor which may impact the ability of a particular processor to achieve a high fetch bandwidth is the hit rate and latency of an instruction cache employed by the processor. Processors typically include an instruction cache to reduce the latency of instruction fetches (as compared to fetching from main memory external to the processor). By providing low latency access to instructions, instruction caches may help achieve a high fetch bandwidth. Furthermore, the low latency of access to the instructions may allow branch instructions to be rapidly detected and corresponding branch target addresses to be rapidly generated for subsequent instruction fetches.
Modern processors have been attempting to achieve shorter clock cycle times in order to augment the performance gains which may be achieved with high issue rates. Unfortunately, the short clock cycle times being employed by modern processors tend to limit the size of an instruction cache which may be employed. Generally, larger instruction caches have a higher latency than smaller instruction caches. At some size, the instruction cache access time (i.e. latency from presenting a fetch address to the instruction cache and receiving the corresponding instructions therefrom) may even exceed the desired clock cycle time. On the other hand, larger instruction caches typically achieve higher hit rates than smaller instruction caches.
Both high hit rates in the instruction cache and low latency access to the instruction cache are important to achieving high fetch bandwidth. If hit rates are low, then the average latency for instruction access may increase due to the more frequent main memory accesses required to fetch the desired instructions. Because larger instruction caches are capable of storing more instructions, they are more likely to be storing the desired instructions (once the instructions have been accessed for the first time) than smaller caches (which replace the instructions stored therein with other instructions within the code sequence more frequently). On the other hand, if the latency of each cache access is increased (due to the larger size of the instruction cache), the average latency for fetching instructions increases as well. As mentioned above, low average latency is important to achieving high fetch bandwidth by allowing more instructions to be fetched per clock cycle at a desired clock cycle time and by aiding in the more rapid detection and prediction of branch instructions. Accordingly, an instruction fetch structure which can achieve both high hit rates and low latency access is desired to achieve short clock cycle times as well as high fetch bandwidth.
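The hit-rate versus latency tradeoff above can be made concrete with the standard average-access-latency model: hit latency plus miss rate times miss penalty. All parameters below are hypothetical, chosen only for illustration. The two-level line places the small cache in front of the large one, a common textbook arrangement. It is not a claim about how the invention combines its caches.

```python
def average_access_latency(hit_latency: float, miss_rate: float,
                           miss_penalty: float) -> float:
    """Average latency in cycles = hit latency + miss rate * miss penalty."""
    return hit_latency + miss_rate * miss_penalty


MEMORY_PENALTY = 50  # hypothetical main-memory latency, in cycles

# Hypothetical parameters: the small cache is faster but misses more often.
small = average_access_latency(1, 0.10, MEMORY_PENALTY)
large = average_access_latency(3, 0.02, MEMORY_PENALTY)

# Small cache backed by the large one; 0.20 is the assumed fraction of
# small-cache misses that also miss in the large cache.
two_level = average_access_latency(1, 0.10,
                                   average_access_latency(3, 0.20, MEMORY_PENALTY))

print(f"small={small:.2f} large={large:.2f} two-level={two_level:.2f}")
# small=6.00 large=4.00 two-level=2.30
```

With these made-up numbers, the large cache beats the small one on average despite its slower hits. Putting the small cache in front of it does better than either cache alone.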
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a processor in accordance with the present invention. The processor employs a first instruction cache, a second instruction cache, and a fetch unit that uses a fetch/prefetch method across the first and second instruction caches designed to provide high fetch bandwidth. The fetch unit selects a fetch address from a variety of fetch address sources based upon previously fetched instructions (e.g. the presence or absence of branch instructions within the previously fetched instructions). Depending upon the source of the fetch address, the fetch address is presented to one of the first and second instruction caches for fetching the corresponding instructions. If the first cache is selected to receive the fetch address, the fetch unit may select a prefetch address for presentation to the second cache. The prefetch address is selected from a variety of prefetch address sources and is presented to the second instruction cache. Instructions prefetched in response to the prefetch address are provided to the first instruction cache for storage.
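The fetch/prefetch flow above can be sketched as follows. This is a minimal model, not the patented design. Caches are sets of line addresses and the 64-byte line size is hypothetical. The next sequential line stands in for the unspecified "variety of prefetch address sources." Misses and refills from main memory are not modeled. The names `FetchUnit`, `fetch`, `prefetch` and `line_address` are illustrative.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # hypothetical cache-line size in bytes


def line_address(addr: int) -> int:
    """Align an address down to the start of its cache line."""
    return addr - addr % LINE_SIZE


@dataclass
class FetchUnit:
    first_cache: set[int] = field(default_factory=set)   # small, low latency
    second_cache: set[int] = field(default_factory=set)  # large, higher latency

    def fetch(self, fetch_addr: int, to_first: bool) -> tuple[str, bool]:
        """Present fetch_addr to one cache; return (cache name, hit?)."""
        line = line_address(fetch_addr)
        if not to_first:
            return "second", line in self.second_cache
        hit = line in self.first_cache
        # The second cache is not servicing this fetch, so use it for a prefetch.
        # Assumed prefetch source: the next sequential line.
        self.prefetch(line + LINE_SIZE)
        return "first", hit

    def prefetch(self, prefetch_addr: int) -> None:
        """Copy a line from the second cache into the first cache, if present."""
        line = line_address(prefetch_addr)
        if line in self.second_cache:
            self.first_cache.add(line)


unit = FetchUnit(first_cache={0x1000}, second_cache={0x1000, 0x1040, 0x8000})
print(unit.fetch(0x1010, to_first=True))   # ('first', True); prefetches 0x1040
print(unit.fetch(0x1044, to_first=True))   # ('first', True), thanks to the prefetch
print(unit.fetch(0x8000, to_first=False))  # ('second', True)
```

The second fetch hits in the first cache only because the first fetch used the otherwise idle second cache to bring line 0x1040 forward.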
In one embodiment, the first instruction cache may be a low latency, relatively small cache while the second instruction cache may be a higher latency, relatively large cache. Fetch addresses from many of the fetch address sources may be likely to hit in the first instruction cache. For example, branch target addresses corresponding to branch instructions having small displacements may be likely to hit in the first instruction cache, which stores the most recently accessed cache lines. Also, return addresses corresponding to return instructions may be likely to hit in the first instruction cache since the corresponding call instruction may have been recently executed. Other fetch addresses may be less likely to hit in the first instruction cache. For example, branch target addresses corresponding to branch instructions having large displacements or branch target addresses formed using an indirect method may be less likely to hit in the first instruction cache. Accordingly, these fetch addresses may be immediately fetched from the second instruction cache, instead of first attempting to fetch from the first instruction cache. The latency of attempting an access in the first instruction cache may thereby be avoided.
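The routing rule in this embodiment can be written as a small decision function. This is a sketch under stated assumptions. The source categories mirror the examples in the paragraph above. The 4096-byte cutoff for a "small" displacement is hypothetical, since the text gives no value. `Source` and `select_cache` are illustrative names.

```python
from enum import Enum, auto


class Source(Enum):
    DIRECT_BRANCH = auto()    # target = branch address + displacement
    RETURN = auto()           # target = return address of a recent call
    INDIRECT_BRANCH = auto()  # target formed indirectly (register or memory)


# Hypothetical cutoff for a "small" displacement; the text gives no value.
SMALL_DISPLACEMENT = 4096


def select_cache(source: Source, displacement: int = 0) -> str:
    """Return which instruction cache should receive the fetch address."""
    if source is Source.RETURN:
        return "first"   # the matching call was likely executed recently
    if source is Source.DIRECT_BRANCH and abs(displacement) < SMALL_DISPLACEMENT:
        return "first"   # nearby target, likely in a recently accessed line
    # Large displacements and indirect targets go straight to the second
    # cache, avoiding the latency of a likely miss in the first cache.
    return "second"


print(select_cache(Source.DIRECT_BRANCH, 128))     # first
print(select_cache(Source.DIRECT_BRANCH, -65536))  # second
print(select_cache(Source.INDIRECT_BRANCH))        # second
```

The string return values are only for readability. The point is that the decision depends solely on information available when the fetch address is generated, so no first-cache lookup is needed to make it.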
By generating prefetch addresses for the second instruction cache when the fetch address is conveyed to the first instruction cache, the fetch unit attempts to increase the likelihood that subsequent fetch addresses hit in the first instruction cache. Hits in the first instruction cache may provide the lowest latency, and hence may operate to improve the fetch bandwidth. Furthermore, in one embodiment, the first instruction cache may provide multiple cache lines in response to fetch add
Advanced Micro Devices , Inc.
Conley Rose & Tayon PC
Kim Kenneth S.
Merkel Lawrence J.