Processor with apparatus for tracking prefetch and demand...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S137000, C711S138000, C712S207000

active

06212603

ABSTRACT:

FIELD OF THE INVENTION
The present invention is generally related to the field of microprocessors. More particularly, the invention relates to prefetching schemes for improving instruction processing speed in high-performance computer systems.
BACKGROUND OF THE INVENTION
As the operating frequencies of microprocessors continue to rise, performance often depends upon providing a continual stream of instructions and data in accordance with the computer program that is running. As such, many processors include branch prediction circuitry that is used to predict branch addresses and to cause the prefetching of instructions in the instruction stream before they are needed. For example, U.S. Pat. No. 5,469,551 discloses a branch prediction circuit along with a subroutine stack used to predict branch addresses and prefetch the instruction stream.
As application programs get larger, the instruction fetch penalty has become one of the major bottlenecks in system performance. Instruction fetch penalty refers to the number of cycles spent in fetching instructions from different levels of cache memories and main memory. Instruction prefetch is an effective way to reduce the instruction fetch penalty by prefetching instructions from long-latency cache memories or main memory to short-latency caches. The basic idea of any instruction prefetching scheme is to pre-load instructions from external memory or a higher-level cache into the local instruction cache that is most closely associated with the execution unit of the processor. Therefore, when instructions are actually demanded, the fetch penalty of the instructions is small.
When instructions are immediately available in the local cache memory of the processor, program execution proceeds smoothly and rapidly. However, if an instruction is not resident in the on-chip instruction cache, the processor must request the instruction from a higher-level cache or from external memory. If the instruction is present in the higher-level cache (e.g., the L1 cache) the delay may only be around eight clock cycles of the processor. The cost of generating a bus cycle to access external memory is much greater: on the order of a hundred clock cycles or more. This means that program execution must halt or be postponed until the required instruction returns from memory. Hence, prefetching is aimed at bringing instructions into a cache local to the processor prior to the time the instruction is actually needed in the programmed sequence of instructions.
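The latencies quoted above can be combined into a rough expected fetch penalty. The following Python sketch is illustrative only: the function name and the hit rates passed to it are hypothetical, and the default cycle counts are the approximate figures from the text (about eight cycles for the higher-level cache, about one hundred for external memory), not measured values.

```python
def average_fetch_penalty(local_hit_rate, next_level_hit_rate,
                          next_level_cycles=8, memory_cycles=100):
    """Expected stall cycles per instruction fetch.

    local_hit_rate:      fraction of fetches served by the on-chip
                         instruction cache (assumed to cost no stall).
    next_level_hit_rate: fraction of local misses served by the
                         higher-level cache; the rest go to memory.
    The cycle counts default to the rough figures quoted in the text.
    """
    local_miss_rate = 1.0 - local_hit_rate
    penalty_on_miss = (next_level_hit_rate * next_level_cycles
                       + (1.0 - next_level_hit_rate) * memory_cycles)
    return local_miss_rate * penalty_on_miss


if __name__ == "__main__":
    # Hypothetical hit rates, chosen only to show the trend.
    for local in (0.90, 0.95, 0.99):
        print(local, average_fetch_penalty(local, next_level_hit_rate=0.5))
```

Under these assumptions, every improvement in the local hit rate that prefetching achieves scales the expected penalty down proportionally, which is the motivation stated in the paragraph above.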
It is equally important for the instruction prefetch mechanism to acquire the correct instructions. Because a prefetch needs to be performed before the program actually reaches the prefetch target, the prefetch target is often chosen based on a prediction of the branch. When a branch is predicted correctly, the demanded instructions are prefetched into short-latency caches, thus reducing the fetch penalty. However, when a branch is predicted incorrectly, the prefetched instructions are not useful. In some cases, incorrectly prefetched instructions can actually be harmful to the program flow because they cause cache pollution. In addition, prefetching incorrect branch targets often results in a “stall” condition in which the processor is idle while the main memory or long-latency cache memories are busy acquiring the critically-demanded instructions.
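The effect of cache pollution described above can be illustrated with a toy model. This Python sketch assumes a small fully associative cache with least-recently-used replacement; the class, the function, the capacity, and the line names are all hypothetical and are not taken from the patent. It counts demand misses for the same demand stream with no prefetch, with a correctly predicted prefetch, and with wrong-path prefetches.

```python
from collections import OrderedDict


class TinyCache:
    """Toy fully associative cache with LRU replacement (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, line):
        """Return True on a hit; on a miss, fill the line, evicting LRU if full."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[line] = True
        return hit


def count_demand_misses(demand_stream, prefetches, capacity=3):
    """Count demand misses; prefetches[step] lists lines filled before that step."""
    cache = TinyCache(capacity)
    misses = 0
    for step, line in enumerate(demand_stream):
        for prefetched in prefetches.get(step, ()):
            cache.touch(prefetched)
        if not cache.touch(line):
            misses += 1
    return misses


if __name__ == "__main__":
    stream = ["A", "B", "A", "C"]
    print(count_demand_misses(stream, {}))               # no prefetch
    print(count_demand_misses(stream, {2: ["C"]}))       # correct prediction
    print(count_demand_misses(stream, {2: ["X", "Y"]}))  # wrong-path prefetch
```

In this model the wrong-path prefetches evict line A just before it is demanded again, so the polluted run misses more often than the run with no prefetching at all.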
To overcome these difficulties, designers have developed a variety of different systems and methods for avoiding stalls. By way of example, U.S. Pat. No. 5,396,604 teaches an approach that obviates the need for defining a new instruction in the instruction set architecture of the processor.
Yet another approach for reducing the cache miss penalty is to maintain a scoreboard bit for each word in a cache line in order to prevent the writing over of words previously written by a store instruction. U.S. Pat. No. 5,471,602 teaches a method for improving performance by allowing stores which miss the cache to complete in advance of the miss copy-in.
While each of these systems and methods provides improvement in processor performance, there still exists a need to increase the overall speed of prefetching operations. By its nature, prefetching is a speculative operation. In previous architectures, hardware implementations have been used to bring instructions into the machine prior to execution. Past approaches, however, have failed to make the best use of memory bandwidth because they do not specify how much to prefetch, in addition to where and when to prefetch instructions. That is, many machines simply prefetch the next sequential line of instructions following a cache miss because most instructions exhibit sequential behavior.
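The next-sequential-line policy mentioned above can be sketched as follows. This is a minimal Python illustration, not the mechanism of the invention: the 32-byte line size, the function name, and the `lines_ahead` parameter (which stands in for "how much to prefetch") are assumptions introduced here.

```python
LINE_BYTES = 32  # assumed cache line size; the text does not specify one


def next_line_prefetch(miss_address, line_bytes=LINE_BYTES, lines_ahead=1):
    """Return the line-aligned addresses to prefetch after a cache miss.

    With lines_ahead=1 this is the simple policy described in the text:
    fetch only the line that sequentially follows the missing one.
    """
    line_base = miss_address - (miss_address % line_bytes)
    return [line_base + i * line_bytes for i in range(1, lines_ahead + 1)]


if __name__ == "__main__":
    print([hex(a) for a in next_line_prefetch(0x1004)])
    print([hex(a) for a in next_line_prefetch(0x1004, lines_ahead=3)])
```

A fixed `lines_ahead` is the kind of one-size-fits-all choice the paragraph criticizes: it uses the same amount of memory bandwidth whether the code that follows is long and sequential or branches away immediately.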
To minimize the instruction fetch penalty and to increase the accuracy of prefetching operations, it is therefore desirable to provide a new mechanism for instruction prefetching.
SUMMARY OF THE INVENTION
A processor is disclosed that prefetches and executes instructions in a pipelined manner. In one embodiment, the processor comprises a first (L1) cache and an instruction cache that stores instructions which have yet to be executed. An instruction pointer device is utilized to select one of a plurality of incoming addresses for fetching instructions.
The processor further includes an instruction streaming buffer (ISB) that stores instructions returned from the L1 cache before the instructions are actually written into the instruction cache. A way multiplexer is coupled to the instruction cache to output an instruction selected by the instruction pointer. Also coupled to the way multiplexer is a bypass path that provides the instruction to the way multiplexer from a plurality of bypass sources other than the instruction cache. By way of example, among the bypass sources is an output from the data array in the instruction streaming buffer. This allows an instruction that has not yet been written into the local instruction cache to be bypassed directly to the way multiplexer in response to a demand fetch.
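The bypass behavior described above can be modeled in software as a lookup that falls through from the instruction cache to the instruction streaming buffer. The Python sketch below is a simplification under stated assumptions: both structures are modeled as dictionaries keyed by address, the function names are hypothetical, and only the ISB is shown out of the plurality of bypass sources.

```python
def demand_fetch(address, instruction_cache, streaming_buffer):
    """Return (instruction, source) for a demand fetch at `address`.

    Mimics the selection made at the way multiplexer: prefer the
    instruction cache, otherwise bypass from the ISB data array.
    """
    if address in instruction_cache:
        return instruction_cache[address], "instruction cache"
    if address in streaming_buffer:
        return streaming_buffer[address], "ISB bypass"
    return None, "miss"


def write_back(address, instruction_cache, streaming_buffer):
    """Move one returned instruction from the ISB into the instruction cache."""
    instruction_cache[address] = streaming_buffer.pop(address)


if __name__ == "__main__":
    icache = {0x100: "add"}
    isb = {0x120: "load"}  # returned from L1, not yet written to the cache
    print(demand_fetch(0x120, icache, isb))
    write_back(0x120, icache, isb)
    print(demand_fetch(0x120, icache, isb))
```

The point of the bypass is visible in the first call: the demand fetch is satisfied without waiting for the write into the instruction cache to complete.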
The processor of the present invention further includes a request address buffer (RAB) that registers physical and virtual addresses associated with an instruction of a miss request by the processor to the L1 cache. Each entry of the request address buffer has an associated identification (ID) that is sent to the L1 cache as part of the miss request. This ID is returned to the request address buffer from the L1 cache to read out the physical and virtual addresses corresponding to the instruction of the miss request. These physical and virtual addresses are read out from the RAB directly into the ISB.
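The ID-based handshake between the request address buffer and the L1 cache can be sketched as a small table indexed by entry ID. In this Python illustration the class and method names, the number of entries, the use of the entry index as the ID, and the freeing of an entry when it is read out are all assumptions made for the sketch; the text states only that the ID accompanies the miss request and is returned to read out the addresses into the ISB.

```python
class RequestAddressBuffer:
    """Toy RAB: each entry holds the (physical, virtual) address of a miss."""

    def __init__(self, num_entries=4):  # entry count is an assumption
        self.entries = [None] * num_entries

    def allocate(self, physical, virtual):
        """Register a miss; return the entry ID sent with the request, or None if full."""
        for entry_id, entry in enumerate(self.entries):
            if entry is None:
                self.entries[entry_id] = (physical, virtual)
                return entry_id
        return None

    def read_out(self, entry_id):
        """Return the addresses for a returned ID and free the entry (assumed)."""
        entry = self.entries[entry_id]
        if entry is None:
            raise KeyError(f"RAB entry {entry_id} is not allocated")
        self.entries[entry_id] = None
        return entry


def handle_l1_return(rab, isb, entry_id, instruction):
    """Model the L1 returning data tagged with an ID: fill the ISB from the RAB."""
    physical, virtual = rab.read_out(entry_id)
    isb[physical] = (virtual, instruction)


if __name__ == "__main__":
    rab, isb = RequestAddressBuffer(), {}
    request_id = rab.allocate(physical=0x8000, virtual=0x4000)
    handle_l1_return(rab, isb, request_id, "load")
    print(isb)
```

Sending only the short ID with the miss request means the L1 cache does not have to carry both full addresses back; they are recovered locally from the RAB when the data returns.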


REFERENCES:
patent: 5870599 (1999-02-01), Hinton et al.
