Method, system, and apparatus to improve instruction...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C712S207000, C711S125000

Reexamination Certificate

active

06314431

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to a method and apparatus for improving performance of instruction pre-fetching on computer systems.
BACKGROUND OF THE INVENTION
Typically, computers require fast access to portions of computer memory to enable timely execution of instructions that are stored in the memory and subsequently executed by the computer processor. Managing the location of an instruction that executes in a computer system requires allocating that location in a timely manner to ensure that the instruction will be available for execution without an additional access to the memory, cache memory, or another storage medium. Cache miss latency is a performance problem in the execution of computer-based instructions. It will be appreciated that cache memory is a small, fast unit of the memory and may be located close to the processor to ensure fast access by the processor to information in the cache. The terms “cache” and “cache memory” will be used interchangeably herein.
Typically, the speed of operation of the processor is faster than the speed of access to cache memory. When the processor accesses information in the cache, this is referred to herein as a “cache hit.” When the processor is not able to access information in the cache, this is referred to herein as a “cache miss.” Cache miss latency has increased as the disparity between the speed of processor operations and the speed of access to the memory has increased.
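As an illustration of the hit/miss distinction, the following sketch models a simple direct-mapped instruction cache lookup. The sizes and the lookup routine are illustrative assumptions and are not taken from the patent.

    # Illustrative sketch: a direct-mapped instruction cache lookup that
    # distinguishes a cache hit from a cache miss. Sizes are arbitrary.
    LINE_SIZE = 64          # bytes per cache line (assumed)
    NUM_LINES = 256         # number of lines in the cache (assumed)

    cache_tags = [None] * NUM_LINES

    def lookup(address: int) -> bool:
        """Return True on a cache hit, False on a miss (then fill the line)."""
        line = address // LINE_SIZE
        index = line % NUM_LINES
        tag = line // NUM_LINES
        if cache_tags[index] == tag:
            return True                 # cache hit: the processor proceeds quickly
        cache_tags[index] = tag         # cache miss: fetch from memory, then fill
        return False

    print(lookup(0x4000))   # False: first access to this line misses
    print(lookup(0x4004))   # True: same line is now resident, so it hits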
Pre-fetching is the fetching of instructions into the cache before they are needed. The pre-fetch distance is the elapsed time between initiating a pre-fetch and using its result, and it should be large enough to hide the cache miss latency. However, the pre-fetch distance should not be so large that the pre-fetched instructions are displaced by other information placed in the cache before they are used. Therefore, timeliness is the measure of whether an instruction is pre-fetched before it is needed but not so soon that it must be discarded before it can be used. Generating timely pre-fetches has been a problem with pre-fetching solutions.
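The timeliness constraint can be expressed as a window on the pre-fetch distance, as in the following sketch; the cycle counts and the expected cache residency are illustrative assumptions, not figures from the patent.

    # Illustrative sketch: a pre-fetch is timely if its distance (in cycles)
    # covers the miss latency but does not exceed the line's expected residency.
    def prefetch_is_timely(issue_cycle, use_cycle, miss_latency, residency):
        distance = use_cycle - issue_cycle      # pre-fetch distance in cycles
        return miss_latency <= distance <= residency

    print(prefetch_is_timely(issue_cycle=100, use_cycle=160,
                             miss_latency=40, residency=200))   # True: latency hidden
    print(prefetch_is_timely(issue_cycle=100, use_cycle=120,
                             miss_latency=40, residency=200))   # False: issued too late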
A pre-fetch is useless if it brings a line into the cache which will not be used before it is displaced. A pre-fetch is accurate if it is actually used. It will be appreciated that a “line” includes at least one instruction and represents a unit of instructions that may be pre-fetched on a computer system.
A problem with pre-fetching is obtaining appropriate coverage of a pre-fetch. It will be appreciated that coverage is the identification of useful pre-fetched instruction requests while minimizing useless pre-fetched instruction requests. Attempting to obtain optimal coverage can increase the probability of useless pre-fetches; that is, pre-fetching larger amounts of instructions may increase the probability of useless pre-fetches. Choosing a pre-fetch distance that is large enough to hide the cache miss latency while not being so large as to increase the number of unnecessary pre-fetches has been a problem in the past.
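One way to quantify these notions is to count accurate versus useless pre-fetches over a run, as in the sketch below. The bookkeeping sets are assumptions; the patent does not prescribe any particular measurement.

    # Illustrative sketch: accuracy and useless-pre-fetch counts over one run.
    def prefetch_stats(prefetched_lines, used_lines):
        """prefetched_lines: line addresses that were pre-fetched.
        used_lines: line addresses actually referenced before displacement."""
        accurate = prefetched_lines & used_lines      # used before displacement
        useless = prefetched_lines - used_lines       # displaced without being used
        accuracy = len(accurate) / len(prefetched_lines) if prefetched_lines else 0.0
        return accuracy, len(useless)

    accuracy, useless = prefetch_stats({0x100, 0x140, 0x180}, {0x100, 0x180, 0x1c0})
    print(accuracy, useless)   # two of three pre-fetches were used; one was useless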
Pre-fetching problems are discussed with reference to “Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors,” Chi-Keung Luk and Todd C. Mowry, Proceedings of Micro-31, Nov. 30-Dec. 2, 1998, and “Prefetching using Markov Predictors,” Doug Joseph and Dirk Grunwald, Proceedings of the International Symposium on Computer Architecture, June 1997.
SUMMARY OF THE INVENTION
The present invention is a method and apparatus for improving instruction pre-fetching in computer systems.
Pre-fetching may be focused on instructions or on data. The present invention enables efficient pre-fetching of instructions. The present invention novelly determines, in a cost-effective manner, a location in a program at which to insert pre-fetch instructions earlier than has been possible in the past. Therefore, the present invention introduces more control into the determination of when to initiate instruction pre-fetching than in the past. The present invention efficiently inserts pre-fetch code into a code sequence to enable sequential code execution with reduced cache miss latency during execution.
The present invention assumes the existence of and utilizes the computer-based hardware capabilities of: a computer-based pre-fetch instruction that pre-fetches the cache line corresponding to a particular instruction address, and an augmentation to a computer-based branch instruction that can specify whether sequential instruction pre-fetching should be initiated at the target of a branch instruction.
The present invention may perform during compile-time or run-time. When the present invention operates during compile-time, it advantageously uses information available before program execution, thereby reducing the overhead required for pre-fetching during program execution. When the present invention operates during run-time, it exploits computer system features that allow pre-fetching of instructions that are introduced into the execution process. The term “compile-time” refers to the period of compilation before a computer program is loaded and executing on the computer system, and the term “run-time” refers to the period of compilation after the computer program is loaded and is able to execute on the computer system.
The present invention operates on a computer having memory that is accessed by at least one instruction generated from a program, encoded in a computer-readable medium, that executes on the computer. The computer includes execution cycles and executes instructions in an order during the execution cycles. Further, the instruction includes at least one value. The present invention determines a minimum threshold value that defines a cost-effective pre-fetching size. The present embodiment also accesses a current branch instruction in the program that is associated with a target instruction.
The present invention executes a loop while the current branch instruction is accessed in the source program. Within the loop, the present embodiment inserts the pre-fetch instruction for the target instruction in the program if pre-fetching the target instruction is cost effective. Also, a target basic block associated with the target instruction is accessed so that a predicted target trace size is determined. Further, the augmented branch instruction is generated, enabling sequential instruction pre-fetching during execution, if the predicted target trace size is greater than the minimum threshold, thereby improving pre-fetching on the computer.
The loop execution is managed by accessing a next branch instruction if the next branch instruction has not been accessed. Further, the next branch instruction is associated with a target instruction. If the next branch instruction is accessed, the next branch instruction is labeled as the current branch instruction, typically by a move instruction or copy instruction. Otherwise, the current branch instruction is labeled as not accessed, and execution of the loop is therefore completed.
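A minimal sketch of the per-branch loop described above follows. The data structures, the placeholder cost test, and the trace-size predictor are hypothetical stand-ins for details the patent text does not specify; the augmented-branch hint is modeled as a simple boolean flag.

    # Sketch of the loop over branch instructions; names are illustrative.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class BasicBlock:
        instructions: List[str]                 # instructions in the target basic block

    @dataclass
    class Branch:
        target: str                             # label of the target instruction
        target_block: BasicBlock
        prefetch_sequential_at_target: bool = False   # augmented-branch hint

    def process_branches(branches: List[Branch], min_threshold: int) -> List[str]:
        inserted = []                           # pre-fetch instructions to insert
        for current in branches:                # loop while a branch is still accessed
            # Insert a pre-fetch for the target if judged cost effective
            # (a trivial placeholder test is used here).
            if current.target_block.instructions:
                inserted.append(f"prefetch {current.target}")
            # Access the target basic block, predict the target trace size, and
            # generate the augmented branch when the size exceeds the threshold.
            predicted_trace_size = len(current.target_block.instructions)
            if predicted_trace_size > min_threshold:
                current.prefetch_sequential_at_target = True
        return inserted

    loop_head = Branch("L1", BasicBlock(["load", "add", "mul", "store"]))
    print(process_branches([loop_head], min_threshold=2),
          loop_head.prefetch_sequential_at_target)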
In one embodiment of the present invention, insertion of the pre-fetch instruction includes defining an advance_cycles value that is a cost-effective number of execution cycles in advance of the current branch instruction; advance_cycles identifies the location at which to insert said pre-fetch instruction. Then the present embodiment inserts the pre-fetch instruction advance_cycles in advance of the current branch instruction.
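A minimal sketch of this embodiment, assuming the scheduled program is represented as a list of (instruction, issue-cycle) pairs and that advance_cycles has already been chosen by a cost model not detailed here:

    # Sketch: place the pre-fetch at the first position whose issue cycle is at
    # least advance_cycles ahead of the current branch's issue cycle.
    def insert_prefetch_in_advance(schedule, branch_index, advance_cycles, target):
        branch_cycle = schedule[branch_index][1]
        desired_cycle = branch_cycle - advance_cycles   # advance_cycles before the branch
        for i in range(branch_index + 1):
            if schedule[i][1] >= desired_cycle:
                schedule.insert(i, (f"prefetch {target}", desired_cycle))
                return schedule
        return schedule

    sched = [("load r1", 0), ("add r2", 2), ("mul r3", 4), ("branch L1", 6)]
    print(insert_prefetch_in_advance(sched, branch_index=3, advance_cycles=3,
                                     target="L1"))
    # [('load r1', 0), ('add r2', 2), ('prefetch L1', 3), ('mul r3', 4), ('branch L1', 6)]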
In another embodiment, at least one instruction slot that is associated with an instruction_slot_execution_cycle is identified. Then the alternative embodiment inserts the pre-fetch instruction at the instruction_slot_execution_cycle if the instruction_slot_execution_cycle is located advance_cycles in advance of the branch instruction. Otherwise the pre-fetch instruction is inserted at the instruction_slot_execution_cycle if the instruction_slot_execution_cycle is located in advance of advance_cycles, in advance of the
