Method for just in-time delivery of instructions in a data...

Electrical computers and digital processing systems: processing – Instruction fetching – Of multiple instructions simultaneously

Reexamination Certificate


Details

Classification: C712S207000
Type: Reexamination Certificate
Status: active
Patent number: 06427204

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to instruction processing systems and in particular to a method and system for ordering instruction fetch requests. Still more particularly, the present invention relates to a method and system for implementing just-in-time delivery of instructions requested by instruction fetch requests.
2. Description of the Related Art
In conventional symmetric multiprocessor (SMP) data processing systems, all of the processors are generally identical. The processors all utilize common instruction sets and communication protocols, have similar hardware architectures, and are generally provided with similar memory hierarchies. For example, a conventional SMP data processing system may comprise a system memory, a plurality of processing elements that each include a processor and one or more levels of cache memory, and a system bus coupling the processing elements to each other and to the system memory.
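As a rough illustration of this conventional arrangement, the following C sketch models an SMP system as a set of identical processing elements, each with its own cache levels, sharing one system memory over a common bus; all structure names, sizes, and latencies are illustrative assumptions, not details from the patent.
```c
/* Illustrative sketch only: a conventional SMP system as described above --
 * identical processing elements, each with its own cache levels, sharing one
 * system memory over a common bus.  Names, sizes, and latencies are assumed. */
#include <stdio.h>

#define NUM_PROCESSORS 4
#define CACHE_LEVELS   2

struct cache_level {
    unsigned size_kb;   /* capacity of this cache level */
    unsigned latency;   /* access latency in processor cycles */
};

struct processing_element {
    int id;
    struct cache_level caches[CACHE_LEVELS];   /* e.g. L1 and L2 */
};

struct smp_system {
    struct processing_element pe[NUM_PROCESSORS];
    unsigned system_memory_mb;   /* shared memory reached over the system bus */
};

int main(void) {
    struct smp_system sys = { .system_memory_mb = 1024 };
    for (int i = 0; i < NUM_PROCESSORS; i++) {
        sys.pe[i].id = i;
        sys.pe[i].caches[0] = (struct cache_level){ 32, 1 };    /* small, fast L1 */
        sys.pe[i].caches[1] = (struct cache_level){ 512, 10 };  /* larger, slower L2 */
    }
    printf("%d identical processors share %u MB of system memory\n",
           NUM_PROCESSORS, sys.system_memory_mb);
    return 0;
}
```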
Conventional SMP data processing system processors have a number of execution units. Superscalar multiprocessors typically have more than one of each execution unit; they typically have two floating point units (FPUs), two fixed point units (FXUs), and two load/store units (LSUs). The processors are designed for high frequency, and their corresponding internal caches are typically very small in order to operate at that high frequency. In part due to their relatively small size, these internal caches sustain a large number of cache misses during requests for instructions. Instructions are thus stored in lower level (L2) caches to maximize processing speed. The processors typically send multiple instruction fetch requests simultaneously or within close proximity to each other. This is particularly true in multithreaded or superscalar processors with multiple instruction fetch units (IFUs).
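The following toy C sketch illustrates, under assumed sizes, why a very small L1 instruction cache sustains many misses and pushes fetches down to the L2: several fetch requests issued close together conflict in a tiny direct-mapped cache. The cache parameters, addresses, and function names are hypothetical.
```c
/* Illustrative only: a tiny model of why a small L1 I-cache pushes many
 * instruction fetches to the L2.  Several fetch requests arrive within close
 * proximity to each other; each one that misses the small L1 falls through
 * to the L2.  Sizes, line width, and addresses are assumptions. */
#include <stdbool.h>
#include <stdio.h>

#define L1_LINES 8   /* deliberately tiny direct-mapped L1 I-cache */

static unsigned l1_tag[L1_LINES];
static bool     l1_valid[L1_LINES];

/* Returns true on an L1 hit; on a miss, fills the line from the "L2". */
static bool fetch(unsigned addr) {
    unsigned idx = (addr / 32) % L1_LINES;   /* assumed 32-byte lines */
    unsigned tag = addr / (32 * L1_LINES);
    if (l1_valid[idx] && l1_tag[idx] == tag)
        return true;
    l1_valid[idx] = true;                    /* fetch from L2 and fill */
    l1_tag[idx] = tag;
    return false;
}

int main(void) {
    /* Several fetch requests issued close together that conflict in the L1. */
    unsigned requests[] = { 0x0000, 0x0400, 0x0800, 0x0C00 };
    int misses = 0;
    for (int i = 0; i < 4; i++)
        if (!fetch(requests[i]))
            misses++;
    printf("%d of 4 fetch requests missed the small L1 and went to the L2\n",
           misses);
    return 0;
}
```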
Traditionally, processors execute program instructions in order. With state-of-the-art processors, out-of-order execution of instructions is often employed to maximize the utilization of execution unit resources within the processor, thereby enhancing overall processor efficiency. Further, in these state-of-the-art processors that support out-of-order execution of instructions, instructions may be dispatched out of program order, executed opportunistically within the execution units of the processor, and completed in program order. The performance enhancement resulting from out-of-order execution is maximized when implemented within a superscalar processor having multiple execution units capable of executing multiple instructions concurrently.
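A minimal sketch of the ordering property just described (not the patent's mechanism): instructions may finish executing out of program order but are completed strictly in program order, modeled here with a small reorder-buffer-like array; the instruction count and execution order are made up.
```c
/* Sketch of out-of-order execution with in-order completion: instructions
 * finish in an arbitrary order, but are retired only when all older
 * instructions have finished.  Counts and the execution order are made up. */
#include <stdbool.h>
#include <stdio.h>

#define N 4

int main(void) {
    bool finished[N] = { false };       /* finished[i]: instruction i has executed */
    int exec_order[N] = { 2, 0, 3, 1 }; /* an arbitrary out-of-order execution */
    int retired = 0;                    /* next instruction to complete, in program order */

    for (int step = 0; step < N; step++) {
        finished[exec_order[step]] = true;
        printf("executed I%d\n", exec_order[step]);
        /* Complete in program order: retire only the oldest finished instructions. */
        while (retired < N && finished[retired]) {
            printf("  completed I%d (program order)\n", retired);
            retired++;
        }
    }
    return 0;
}
```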
Instructions are typically stored according to program order in a cache line within an instruction cache (I-cache) of a processor. Furthermore, each unit of access to the I-cache generally contains more than one instruction. For example, for a processor architecture that has a four-byte instruction length, each I-cache access may be 32 bytes wide, which equals a total of eight instructions per I-cache access. Even with the simplest I-cache design, these instructions must be multiplexed into an instruction buffer having eight or fewer slots before being sent to the issue queue.
During fetching of instructions, all eight instructions are initially read from the I-cache. The fetch address of the first instruction is then utilized to control an 8-to-1 multiplexor that gates the first four instructions into an instruction buffer with, for example, four slots. The fetch address is also utilized to select a target instruction, along with the next three of the eight instructions, to gate into the instruction buffer. All four instructions are gated into the instruction buffer in execution order instead of program order. With this arrangement, when the fetch address is the result of a (predicted or actual) branch instruction, the first instruction to be gated into the instruction buffer may be any one of the eight instructions. The target address of the branch instruction may point to the last instruction of the I-cache access, in which case not all four slots within the instruction buffer will be filled.
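The following sketch makes the gating arithmetic above concrete, under the stated example parameters (4-byte instructions, 32-byte I-cache access, 4-slot instruction buffer): the fetch address's offset within the access selects the target instruction, and a branch target near the end of the access leaves buffer slots unfilled. The address value and variable names are illustrative.
```c
/* Illustrative sketch of the gating described above: one I-cache access
 * returns 8 four-byte instructions; the fetch address selects the target
 * instruction plus the next three to gate into a 4-slot instruction buffer.
 * When a branch target points near the end of the access, fewer than 4 slots
 * can be filled from this access.  The address is an assumed example. */
#include <stdio.h>

#define INSTRS_PER_ACCESS 8   /* 32-byte access / 4-byte instructions */
#define BUFFER_SLOTS      4

int main(void) {
    unsigned fetch_addr = 0x101C;                  /* assumed branch target address */
    unsigned offset = (fetch_addr % 32) / 4;       /* position 0..7 within the access */
    unsigned filled = INSTRS_PER_ACCESS - offset;  /* instructions left in this access */
    if (filled > BUFFER_SLOTS)
        filled = BUFFER_SLOTS;

    printf("target is instruction %u of the access; %u of %u buffer slots filled\n",
           offset, filled, BUFFER_SLOTS);
    return 0;
}
```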
Branch processing, for example, results in a delay in processing, particularly when the branch is speculative and is guessed incorrectly. The branch instruction and the subsequent instructions from the instruction path taken utilize cache resources, which have to be re-charged when the path is incorrectly predicted. This results in a loss of many clock cycles and leads to less efficient overall processing.
Processors today often run numerous cycles ahead of the instruction stream of the program being executed. Also, on these processors, instruction fetch requests are issued as early as possible in order to “hide” the cache access latencies and thus allow ensuing dependent instructions to execute with minimal delay. These techniques lead to requests for instructions which may not be required immediately, and they often create bubbles in the pipeline of instructions. Finally, an L2 cache has a limited number of wired connections for returning instructions. When an instruction is sent prior to the time it is required, it occupies valuable wired cache line resources which may be required for more immediate or important instructions.
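To make the resource-contention point concrete, here is a toy C model, with an assumed number of return ports, showing how fetches issued long before their instructions are needed can occupy the L2's limited return paths and block a more urgent request; all names and numbers are hypothetical.
```c
/* Toy model (port count is an assumption): the L2 has only a few wired return
 * paths, and requests are granted ports in arrival order, so a fetch issued
 * long before it is needed can block a more urgent request. */
#include <stdio.h>

#define RETURN_PORTS 2
#define NUM_REQUESTS 3

struct ifr {
    const char *name;
    int needed_in;   /* cycles until the program actually needs the instructions */
};

int main(void) {
    /* Requests listed in arrival order. */
    struct ifr pending[NUM_REQUESTS] = {
        { "speculative prefetch", 40 },
        { "early fetch",          25 },
        { "urgent fetch",          1 },   /* needed almost immediately */
    };
    for (int i = 0; i < NUM_REQUESTS; i++) {
        if (i < RETURN_PORTS)
            printf("port %d  -> %s (needed in %d cycles)\n",
                   i, pending[i].name, pending[i].needed_in);
        else
            printf("blocked -> %s (needed in %d cycles)\n",
                   pending[i].name, pending[i].needed_in);
    }
    return 0;
}
```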
In the prior art, instruction fetch requests may be issued out of order. Oftentimes this results in an instruction occupying valuable cache line resources or register space for many cycles before it is utilized by the program. When a large number of instruction fetch requests are present, these requests load down the critical cache and queue resources, resulting in less efficient processing.
When the instruction cache is “bombarded” with instruction fetch requests, no ordering information is included. The instruction cache is thus oblivious as to which fetch request to process and in which order. In traditional processors, ordering information is typically implied by a “First Come First Serve” prioritization scheme. However, an instruction is often not required by the processor or program at the time, or in the order, in which it is requested.
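The contrast between first-come-first-serve service and service ordered by when the program actually needs each instruction can be sketched as follows; this is purely illustrative, with made-up arrival times and cycle numbers, and is not the patent's mechanism.
```c
/* Illustrative contrast (not the patent's mechanism): serving fetch requests
 * in arrival order versus in the order the program actually needs them.
 * The arrival and "needed_at" values are made up for the example. */
#include <stdio.h>
#include <stdlib.h>

struct ifr {
    const char *name;
    int arrival;    /* cycle the request reached the instruction cache */
    int needed_at;  /* cycle the program actually needs the instructions */
};

static int by_arrival(const void *a, const void *b) {
    return ((const struct ifr *)a)->arrival - ((const struct ifr *)b)->arrival;
}
static int by_need(const void *a, const void *b) {
    return ((const struct ifr *)a)->needed_at - ((const struct ifr *)b)->needed_at;
}

int main(void) {
    struct ifr reqs[] = {
        { "IFR A", 1, 90 },   /* arrived first, needed last  */
        { "IFR B", 2, 10 },   /* arrived later, needed first */
        { "IFR C", 3, 50 },
    };
    qsort(reqs, 3, sizeof reqs[0], by_arrival);
    printf("first-come-first-serve:  %s %s %s\n",
           reqs[0].name, reqs[1].name, reqs[2].name);
    qsort(reqs, 3, sizeof reqs[0], by_need);
    printf("ordered by time of need: %s %s %s\n",
           reqs[0].name, reqs[1].name, reqs[2].name);
    return 0;
}
```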
Thus, many hardware and software limitations exist in current methods of fetching instructions from an instruction cache. It is evident that a more efficient means of fetching instructions from an instruction cache needs to be developed. A processor should be able to issue its fetch requests in such a way that the instruction cache can deliver each instruction only when it is actually required, while preventing bubbles in the pipeline.
It would therefore be desirable to provide a method and system for improving the efficiency of instruction fetch request processing and the subsequent fetching of instructions. It is further desirable to provide a method and system which allows for just-in-time delivery and/or time-ordered delivery of instructions during execution of an instruction set, thus allowing instructions to be fetched from an instruction cache at the time they are needed within the program execution stream.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved instruction processing system.
It is another object of the present invention to provide an improved method and system for efficiently managing multiple instruction fetch requests to an instruction cache.
It is yet another object of the present invention to provide a method and system for implementing just-in-time delivery of instructions requested by instruction fetch requests.
The foregoing objects are achieved as is now described. A system for time-ordered issuance of instruction fetch requests (IFRs) is disclosed. More specifically, the system enables just-in-time delivery of instructions requested by an IFR. The system consists of a processor, an L1 instruction cache with a corresponding L1 cache controller, and an instruction processor. The instruction processor manipulates an architected time dependency field of an IFR to create a Time of Dependency (ToD) field. The ToD field holds a time dependency value which is utilized...
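Since the summary is cut off here, the exact use of the ToD value is not shown; the following speculative C sketch models one plausible reading, in which each IFR carries a ToD value and the cache controller holds delivery until that time so the instructions arrive just in time. Field names, widths, and the delivery policy are assumptions, not the patent's claims.
```c
/* Speculative sketch only: the summary above is truncated, so the exact
 * encoding and use of the Time of Dependency (ToD) field are not shown.
 * This models one plausible reading -- the instruction processor writes a
 * time dependency value into a field of the IFR, and the cache controller
 * delivers the requested instructions only once that time is reached.
 * Field layout and names are assumptions. */
#include <stdbool.h>
#include <stdio.h>

struct ifr {
    unsigned fetch_addr;   /* address of the requested instructions */
    unsigned tod;          /* Time of Dependency: cycle the instructions are needed */
};

/* Hypothetical controller check: deliver only when the ToD has been reached. */
static bool ready_to_deliver(const struct ifr *r, unsigned current_cycle) {
    return current_cycle >= r->tod;
}

int main(void) {
    struct ifr r = { .fetch_addr = 0x2040, .tod = 15 };
    for (unsigned cycle = 13; cycle <= 16; cycle++)
        printf("cycle %u: %s\n", cycle,
               ready_to_deliver(&r, cycle) ? "deliver instructions" : "hold");
    return 0;
}
```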
