System and method for instruction cache re-ordering

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

active

06519683

ABSTRACT:

BACKGROUND OF THE INVENTION
I. Field of the Invention
This invention relates generally to computer technology, and more particularly, to improving processor performance in a computer system.
II. Background Information
In high-performance processors, one of the keys to improving performance is reducing execution latency, i.e., the number of clock cycles an instruction takes to execute. One way to reduce execution latency is to use specialized execution units. Each specialized execution unit executes only a subset of the architectural instructions; several different specialized execution units are implemented in conjunction with each other on the microprocessor to cover execution of the entire instruction set. Since each specialized execution unit performs only a small number of functions, that execution unit can operate faster than a fully comprehensive execution unit.
The disadvantage of utilizing specialized execution units is the necessity to steer various instructions to their appropriate execution units. This steering function becomes exponentially more difficult with an increase in the degree of superscalar dispatch (i.e., dispatching multiple instructions simultaneously per clock cycle to the execution units) for the processor. Steering instructions to specialized execution units is handled by a full crossbar. This full crossbar provides a path for each instruction to travel to each execution unit. The number of paths in a full crossbar is proportional to the number of execution units multiplied by the number of instructions being steered per cycle. Depending on the degree of superscalar dispatch the processor employs, the crossbar can become quite cumbersome in terms of the number of routing wires needed and/or silicon area. In addition, for a processor running at high frequencies, it may take several cycles for the instructions to be routed through this extensive crossbar. This increase in pipeline depth lowers processor performance: if latches are used because instructions cannot reach their destinations in one clock cycle, then the latches add pipeline stages, which results in a decrease in processor performance.
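The proportionality stated above can be made concrete with a minimal sketch. It assumes the simplest reading, one path per pairing of dispatched instruction and execution unit; the function name `crossbar_paths` and the sample widths are illustrative and do not come from the patent.

```python
def crossbar_paths(num_execution_units: int, instructions_per_cycle: int) -> int:
    """Paths in a full crossbar, assuming one path per (instruction, unit) pair."""
    return num_execution_units * instructions_per_cycle


# Illustrative sweep: five execution units at several dispatch widths.
for width in (1, 2, 4, 8):
    print(f"dispatch width {width}: {crossbar_paths(5, width)} paths")
```

Under this assumption, five execution units with degree-4 dispatch require 20 paths, and doubling the dispatch width doubles the routing.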
FIG. 1 shows an example of dispatching instructions to execution units in a prior art processor implementation. In this example, a cache line 158 (cache line 158 may have a dispatch buffer that extends from the cache line) of an instruction cache includes four positions; each of the four positions stores an instruction. A crossbar 152 steers instructions and provides a path between each of the four positions of cache line 158 and each of the specialized execution units 143a-e. Each of the positions of cache line 158 has a path to all specialized execution units 143a-e because any type of instruction may be stored in any of the positions of cache line 158, and thus all positions should have access to all specialized execution units 143a-e in order to dispatch any type of instruction to any of execution units 143a-e.
Each of specialized execution units 143a-e includes a corresponding one of schedule queues 155a-e. Each of the schedule queues 155a-e, among other functions, stores instructions in one or more entries until the instructions can be executed by a particular processing unit within the execution unit (e.g., the processing unit may be an arithmetic logic unit (“ALU”), a memory unit (“MEM”), or a complex operation unit (“CMPLX”)). A write port writes the instructions to the one or more entries (the write ports correspond to the arrows entering a particular one of specialized execution units 143a-e). The number of write ports within the schedule queue depends on the number of instructions that may be dispatched to the execution unit in one clock cycle. In FIG. 1, assuming that the processor employs a degree-4 superscalar dispatch (i.e., four instructions are dispatched simultaneously in one clock cycle), each of the schedule queues 155a-e has four write ports. Here, four write ports are used because in any one clock cycle, up to four instructions may be dispatched to a particular one of specialized execution units 143a-e.
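The write-port reasoning can be illustrated with a short sketch that counts, for a set of example cache lines, the largest number of instructions any single execution unit receives in one cycle. The mapping of instruction kinds to units in `UNIT_FOR_KIND` and the sample cache lines are hypothetical, chosen only for illustration; the patent text does not say which unit handles which kind.

```python
from collections import Counter

# Hypothetical mapping from instruction kind to execution unit (not from the patent).
UNIT_FOR_KIND = {"ALU": "143a", "MEM": "143b", "CMPLX": "143c"}


def write_ports_needed(cache_lines, unit_for_kind):
    """Largest number of instructions sent to one unit in a single cycle."""
    worst = 0
    for line in cache_lines:
        per_unit = Counter(unit_for_kind[kind] for kind in line)
        worst = max(worst, max(per_unit.values(), default=0))
    return worst


# With unordered cache lines, all four positions may hold the same kind.
lines = [
    ["ALU", "MEM", "CMPLX", "ALU"],
    ["ALU", "ALU", "ALU", "ALU"],
]
print(write_ports_needed(lines, UNIT_FOR_KIND))
```

Because nothing prevents a line such as the second one, every schedule queue must be sized for the full dispatch width, which is why each queue in FIG. 1 has four write ports.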
If the instructions are re-ordered prior to loading them into the instruction cache, the size of the crossbar and the number of write ports within an execution unit may be significantly reduced, resulting in improved processor performance. For the foregoing reasons, there is a need for re-ordering instructions prior to loading them into an instruction cache.
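This section does not describe how the re-ordering is performed. The sketch below is therefore only one possible illustration, under the assumption that each cache-line position is reserved for a single instruction kind, so that a position needs a path only to the units handling that kind. The slot layout `SLOT_KINDS` and the function `reorder` are hypothetical and should not be read as the claimed method.

```python
# Hypothetical layout: the kind of instruction each cache-line position may hold.
SLOT_KINDS = ["ALU", "ALU", "MEM", "CMPLX"]


def reorder(instructions):
    """Place each (kind, text) instruction into the first free slot of its kind."""
    line = [None] * len(SLOT_KINDS)
    for kind, text in instructions:
        for i, slot_kind in enumerate(SLOT_KINDS):
            if slot_kind == kind and line[i] is None:
                line[i] = (kind, text)
                break
        else:
            # No free slot of this kind; a real design would need a fallback.
            raise ValueError(f"no free slot for {kind} instruction")
    return line


fetched = [("MEM", "load"), ("ALU", "add"), ("CMPLX", "div"), ("ALU", "sub")]
print(reorder(fetched))
```

Under this assumed layout, at most two instructions can reach the ALU unit and one can reach each of the others in a cycle, so the corresponding schedule queues would need fewer write ports than the dispatch width.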


REFERENCES:
patent: 5375220 (1994-12-01), Ishikawa
