System and method for unrolling loops in a trace cache

Electrical computers and digital processing systems: processing – Processing control – Branching

Reexamination Certificate


Details

C711S118000, C712S215000

active

06578138

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to computer processors. In particular, the present invention relates to the storage of loops within a trace cache of a processor.
BACKGROUND INFORMATION
In a computer system, a cache stores data in order to decrease data retrieval times for a processor. More particularly, a cache stores specific subsets of data in high-speed memory. When a processor requests a piece of data, the system checks the cache first to see whether the data is stored there. If it is, the processor can retrieve the data much faster than if the data were stored in other computer-readable media such as random access memory, a hard drive, a CD-ROM, or a floppy disk.
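The check-the-cache-first behavior described above can be sketched in a few lines of Python. This is an illustrative toy only: the class name `SimpleCache`, the dictionary-based backing store, and the hit and miss counters are assumptions made for the example and do not come from the patent.

```python
class SimpleCache:
    """Toy cache: a small dict placed in front of a slower backing store."""

    def __init__(self, backing_store):
        # backing_store stands in for slower media (hypothetical mapping).
        self.backing_store = backing_store
        # lines stands in for the high-speed memory of the cache.
        self.lines = {}
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Check the cache first, as the text describes.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # On a miss, fall back to the slower store and keep a copy.
        self.misses += 1
        value = self.backing_store[address]
        self.lines[address] = value
        return value


cache = SimpleCache({0x10: "a", 0x20: "b"})
cache.read(0x10)  # miss: fetched from the backing store
cache.read(0x10)  # hit: served from the cache
```

The second read of the same address is served from `lines` without touching the backing store, which is the source of the speedup the paragraph describes.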
One particular type of cache is referred to as a trace cache. A trace cache is responsible for building, caching, and delivering instruction traces to a processor. In one type of trace cache, instructions are stored as “traces” of decoded micro-operations or “micro-ops”, and are only allowed to be accessed in units of “traces.” Traces are blocks of micro-op instructions that are distinguishable from one another only by their trace heads.
Traces often contain backward taken branches. A backward taken branch generally occurs when the target address of a branch is a prior micro-op, and in particular, for purposes of this description, a prior micro-op of the trace. In this case, the target address, the backward taken branch, and any micro-ops between the two form a loop. For example, a trace may be built containing three micro-ops, which together form a loop. The first micro-op (the trace head) may be the head of the loop, while the second micro-op of the trace is the second micro-op of the loop. In this example, the third micro-op of the trace contains a backward taken branch whose target address is the first micro-op of the trace (i.e., the trace head, in this case also the head of the loop). In a conventional trace cache, the building of the trace may stop at this point, so that the three-micro-op loop comprises the entire trace.
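As a rough illustration of the three-micro-op example, the following Python sketch models a trace as a list of micro-ops and locates the loop formed by a backward taken branch. The `MicroOp` type, its `branch_target` field (an index into the trace), and the function `find_loop` are hypothetical names introduced here; treating a branch that targets itself as a one-micro-op loop is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MicroOp:
    name: str
    # Index of the branch target within the trace, or None if not a branch.
    # (Hypothetical representation; real hardware uses addresses.)
    branch_target: Optional[int] = None


def find_loop(trace: List[MicroOp]) -> Optional[Tuple[int, int]]:
    """Return (loop_head_index, branch_index) for the first backward
    taken branch in the trace, or None if the trace has no loop."""
    for i, uop in enumerate(trace):
        # A target at or before the branch itself closes a loop
        # (assumption: a self-targeting branch is a one-micro-op loop).
        if uop.branch_target is not None and uop.branch_target <= i:
            return (uop.branch_target, i)
    return None


# The example from the text: three micro-ops, the third branching
# back to the trace head.
trace = [MicroOp("uop1"), MicroOp("uop2"), MicroOp("uop3", branch_target=0)]
print(find_loop(trace))  # (0, 2)
```

Here the loop spans indices 0 through 2, so the loop is the entire trace, matching the conventional case described above.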
When executed, the loop may be repeated many consecutive times. Accordingly, the exemplary trace will be sent to the processor many consecutive times. Generally, traces are constructed so that, wherever possible, each trace line contains the maximum number of micro-ops allowed by the cache structure, for example six micro-ops. In this manner, the trace cache can supply up to six micro-ops per clock cycle to the processor. In this case, however, the trace cache will supply at most three micro-ops per clock cycle, even though it may store up to six micro-ops per trace line. This shortfall occurs because the trace itself contains only three micro-ops. Moreover, the backward taken branch occurs in this example at the first line of the trace, which may decrease bandwidth even further. Many branch prediction or targeting schemes require a minimum of two clock cycles to determine the target of a branch, so that in this case a clock cycle will be wasted while the trace cache determines the target of the third micro-op.
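The bandwidth shortfall can be made concrete with a small calculation. The sketch below assumes a simplified timing model that the text does not state explicitly: the trace cache delivers one trace line per clock cycle, and each pass through the loop may incur a fixed number of wasted cycles while the branch target is determined. The function name and parameters are hypothetical.

```python
import math


def uops_per_cycle(loop_len, line_width, wasted_cycles_per_pass=0):
    """Average micro-ops delivered per clock for a trace that holds
    exactly one copy of a loop (simplified model, see assumptions)."""
    # Cycles needed to read the trace once, at one line per cycle.
    lines_per_pass = math.ceil(loop_len / line_width)
    cycles = lines_per_pass + wasted_cycles_per_pass
    return loop_len / cycles


# Three-micro-op loop in a cache with six micro-ops per trace line.
print(uops_per_cycle(3, 6))     # 3.0
print(uops_per_cycle(3, 6, 1))  # 1.5
```

Under these assumptions the three-micro-op trace reaches half of the six-micro-op peak, and a single wasted cycle per pass halves the delivery rate again.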
SUMMARY OF THE INVENTION
An exemplary processor or trace cache according to the present invention includes a cache unit, which includes a data array that stores traces. The processor or trace cache also includes a control block connected to the cache unit, the control block unrolling loops when building the traces.
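This summary does not spell out how the control block unrolls loops; the specific methods are those of the figures. Purely as an illustration, the Python sketch below shows one simple policy, assumed here and not taken from the patent: repeat the loop body until the trace ends on a trace-line boundary, subject to a hypothetical cap on the number of lines. The names `unroll_to_line_boundary`, `line_width`, and `max_lines` are inventions of this example.

```python
from math import gcd


def unroll_to_line_boundary(loop_body, line_width=6, max_lines=4):
    """Repeat loop_body so the trace fills whole trace lines where
    possible, never exceeding max_lines lines (illustrative policy)."""
    if not loop_body:
        raise ValueError("loop_body must contain at least one micro-op")
    n = len(loop_body)
    # Smallest length that is a multiple of both the loop length and
    # the line width, so the trace ends exactly on a line boundary.
    target_len = n * line_width // gcd(n, line_width)
    max_len = max_lines * line_width
    # Whole copies only; always keep at least one copy of the loop.
    copies = max(min(target_len, max_len) // n, 1)
    unrolled = loop_body * copies
    # Split the unrolled micro-ops into trace lines.
    return [unrolled[i:i + line_width]
            for i in range(0, len(unrolled), line_width)]


lines = unroll_to_line_boundary(["uop1", "uop2", "uop3"])
print(lines)
# [['uop1', 'uop2', 'uop3', 'uop1', 'uop2', 'uop3']]
```

With the three-micro-op loop from the background example, two copies fill a single six-micro-op line, so each line read delivers the full line width instead of half of it.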
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of a first exemplary embodiment of a method according to the present invention.
FIG. 2 is a schematic drawing of the application of the method of FIG. 1 to two exemplary traces.
FIG. 3 is a flow chart of a second exemplary embodiment of a method according to the present invention.
FIG. 4 is a schematic drawing of the application of the method of FIG. 3 to the two exemplary traces.
FIG. 5 is a flow chart of a third exemplary embodiment of a method according to the present invention.
FIG. 6 is a schematic drawing of the application of the method of FIG. 5 to the two exemplary traces.
FIG. 7 is a flow chart of an exemplary embodiment of a method according to the present invention.
FIG. 8 is a schematic drawing of the application of the method of FIG. 7 to an exemplary trace under two different conditions.
FIG. 9 is a flow chart of a further exemplary embodiment of a method according to the present invention.
FIG. 10 is a schematic illustration of an exemplary embodiment of a trace cache according to the present invention.
FIG. 11 is a schematic illustration of an exemplary computer system according to the present invention.


REFERENCES:
patent: 5381533 (1995-01-01), Peleg et al.
patent: 6018786 (2000-01-01), Krick et al.
patent: 6076144 (2000-06-01), Peled et al.
patent: 6170038 (2001-01-01), Krick et al.
Vajapeyam, S. et al., “Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences,” Denver, Jun. 2-4, 1997, New York, ACM, US, vol. Conf. 24, Jun. 2, 1997, pp. 1-12
