System and method for unrolling loops in a trace cache
Reexamination Certificate
Filed: 1999-12-30
Issued: 2003-06-10
Examiner: Kim, Kenneth S. (Department: 2183)
Classification: Electrical computers and digital processing systems: processing – Processing control – Branching
Other classes: C711S118000, C712S215000
Status: active
Patent number: 06578138
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to computer processors. In particular, the present invention relates to the storage of loops within a trace cache of a processor.
BACKGROUND INFORMATION
In a computer system, a cache stores data in order to decrease data retrieval times for a processor. More particularly, a cache stores specific subsets of data in high-speed memory. When the processor requests a piece of data, the system first checks whether the data is stored within the cache. If it is, the processor can retrieve the data much faster than if the data were stored in other computer-readable media such as random access memory, a hard drive, a CD-ROM, or a floppy disk.
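For illustration only, this lookup order may be sketched as follows. This is a simplified model of checking a fast store before a slower one; the names cache, slow_memory, and load are hypothetical and not part of the described system.

```python
# Minimal sketch of checking a cache before a slower backing store.
# All names here (cache, slow_memory, load) are illustrative.
cache = {}                          # address -> data, the fast store
slow_memory = {"0x10": "payload"}   # stand-in for RAM, a hard drive, etc.

def load(address):
    if address in cache:            # hit: return directly from the cache
        return cache[address]
    data = slow_memory[address]     # miss: fetch from the slower medium
    cache[address] = data           # keep a copy to speed up later requests
    return data

print(load("0x10"))   # first request: miss, then the cache is populated
print(load("0x10"))   # second request: hit
```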
One particular type of cache is referred to as a trace cache. A trace cache is responsible for building, caching, and delivering instruction traces to a processor. In one type of trace cache, instructions are stored as “traces” of decoded micro-operations or “micro-ops”, and are only allowed to be accessed in units of “traces.” Traces are blocks of micro-op instructions that are distinguishable from one another only by their trace heads.
Traces often contain backward taken branches. A backward taken branch generally occurs when the target address of a branch is a prior micro-op, and in particular, for purposes of this description, a prior micro-op of the trace. In this case, the target address, the backward taken branch, and any micro-ops between the two form a loop. For example, a trace may be built containing three micro-ops, which together form a loop. The first micro-op (the trace head) may be the head of the loop, while the second micro-op is the second micro-op of the loop. In this example, the third micro-op of the trace contains a backward taken branch whose target address is the first micro-op of the trace (i.e. the trace head, in this case also the head of the loop). In a conventional trace cache, the building of the trace may stop at this point, so that the three-micro-op loop comprises the entire trace.
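The conventional behavior described above, in which trace building stops at a backward taken branch, can be sketched as follows. This is only a simplified model in which each micro-op is a name paired with an optional branch-target index; it is not the build logic of any particular trace cache.

```python
# Sketch: building a trace that ends at a backward taken branch.
# Each micro-op is (name, branch_target_index or None).
def build_trace(micro_ops, max_per_line=6):
    trace = []
    for i, (name, target) in enumerate(micro_ops):
        trace.append(name)
        # A backward taken branch targets an earlier micro-op of the trace;
        # a conventional builder stops the trace at this point.
        if target is not None and target < i:
            break
    return trace

# Three micro-ops forming a loop: uop2 branches back to uop0 (the trace head).
loop = [("uop0", None), ("uop1", None), ("uop2", 0), ("uop3", None)]
print(build_trace(loop))   # ['uop0', 'uop1', 'uop2'] -- only three of six slots used
```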
When executed, the loop may be repeated many consecutive times; accordingly, the exemplary trace will be sent to the processor many consecutive times. Generally, traces are constructed so that, wherever possible, each trace line contains the maximum number of micro-ops allowed by the cache structure, for example six micro-ops. In this manner, the trace cache can supply up to six micro-ops to the processor per clock cycle. In this case, however, the trace cache will supply at most three micro-ops per clock cycle, even though it may store up to six micro-ops per trace line, because the trace itself contains only three micro-ops. Moreover, the backward taken branch occurs in this example in the first line of the trace, which may decrease bandwidth even further: many branch prediction or targeting schemes require a minimum of two clock cycles to determine the target of a branch, so a clock cycle will be wasted while the trace cache determines the target of the third micro-op.
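The shortfall can be made concrete with a rough back-of-the-envelope calculation using the numbers from the example above. The one-extra-clock-per-iteration penalty is a simplifying assumption based on the two-cycle branch-target figure mentioned in the text, not a measured value.

```python
# Rough throughput comparison for the three-micro-op loop example.
line_capacity = 6    # micro-ops a trace line could hold per clock
loop_size = 3        # micro-ops actually in the short trace
iterations = 100     # times the loop body is executed

# Best case for the short trace: one line (3 micro-ops) delivered per clock.
cycles_short = iterations                    # 100 clocks
# If determining the branch target costs an extra clock each iteration,
# effective delivery drops further.
cycles_short_with_bubble = iterations * 2    # 200 clocks
# If the loop body were replicated to fill the line (two copies of the body),
# two iterations would be delivered per clock.
cycles_unrolled = iterations * loop_size / line_capacity   # 50 clocks

print(cycles_short, cycles_short_with_bubble, cycles_unrolled)
```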
SUMMARY OF THE INVENTION
An exemplary processor or trace cache according to the present invention includes a cache unit, which includes a data array that stores traces. The processor or trace cache also includes a control block connected to the cache unit, the control block unrolling loops when building the traces.
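The general idea of unrolling a loop while building a trace can be sketched as below: the loop body is replicated until no further whole copy fits in the trace line. The function name unroll_into_trace and the six-micro-op line capacity are illustrative assumptions; this sketch does not describe the specific control-block logic of the claimed invention.

```python
# Sketch: when a trace ends in a backward taken branch whose target is the
# trace head, replicate the loop body as long as whole copies fit in the line.
def unroll_into_trace(loop_body, max_per_line=6):
    trace = list(loop_body)
    while len(trace) + len(loop_body) <= max_per_line:
        trace.extend(loop_body)    # append another copy of the loop body
    return trace

body = ["uop0", "uop1", "uop2"]    # three-micro-op loop from the example
print(unroll_into_trace(body))     # six micro-ops: two copies fill the line
```

With the line filled in this way, the trace cache can again deliver the full line width of micro-ops per clock cycle for the looping code.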
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of a first exemplary embodiment of a method according to the present invention.
FIG. 2 is a schematic drawing of the application of the method of FIG. 1 to two exemplary traces.
FIG. 3 is a flow chart of a second exemplary embodiment of a method according to the present invention.
FIG. 4 is a schematic drawing of the application of the method of FIG. 3 to the two exemplary traces.
FIG. 5 is a flow chart of a third exemplary embodiment of a method according to the present invention.
FIG. 6 is a schematic drawing of the application of the method of FIG. 5 to the two exemplary traces.
FIG. 7 is a flow chart of an exemplary embodiment of a method according to the present invention.
FIG. 8 is a schematic drawing of the application of the method of FIG. 7 to an exemplary trace under two different conditions.
FIG. 9 is a flow chart of a further exemplary embodiment of a method according to the present invention.
FIG. 10 is a schematic illustration of an exemplary embodiment of a trace cache according to the present invention.
FIG. 11 is a schematic illustration of an exemplary computer system according to the present invention.
REFERENCES:
patent: 5381533 (1995-01-01), Peleg et al.
patent: 6018786 (2000-01-01), Krick et al.
patent: 6076144 (2000-06-01), Peled et al.
patent: 6170038 (2001-01-01), Krick et al.
Vajapeyam, S., et al., "Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences," Proceedings of the 24th Annual Conference, ACM, New York, Denver, Jun. 2-4, 1997, pp. 1-12.
Inventors: Krick, Robert Franklin; Kyker, Alan Beecher