Reexamination Certificate
1999-11-04
2003-11-25
Ellis, Richard L. (Department: 2183)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Commitment control or register bypass
C712S214000
active
06654876
ABSTRACT:
BACKGROUND
1. Field of the Present Invention
The present invention generally relates to the field of microprocessors and more particularly to a microprocessor architecture supporting a variable cycle instruction reject delay to improve processor performance.
2. History of Related Art
The speed of high performance superscalar microprocessors (processors), measured in terms of the frequency of the processor's clock signal, is rapidly migrating from the MHz range to the GHz range. As cycle times decrease with ever increasing clock rates, the number of levels of logic allowable in the design of any pipeline stage is extremely limited. This limited number of logic levels must be optimized to accomplish the most common tasks within the time limits imposed by the operating frequency. As an example, the pipeline of a processor's load/store unit (LSU) must be capable of successfully completing a load instruction in each cycle as long as the load instructions hit in the processor's L1 cache. Inevitably, however, less frequently occurring conditions cannot be resolved within the timing constraints imposed by the system. In a conventional processor, the determination of whether to reject an instruction is made when the instruction is in a final stage (the finish stage) of the pipeline. If, for any number of reasons, the functional unit in which the instruction is executing lacks sufficient information to determine that the instruction should be completed when the instruction reaches the finish stage, the instruction must be rejected. Thus, it will be appreciated that conventionally designed processors typically employ a fixed timing reject mechanism in which the reject decision is made a predetermined and non-varying number of cycles after the instruction issues.
Turning to FIG. 3, a timing diagram illustrating the operation of a fixed timing reject mechanism of a conventional processor is presented. In cycle 1 of the timing diagram, an instruction indicated by reference numeral 301 is issued and begins to flow through the pipeline. If the instruction contains a reference to a location in memory, the processor must initiate the process of determining whether valid data for the referenced memory address is available in the processor's L1 data cache. This process may include an address translation component, in which the address recited in the instruction (the effective address) is translated to an address corresponding to a physical memory location (the real address), and an L1 cache retrieval component, in which the address tags of the L1 cache are compared against the address of the memory reference and data is returned from the L1 cache. In the depicted example, a miss signal 303 is asserted to indicate that the data retrieval process failed to complete successfully. The miss signal 303 may reflect a variety of conditions that caused the instruction not to complete successfully. In one case, as an example, miss signal 303 may indicate that the effective to real address translation (ERAT) process could not complete in the time it takes instruction 301 to propagate through the pipeline. When this occurs, the processor must initiate a relatively time consuming retrieval of address translation information. Because the address translation information is not available when instruction 301 arrives at the finish stage in cycle 6, a reject signal indicated in FIG. 3 by reference numeral 307 is asserted. In response to reject signal 307, the processor reissues instruction 301 in the next cycle (cycle 7) and the instruction begins to propagate through the pipeline again. If the number of cycles required to retrieve the address translation information initiated by miss signal 303 is greater than the depth of the pipeline (in stages), the address translation information will not be available when instruction 301 reaches the finish stage for a second time in cycle 12. Accordingly, the instruction is rejected in cycle 12 and reissued for a third time in cycle 13. When instruction 301 reaches the finish stage in cycle 18, the necessary translation information has had sufficient time to be retrieved and the instruction can complete successfully. Because a reject decision had to be made as soon as the instruction reached the finish stage of the pipeline, instruction 301 was rejected twice and was required to travel the LSU pipeline three times.
More generally, it can be said that the fixed timing reject mechanism of conventional processors forces an all-or-nothing decision when an instruction reaches the finish stage of a pipeline. If any information or resource necessary to complete the instruction is unavailable in the cycle that the instruction reaches the finish stage, the instruction is rejected. Moreover, whenever an instruction is rejected, completion of that instruction will be delayed by at least the number of stages in the pipeline. If a pipeline includes six stages, an instruction that is rejected in cycle X cannot complete until, at the earliest, cycle X+6. If the instruction is rejected again in cycle X+6, the next earliest cycle in which the instruction could complete would be cycle X+12, and so forth. In other words, one can think of the processor as having an “instruction period” or “instruction cycle” that is equal to the number of pipeline stages in the processor. In a conventional, fixed timing reject processor, the reject decision is made at the end of each instruction period. It will be appreciated, however, that in some cases, the information or resource that is lacking at the time an instruction reaches its decision point (i.e., the finish stage) may be available before the end of the next instruction period. In this case, performance is negatively impacted because the architecture inhibits completion of the instruction until the end of the next instruction period.
As an example, consider a processor with a six cycle instruction period in which the retrieval of address translation information (when the information is not immediately available in an address translation cache) requires ten cycles and the retrieval process is not initiated until the fifth cycle of the instruction period, when the processor determines that the address translation information is not locally available (i.e., is not cached).
If the retrieval of the address translation information is initiated in cycle 5, the information will not be available until cycle 15, which falls in the middle of an instruction cycle. In this case, completion of the instruction is again delayed for the number of cycles between the time when all information is available to complete the instruction (cycle 15 in the example) and the end of the next instruction cycle (cycle 18). Therefore, it would be beneficial to implement an architecture that eliminated the performance penalty resulting from the constraint of requiring a reject decision in the cycle when an instruction reaches the finish stage.
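The arithmetic of the fixed timing example above can be checked with a short Python sketch. This is only an illustrative model, not part of the disclosure: the function name and parameters are hypothetical, and it assumes the instruction reaches the finish stage `pipeline_depth - 1` cycles after it issues, is reissued in the cycle after each reject, and can complete in the same cycle in which the missing information becomes available.

```python
def fixed_reject_timeline(issue_cycle, pipeline_depth, info_ready_cycle):
    """Model a fixed timing reject mechanism (illustrative assumption only).

    Returns (reject_cycles, completion_cycle): the cycles in which the
    instruction is rejected at the finish stage, and the cycle in which
    it finally completes.
    """
    # The instruction issued in `issue_cycle` reaches the finish stage
    # after traversing every pipeline stage once.
    finish = issue_cycle + pipeline_depth - 1
    reject_cycles = []
    # The reject decision is forced in the finish cycle: if the needed
    # information is not yet available, the instruction is rejected.
    while finish < info_ready_cycle:
        reject_cycles.append(finish)
        # Reissue in the next cycle and travel the whole pipeline again.
        finish += pipeline_depth
    return reject_cycles, finish


# Six stage pipeline, issue in cycle 1, translation data ready in cycle 15.
rejects, done = fixed_reject_timeline(1, 6, 15)
wasted = done - 15  # cycles lost waiting for the end of the instruction period
```

Under these assumptions the instruction is rejected in cycles 6 and 12 and completes in cycle 18, three cycles after the translation information became available in cycle 15, which matches the figures in the text.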
SUMMARY OF THE INVENTION
The problems identified above are in large part addressed by a processor implementing a delayed reject mechanism. The processor includes an issue unit suitable for issuing an instruction in a first cycle and a load store unit. The load store unit includes an extend reject calculator circuit configured to receive a set of completion information signals and to generate a delay value based thereon. The LSU is adapted to determine whether to reject the instruction in a determination cycle. The number of cycles between the first cycle and the determination cycle is a function of the delay value such that reject timing is variable with respect to the first cycle. In one embodiment, the processor is further configured to reissue the instruction after the determination cycle if the instruction was rejected in the determination cycle. The delay value is conveyed via a 2-bit bus in one embodiment. The 2-bit bus permits delaying the determination cycle from 0 to 3 cycles after the finish cycle. In one embodiment, the number of cycles between the first cycle and
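The effect of a 2-bit delay value on the determination cycle can be illustrated with a minimal Python sketch. The policy shown here is an assumption made for illustration: the summary does not state how the extend reject calculator derives the delay value from the completion information signals. In this hypothetical model, the determination cycle is postponed by exactly the number of cycles still needed when that number is 3 or fewer, and the instruction is otherwise rejected in the finish cycle with a delay value of 0. All function and parameter names are invented for the example.

```python
MAX_DELAY = 3  # a 2-bit delay value can encode 0, 1, 2 or 3 extra cycles


def delayed_reject_decision(finish_cycle, info_ready_cycle, max_delay=MAX_DELAY):
    """Hypothetical delay policy; returns (delay_value, determination_cycle, rejected)."""
    wait = max(0, info_ready_cycle - finish_cycle)
    if wait <= max_delay:
        # The missing information arrives within the delay window, so the
        # determination cycle is postponed and the instruction is not rejected.
        return wait, finish_cycle + wait, False
    # Assumption: when the wait exceeds the window, reject without delay.
    return 0, finish_cycle, True


def completion_cycle_with_delay(issue_cycle, pipeline_depth, info_ready_cycle):
    """Cycle in which the instruction completes under the hypothetical policy."""
    finish = issue_cycle + pipeline_depth - 1
    while True:
        _, determination, rejected = delayed_reject_decision(finish, info_ready_cycle)
        if not rejected:
            return determination
        # Reissue in the cycle after the determination cycle.
        finish = determination + pipeline_depth


# Background example: six stages, issue in cycle 1, data ready in cycle 15.
first_pass = delayed_reject_decision(6, 15)
second_pass = delayed_reject_decision(12, 15)
done = completion_cycle_with_delay(1, 6, 15)
```

With the numbers from the background example, this model rejects the instruction in cycle 6, then delays the second determination by 3 cycles so that the instruction completes in cycle 15 rather than cycle 18. The result depends entirely on the assumed policy and is not a claim about the patented circuit.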
Le Hung Qui
Shippy David James
Ellis Richard L.
Emile Volel
Lally Joseph P.
Meonske Tonia L.
Roberts Diana L.