Processor having replay architecture with fast and slow...

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Commitment control or register bypass

Reexamination Certificate

Details

C712S023000, C712S219000

active

06735688

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of processors, and more specifically to a replay architecture having fast and slow replay paths for facilitating data-speculating operations.
2. Background Information
FIG. 1 shows a block diagram of one embodiment of a processor 100 disclosed in U.S. Pat. No. 5,966,544. The processor 100 shown in FIG. 1 includes an I/O ring 111 which operates at a first clock frequency (I/O clock), a latency-tolerant execution core 121 which operates at a second clock frequency (e.g., slow clock), a latency-intolerant execution sub-core 131 which operates at a third clock frequency (e.g., medium clock), and a latency-critical execution sub-core 141 which operates at a fourth clock frequency (e.g., fast clock). The processor 100 shown in FIG. 1 also includes clock multiplication and/or division units 110, 120, and 130 which are configured to provide appropriate clocking to the various portions or sub-cores of the processor 100, as taught in the prior application. The specific portion of the prior application's teachings which is most pertinent here is that the execution core may include two or more portions (sub-cores) which operate at different clock rates.
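As a rough illustration of how the four clock domains could relate, the following sketch derives each domain's clock by chaining one ratio per multiplication/division unit. The chaining order, the 100 MHz I/O clock, and the ratio of 2 at each step are assumptions made for the example only; the passage above does not state how units 110, 120, and 130 are connected or what ratios they apply.

```python
from fractions import Fraction


def derive_clocks(io_clock_mhz, ratios):
    """Derive per-domain clocks by applying one ratio per unit, in order.

    `ratios` is a list of (domain_name, ratio) pairs. A ratio above 1
    models multiplication and a ratio below 1 models division.
    """
    clocks = {"io": Fraction(io_clock_mhz)}
    current = clocks["io"]
    for name, ratio in ratios:
        current = current * Fraction(ratio)
        clocks[name] = current
    return clocks


# Hypothetical values: 100 MHz I/O clock, each unit doubles the clock.
clocks = derive_clocks(100, [("slow", 2), ("medium", 2), ("fast", 2)])
for domain, mhz in clocks.items():
    print(f"{domain}: {float(mhz)} MHz")
```

With these assumed ratios the latency-critical domain runs at eight times the I/O clock; a ratio such as Fraction(1, 2) would model a division unit instead.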
In operation, the I/O ring 111 communicates with the rest of the computer system (not shown) by performing various I/O operations, such as memory reads and writes, at the I/O clock frequency. For example, the processor 100 may perform an I/O operation at the I/O ring 111 at the I/O clock frequency to read in data from an external memory device. The various execution sub-cores 121, 131, and 141 can perform various functions or operations with respect to the input instructions and/or input data at their respective clock frequencies. For example, the latency-tolerant execution sub-core 121 may perform an execution operation on the input data to produce a first result. The latency-intolerant sub-core 131 may perform an execution operation on the first result to produce a second result. Similarly, the latency-critical execution sub-core 141 may perform another execution operation on the second result to produce a third result. The operations performed by the various execution sub-cores may include arithmetic operations, logic operations, and other operations. It should be appreciated and understood by one skilled in the art that the order in which the various operations are performed need not follow the hierarchical order of the various execution sub-cores. For example, the input data could go immediately and directly to the innermost sub-core, and the result obtained therefrom could go from the innermost sub-core to any other sub-core or back to the I/O ring 111 for write-back. In addition, as disclosed and taught in the prior application, on-chip cache structures may be split across two or more portions of the processor 100. As such, certain operations and/or functions can be performed at one clock frequency with respect to one aspect of the data stored in the on-chip cache, while other operations and/or functions can be performed at a different frequency with respect to another aspect of the data stored in the on-chip cache. For example, detection of a way predictor miss with respect to the on-chip cache may be performed in one sub-core at one clock frequency, while the TLB hit/miss detection and/or page fault detection may be performed in another sub-core at a different frequency. As such, certain errors and conditions can be detected earlier in the execution process than other errors and conditions.
FIG. 2 illustrates a block diagram of one embodiment of a processor 200 disclosed in the prior application which includes a generalized replay architecture to facilitate data speculation operations. In this embodiment, the processor 200 includes a scheduler 231 coupled to a multiplexor 241 to provide instructions received from an instruction cache (I-cache) 211 to an execution core 251 for execution. The execution core 251 may perform data speculation in executing the various instructions received from the multiplexor 241. The processor 200 as shown in FIG. 2 includes a checker unit 281 to send a copy of the executed instruction back to the execution core 251 for re-execution (replay) if it is determined that the data speculation is erroneous. However, in this generalized replay architecture, the checker unit 281 is positioned after the execution core 251, after the TLB and tag logic 261, and after the cache hit/miss logic 271. Some instructions may be known to have been executed incorrectly (i.e., because data speculation is erroneous) before they reach the checker unit 281. Specifically, there are cases in which certain errors and conditions can be detected earlier, indicating that data speculation is erroneous even before the TLB/TAG logic 261 and the hit/miss logic 271 are executed. Unfortunately, because of the current positioning of the checker unit 281, the respective instructions that were executed incorrectly due to erroneous data speculation would not be sent back to the execution core 251 for re-execution or replay until they reach the checker unit 281. Thus, there is an unnecessary delay between the time when an instruction is known to have been executed incorrectly due to erroneous data speculation and the time when the respective instruction is actually sent back for re-execution. As a result, system performance is lower than it would be if the instructions that were executed incorrectly were re-executed or replayed earlier in the process.
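The unnecessary delay described above can be made concrete with a small model. The stage order follows the FIG. 2 description (execution core 251, TLB/tag logic 261, hit/miss logic 271, checker unit 281); the cycle counts per stage and the function names are hypothetical and chosen only to illustrate the arithmetic.

```python
# Stage order as described for FIG. 2; cycle counts are hypothetical.
STAGES = [
    ("execution core 251", 2),
    ("TLB/tag logic 261", 1),
    ("hit/miss logic 271", 1),
    ("checker unit 281", 1),
]


def cycles_through(stage_name):
    """Return the cycles elapsed once the named stage has finished."""
    total = 0
    for name, cycles in STAGES:
        total += cycles
        if name == stage_name:
            return total
    raise ValueError(f"unknown stage: {stage_name}")


def replay_delay(detected_after):
    """Cycles an instruction waits between the stage after which its
    speculation error is knowable and the end of the checker stage."""
    return cycles_through("checker unit 281") - cycles_through(detected_after)


print(replay_delay("execution core 251"))   # error knowable right after execution
print(replay_delay("hit/miss logic 271"))   # error knowable only after hit/miss logic
```

Under these assumed latencies, an error that is already knowable when the execution core finishes still waits three more cycles before the checker can trigger a replay, whereas an error that only the hit/miss logic can reveal waits one.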
SUMMARY OF THE INVENTION
According to one aspect of the invention, a microprocessor is provided that includes an execution core, a first replay mechanism and a second replay mechanism. The execution core performs data speculation in executing a first instruction. The first replay mechanism is used to replay the first instruction via a first replay path if an error of a first type is detected which indicates that the data speculation is erroneous. The second replay mechanism is used to replay the first instruction via a second replay path if an error of a second type is detected which indicates that the data speculation is erroneous.
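The selection between the two replay mechanisms can be sketched as a simple dispatch on the type of error detected. This is an illustrative model only: the enum, the function names, and the string labels are hypothetical, and the summary above does not state which error type corresponds to the fast path and which to the slow path named in the title.

```python
from enum import Enum


class SpeculationError(Enum):
    NONE = 0         # data speculation was correct
    FIRST_TYPE = 1   # error handled by the first replay mechanism
    SECOND_TYPE = 2  # error handled by the second replay mechanism


def select_replay_path(error):
    """Return the replay path for a detected error, or None if no replay is needed."""
    if error is SpeculationError.FIRST_TYPE:
        return "first replay path"
    if error is SpeculationError.SECOND_TYPE:
        return "second replay path"
    return None


def run_with_replay(errors_per_attempt):
    """Replay an instruction until an attempt completes without error.

    `errors_per_attempt` lists the error observed on each execution attempt.
    Returns the replay paths taken, in order.
    """
    paths = []
    for error in errors_per_attempt:
        path = select_replay_path(error)
        if path is None:
            break
        paths.append(path)
    return paths


attempts = [SpeculationError.FIRST_TYPE, SpeculationError.SECOND_TYPE, SpeculationError.NONE]
print(run_with_replay(attempts))
```

In this example the instruction is replayed once through each path before its third attempt completes without a speculation error.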


REFERENCES:
patent: 3618042 (1971-11-01), Miki et al.
patent: 5828868 (1998-10-01), Sager et al.
patent: 5966544 (1999-10-01), Sager
patent: 6094717 (2000-07-01), Sager et al.
patent: 6098166 (2000-08-01), Leibholz et al.
patent: 6212626 (2001-04-01), Merchant et al.
patent: WO 98/21684 (1998-05-01), None
