Reexamination Certificate
1998-12-23
2002-04-23
Ellis, Richard L (Department: 2651)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Scoreboarding, reservation station, or aliasing
C712S206000
active
06378063
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
This invention relates to the field of microprocessor architecture, more particularly to an architecture that makes efficient use of instruction execution units in a multi-cluster system.
BACKGROUND OF THE INVENTION
Early microprocessors operated at relatively low clock frequencies. As users demanded faster microprocessors, designers responded by increasing the clock frequency. In some designs, the higher clock frequency did not interfere with the correct logical operation of the microprocessor. In other designs, the higher clock frequency caused subsystems in the microprocessor to fail. These failures were addressed in several ways. Some failures were corrected by packing the logic devices more densely on the chip in order to decrease signal path lengths between the logic devices. Others were corrected by implementing the design in a faster technology, such as gallium arsenide. As clock frequencies continued to increase, these strategies became more difficult and costly to implement, and other strategies evolved to satisfy the user's demand for faster microprocessors.
One such strategy involved designing multiple instruction execution units into a single microprocessor. A microprocessor having multiple instruction execution units can execute more instructions per unit of time than a microprocessor having a single instruction execution unit. This strategy evolved to a point where multiple instruction execution units were grouped or clustered to further increase microprocessor performance. However, the performance improvement in these multi-cluster microprocessors comes at the cost of increased complexity in the scheduler, the microprocessor subsystem that routes instructions to the clusters in an attempt to improve the utilization of the instruction execution units. An additional problem arises when the results produced by a first cluster are required for use by a second cluster. In that case, a delay in waiting for the results produced by the first cluster to be available to the second cluster reduces the throughput of the microprocessor.
Referring to FIG. 1, a block diagram of a prior art microprocessor system is shown. Memory 100 is provided for storing instructions. Coupled to memory 100 is instruction fetch 105. The purpose of instruction fetch 105 is to retrieve instructions from memory 100 and present them to scheduler 110. Scheduler 110 routes instructions to either first cluster 115 or second cluster 120. First execution unit 125 and second execution unit 130 are provided for executing instructions routed to first cluster 115. Third execution unit 135 and fourth execution unit 140 are provided for executing instructions routed to second cluster 120. Retirement unit 145 is coupled to the outputs of first cluster 115 and second cluster 120 and couples the architectural state via write back bus 160 to first cluster 115 and second cluster 120. The architectural state is the bit configuration of all the registers in retirement unit 145 at a given time. First cluster fast results bypass 150 is provided to couple the output of first cluster 115 to the input of first cluster 115, for use in first cluster 115, prior to commitment in retirement unit 145. Likewise, second cluster fast results bypass 155 is provided to couple the output of second cluster 120 to the input of second cluster 120, for use in second cluster 120, prior to commitment in retirement unit 145.
In operation, instruction fetch 105 retrieves instructions from memory 100 and delivers the instructions to scheduler 110. Scheduler 110 attempts to route instructions to first cluster 115 and second cluster 120 in a way that provides high utilization of execution units 125, 130, 135, and 140. Unfortunately, when a read instruction is executed in second cluster 120 after a write instruction was executed in first cluster 115, the results of the write instruction are not immediately available to the read instruction, since the results of the write instruction must be fed back to second cluster 120 from the architectural state in retirement unit 145 via write back bus 160.
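The cross-cluster delay described above can be illustrated with a small timing model. This is only a sketch under stated assumptions: the latency values, the `Instr` record, and the `ready_cycles` function are hypothetical names and numbers chosen for illustration and do not come from the patent. The model assumes that a result consumed in the producing cluster arrives over the fast results bypass after one cycle, while a result consumed in the other cluster must first pass through the retirement unit and the write back bus, which is assumed here to take three cycles. Structural hazards and execution-unit contention are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed latencies in cycles; the patent text gives no figures.
BYPASS_LATENCY = 1      # same-cluster fast results bypass (150 or 155)
WRITEBACK_LATENCY = 3   # via retirement unit 145 and write back bus 160


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dest: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]      # registers read
    cluster: int               # cluster the scheduler routed it to


def ready_cycles(program: List[Instr]) -> Dict[str, int]:
    """Return the earliest cycle at which each instruction's operands are available."""
    produced: Dict[str, Tuple[int, int]] = {}  # register -> (producer cluster, producer start cycle)
    start: Dict[str, int] = {}
    for instr in program:
        earliest = 0
        for reg in instr.srcs:
            if reg in produced:
                producer_cluster, producer_start = produced[reg]
                # Same cluster: fast bypass. Different cluster: wait for write back.
                latency = (BYPASS_LATENCY if producer_cluster == instr.cluster
                           else WRITEBACK_LATENCY)
                earliest = max(earliest, producer_start + latency)
        start[instr.name] = earliest
        if instr.dest is not None:
            produced[instr.dest] = (instr.cluster, earliest)
    return start


# A write in cluster 0 followed by a dependent read in cluster 0 or cluster 1.
same = ready_cycles([Instr("write", "r1", (), 0), Instr("read", None, ("r1",), 0)])
cross = ready_cycles([Instr("write", "r1", (), 0), Instr("read", None, ("r1",), 1)])
```

Under these assumed latencies, the dependent read can start two cycles earlier when it is routed to the cluster that executed the write, which is the throughput loss the background section attributes to cross-cluster dependencies.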
For these and other reasons there is a need for the present invention.
SUMMARY OF THE INVENTION
In one embodiment an apparatus for routing computer instructions comprises a plurality of queues to buffer instructions to a plurality of clusters, a chain affinity unit to store information, and a dispersal unit to route instructions to the plurality of queues based on information to be stored in the chain affinity unit.
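One way to picture the arrangement in this embodiment is the sketch below. It is a software illustration, not the claimed apparatus, and it rests on assumptions the summary does not state: that the information held by the chain affinity unit is a record of which cluster last produced each register, and that an instruction with no known producer goes to the shortest queue. The class names, the tuple layout of an instruction, and the fallback policy are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical instruction layout: (name, destination register, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


class ChainAffinityUnit:
    """Assumed behavior: remembers which cluster last produced each register."""

    def __init__(self) -> None:
        self._cluster_of: Dict[str, int] = {}

    def lookup(self, srcs: Tuple[str, ...]) -> Optional[int]:
        # Return the cluster of the first source register with a known producer.
        for reg in srcs:
            if reg in self._cluster_of:
                return self._cluster_of[reg]
        return None

    def record(self, dest: Optional[str], cluster: int) -> None:
        if dest is not None:
            self._cluster_of[dest] = cluster


class DispersalUnit:
    """Routes instructions to per-cluster queues using the stored affinity."""

    def __init__(self, num_clusters: int) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_clusters)]
        self.affinity = ChainAffinityUnit()

    def route(self, instr: Instr) -> int:
        name, dest, srcs = instr
        cluster = self.affinity.lookup(srcs)
        if cluster is None:
            # Assumed fallback: shortest queue, ties broken by lowest index.
            cluster = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        self.queues[cluster].append(name)
        self.affinity.record(dest, cluster)
        return cluster


unit = DispersalUnit(num_clusters=2)
program: List[Instr] = [
    ("a", "r1", ()),        # starts a chain
    ("b", "r2", ()),        # starts an independent chain
    ("c", "r3", ("r1",)),   # depends on a
    ("d", None, ("r2",)),   # depends on b
    ("e", None, ("r3",)),   # depends on c
]
routed = [unit.route(i) for i in program]
```

In this sketch each dependency chain stays in one queue, so a consumer would sit in the same cluster as its producer and could use that cluster's fast results bypass rather than waiting on the write back bus.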
REFERENCES:
patent: 5202975 (1993-04-01), Rasbold et al.
patent: 5699537 (1997-12-01), Sharangpani et al.
patent: 5884061 (1999-03-01), Hesson et al.
patent: 0767425 (1997-04-01), None
patent: 2322718 (1998-09-01), None
patent: 95/09394 (1995-04-01), None
Palacharla, S., et al., “Complexity-Effective Superscalar Processors”, Ann. Int'l Symp. on Computer Architecture, vol. CONF 24, New York, pp. 206-218, (1997).
Arora Ken
Corwin Michael P.
Mulder Hans
Sharangpani Harshvardhan
Ellis Richard L
Patel Gautam R
Schwegman Lundberg Woessner & Kluth P.A.