Electrical computers and digital processing systems: processing – Instruction issuing
Reexamination Certificate
1999-05-26
2002-05-21
Ellis, Richard L. (Department: 2653)
C712S213000, C712S241000
active
06393551
ABSTRACT:
FIELD OF THE INVENTION
The present invention pertains to computing systems and the like. More specifically, the present invention relates to reducing the number of instruction transactions in a microprocessor.
BACKGROUND OF THE INVENTION
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. Conversely, superpipelined microprocessors include a large number of pipeline stages for executing an instruction which is typically carried out by a number of steps that, in turn, are subdivided into the most equal possible substeps. Therefore, in order to execute an instruction completely, all of the substeps must be executed sequentially.
FIG. 1 illustrates a conventional executable instruction dataword 100. The instruction dataword 100 is typically formed of an opcode field 102, an operand specifier A and an associated operand specifier B. An execution result specifier field C is used to store the result of the executed instruction. The opcode field 102 defines the specific operation to be performed on the operands A and B. Typical operations include, for example, addition, multiplication, branching, looping, and shifting. The result of such an operation is stored in the execution result data word C that is then made available for subsequent executable instructions.
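As a rough illustration of the dataword 100 layout described above, the following Python sketch models the opcode field and the A, B, and C specifiers. The class name `InstructionWord` and its field names are hypothetical, chosen for this example only; the text does not define field widths or encodings, so none are modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionWord:
    """Hypothetical model of the instruction dataword 100."""
    opcode: str     # opcode field 102, e.g. "ADD"
    operand_a: int  # operand specifier A
    operand_b: int  # operand specifier B
    result_c: int   # execution result specifier C


# An ADD whose operands are named by specifiers 20 and 30 and whose
# result goes to the location named by specifier 100.
word = InstructionWord(opcode="ADD", operand_a=20, operand_b=30, result_c=100)
```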
FIG. 2A illustrates a conventional computing system 200 arranged to perform desired calculations based upon a user supplied program. The typical program used by the computing system 200 is generally formed of an ordered list of executable instructions, referred to as code, each of which is associated with a particular program counter (PC). The computing system 200 includes a programming memory 202 configured to store the executable instructions that form the program at memory locations corresponding to the program counters. Typically, the programming memory 202 is connected to a central processing unit (CPU) 204 by way of a bi-directional programming bus 206. The program counters are, in turn, used to point to the location within the memory 202 at which the corresponding executable instruction is stored.
By way of example, a typical instruction 220 executed by the computer system 200 is shown in FIG. 2B. The line of code 220 includes a program counter (PC) that points to a location 10 in the programming memory 202 where the instruction (composed of the opcode ADD and the respective operand specifiers 20 and 30) to be executed is to be found. In this case, the CPU 204 will add the values stored in locations 20 and 30 and store the result in memory location 100 as indicated by the RESULT specifier field.
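The example instruction 220 can be traced with a small sketch. It assumes a flat memory modeled as a Python dictionary and arbitrary operand values (7 and 5); the function name `run_add_example` and the tuple encoding of the instruction are hypothetical and serve only to make the data flow concrete.

```python
def run_add_example(value_at_20, value_at_30):
    """Execute the ADD stored at location 10 and return the memory."""
    # Hypothetical flat memory: address -> data value or instruction tuple.
    memory = {20: value_at_20, 30: value_at_30}
    # Instruction 220 at location 10: (opcode, A, B, RESULT).
    memory[10] = ("ADD", 20, 30, 100)

    pc = 10  # the program counter points at the instruction
    opcode, a, b, result = memory[pc]
    if opcode == "ADD":
        # Add the values stored in locations A and B; store at RESULT.
        memory[result] = memory[a] + memory[b]
    return memory


memory = run_add_example(7, 5)
print(memory[100])  # 12
```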
Referring again to FIG. 2A, conventional instruction processing generally includes decoding the instruction, executing the instruction, and storing the execution result in the memory location in the programming memory 202 or in a register of the register file identified by the instruction. More specifically, during what is referred to as a fetch stage, the instruction 220 is fetched by a fetching unit 208 from the memory 202 based upon the memory address indicated by the program counter. At this point the fetching unit 208 parses the instruction 220 into the opcode data field 102 and the operand specifiers A and B. The opcode data field 102 is then conveyed by way of an issued instruction bus 210 to a decoder 212. Meanwhile, the operands at A and B are read from the register file 214.
Once received, the decoder 212 uses the opcode data field 102 to select a function unit (FU) such as, for example, function unit 216, arranged to perform the function corresponding to the opcode included in the opcode data field 102. By way of example using the line of code above, the FU 216 is an arithmetic logic unit (ALU) arranged to perform an ADDing operation upon the respective operands A and B stored in the register file 214. At this point, the FU 216 is ready to execute the instruction. It should be noted that the FU 216 can, in fact, be any appropriate function unit capable of executing the function indicated by the instruction opcode. Such function units include, for example, an arithmetic logic unit (ALU) for ADDing, a shifter, a multiplier, etc. Once the instruction is executed, the FU 216 outputs the execution result to the destination specified by C in the register file 214, where it is stored until such time as the value C is passed to the memory.
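The fetch, decode, and execute sequence described above can be sketched as follows. This is a simplified model, not the circuit of FIG. 2A: the table `FUNCTION_UNITS`, the helper names `fetch`, `decode`, and `step`, the opcodes other than ADD, and the assumption that the program counter simply advances by one are all hypothetical.

```python
import operator

# Hypothetical function units keyed by opcode; the decoder selects one.
FUNCTION_UNITS = {
    "ADD": operator.add,     # arithmetic logic unit
    "MUL": operator.mul,     # multiplier
    "SHL": operator.lshift,  # shifter
}


def fetch(program_memory, pc):
    """Fetch stage: read the instruction at pc and split it into fields."""
    opcode, a, b, c = program_memory[pc]
    return opcode, (a, b), c


def decode(opcode):
    """Decode stage: select the function unit matching the opcode."""
    try:
        return FUNCTION_UNITS[opcode]
    except KeyError:
        raise ValueError(f"no function unit for opcode {opcode!r}")


def step(program_memory, register_file, pc):
    """Fetch, decode, and execute one instruction; write the result to C."""
    opcode, (a, b), c = fetch(program_memory, pc)
    unit = decode(opcode)
    register_file[c] = unit(register_file[a], register_file[b])
    return pc + 1  # assumed sequential flow; branches are not modeled


program = {0: ("ADD", 1, 2, 3), 1: ("SHL", 3, 1, 4)}
regs = {1: 4, 2: 6, 3: 0, 4: 0}
pc = step(program, regs, 0)   # regs[3] = regs[1] + regs[2]
pc = step(program, regs, pc)  # regs[4] = regs[3] << regs[1]
```

Keeping the opcode-to-unit mapping in a table mirrors the role of the decoder 212: adding a function unit means adding one entry, with no change to the fetch or write-back steps.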
Operations related to accessing instructions within the programming memory 202 are a major factor limiting the overall performance of the computing system 200, and more particularly the performance of the CPU 204. Such situations occur, for example, with large memories having long data access times, or in cases where the memory 202 is remotely located from the CPU 204, incurring long transmission delays. In these cases, the performance of the CPU 204, measured in the number of instructions executed per second, is limited by the ability to retrieve instructions from the programming memory 202.
Conventional approaches to increasing microprocessor performance (i.e., increasing the number of instructions executed per second) include adding a cache memory 218 for storing instructions. Even though the cache memory 218 is shown to be internal to the memory 202, it should be noted that the cache memory 218 can also be an external cache memory located outside the main memory 202 in close proximity to the CPU 204. Typically, the cache memory 218 is accessed more quickly than the memory 202. Generally, the cache memory 218 stores instructions from the memory 202 in what are referred to as cache lines. A cache line is formed of a plurality of contiguous bytes which are typically aligned such that the first of the contiguous bytes resides at an address having a certain number of low order bits set to zero. The certain number of low order bits is sufficient to uniquely identify each byte in the cache line. The remaining bits of the address form a tag which is used to refer to the entire cache line.
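The alignment rule above amounts to splitting an address into a tag and a byte offset. The sketch below assumes a 32-byte line, a size picked only for illustration; the names `LINE_SIZE`, `split_address`, and `line_base` are hypothetical.

```python
LINE_SIZE = 32  # bytes per cache line (assumed; must be a power of two)
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 5 low-order bits for 32 bytes


def split_address(address):
    """Return (tag, offset): the tag names the line, the offset the byte."""
    offset = address & (LINE_SIZE - 1)  # low-order bits pick the byte
    tag = address >> OFFSET_BITS        # remaining bits form the tag
    return tag, offset


def line_base(address):
    """Address of the line's first byte; its low-order bits are all zero."""
    return address & ~(LINE_SIZE - 1)


tag, offset = split_address(0x1234)
print(tag, offset, hex(line_base(0x1234)))  # 145 20 0x1220
```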
Even though including larger cache memories may increase the performance of the CPU 204 by making instructions readily available, the larger caches have commensurately longer cache access times. Longer cache access times restrict system performance by limiting the number of instructions per second available for the CPU 204 to execute, regardless of the inherent clock cycle time of the CPU 204. As used in this discussion, the term cache access time refers to the interval of time required from the presentation of an address to the cache until the corresponding bytes are available for use by the CPU 204. As an example, a set associative cache access time includes time for indexing the cache storage, time for comparing the tags to the access address in order to select a row, and time for conveying the selected data from the cache.
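The three components of a set associative access (indexing, tag comparison, and data selection) can be seen in a minimal software model. Everything specific here is an assumption for illustration: the class name `SetAssociativeCache`, the sizes, the tag/index/offset split of the address, and the first-in-first-out replacement used by `fill`.

```python
LINE_SIZE = 16  # bytes per line (assumed)
NUM_SETS = 4    # number of sets (assumed)
WAYS = 2        # lines per set (assumed)


class SetAssociativeCache:
    def __init__(self):
        # Each set holds up to WAYS entries of (tag, line_bytes).
        self.sets = [[] for _ in range(NUM_SETS)]

    @staticmethod
    def _split(address):
        offset = address % LINE_SIZE
        index = (address // LINE_SIZE) % NUM_SETS
        tag = address // (LINE_SIZE * NUM_SETS)
        return tag, index, offset

    def fill(self, address, line_bytes):
        """Install a line, evicting the oldest entry if the set is full."""
        tag, index, _ = self._split(address)
        ways = self.sets[index]
        ways[:] = [(t, d) for t, d in ways if t != tag]
        if len(ways) >= WAYS:
            ways.pop(0)
        ways.append((tag, line_bytes))

    def lookup(self, address):
        """Return the byte at address on a hit, or None on a miss."""
        tag, index, offset = self._split(address)  # 1: index the storage
        for stored_tag, line_bytes in self.sets[index]:
            if stored_tag == tag:                  # 2: compare the tags
                return line_bytes[offset]          # 3: convey the data
        return None


cache = SetAssociativeCache()
cache.fill(0x40, bytes(range(16)))
hit = cache.lookup(0x43)   # same line as 0x40, byte offset 3
miss = cache.lookup(0x80)  # same set, different tag
```

In hardware the tag comparisons for all ways of a set happen in parallel; the loop here only models the outcome, not the timing.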
Increasing cache access time is particularly deleterious to instruction caches used with high frequency microprocessors. By increasing the cache access time, the bandwidth of the issued instruction bus 210 is substantially reduced, particularly when cache access time becomes longer than the clock cycle time of the CPU 204.
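A back-of-the-envelope model shows why an access time longer than the clock cycle hurts. It assumes, as simplifications not stated in the text, that one instruction is delivered per cache access and that successive accesses do not overlap; the function name `issue_rate` and the nanosecond figures are hypothetical.

```python
def issue_rate(clock_cycle_ns, cache_access_ns):
    """Instructions per second under the simplifying assumptions above."""
    # The slower of the two intervals sets the pace of instruction issue.
    limiting_ns = max(clock_cycle_ns, cache_access_ns)
    return 1e9 / limiting_ns


fast_cache = issue_rate(2.0, 1.5)  # access fits within one clock cycle
slow_cache = issue_rate(2.0, 5.0)  # access time exceeds the clock cycle
print(fast_cache, slow_cache)  # 500000000.0 200000000.0
```

Under these assumptions, shortening the clock cycle yields no gain once the cache access time is the longer interval.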
In view of the foregoing, it should be apparent that increasing instruction issue bus bandwidth without resorting to increasing cache memory size would be desirable.
SUMMARY OF THE INVENTION
A system for improving the performance of a microprocessor is described. More specifically, the system is arranged to increase the number of instructions executed by a microprocessor by selectively storing instructions in a cache memory associated with a corresponding function unit in the microprocessor. In one embodiment of the invention, a method for reducing the number of issued instructions in a computer system is described. In one embodiment, if a fetched instruction program counter (PC) matches a cached instruction tag, t
Chesters Eric
Fleck Rod G.
Mattela Venkat
Singh Balraj
Ellis Richard L.
Infineon Technologies North America Corp.
Patel Gautam R.