Extended operand management indicator structure and method

Electrical computers and digital processing systems: processing – Instruction decoding – Decoding instruction to accommodate plural instruction...

Reexamination Certificate


Details

C712S217000, C712S218000


active

06615340

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates to computers and computer system central processing units especially as to methods and structure for handling and storage of operand values.
DESCRIPTION OF PRIOR ART
Central processing units for server computers, workstations and personal computers today typically employ superscalar microarchitectures that will likely be employed in embedded processors as well. These machines aim to execute more than one instruction in each clock cycle. To increase their execution rates many of the present-day designs execute instructions out-of-order. They search through the predicted stream of upcoming instructions to find those that can be started. In such out-of-order schemes an instruction needing results that will be generated by another, incomplete instruction can be deferred in favor of instructions whose input operands are ready for immediate use.
Contemporary computer designs often include means for renaming operands (especially register operands) so that instructions needing the latest value of an operand being computed by a previously issued instruction can access the immediate output of that previous instruction instead of waiting for the value to be stored (e.g. into a register file) and re-fetched. These prior art computer processors fail to gather much information about operands and the flow of operand values and then discard all or most of the information they do gather.
Continuing development of electronics technology has been allowing designers to incorporate more circuits into each computer processor. With ever more circuits it would be advantageous to be able to issue more instructions in each clock cycle but several barriers are encountered in attempting straightforward extensions to today's superscalar methods. To manage more in-process instructions requires larger windows of instructions from which to choose and the complexity of larger windows increases faster than increases in window size.
Among the most severe barriers to increasing the number of instructions executed per clock cycle is multiporting of operand storage. Operands are stored in register files and in memory units. To make up for the relative slowness of large-scale memory units such as DRAMs, faster but smaller cache memories are connected to central processing units. A typical computer instruction might have two input operands and one output operand. Executing such an instruction would require at least three operand accesses. Executing four instructions in each clock cycle would typically require twelve operand accesses. These operands would typically be spread between register storage (“a register file”) and data cache storage. Each simultaneous access to a storage unit will require a read or write port. Unfortunately, the number of components required to construct access ports grows much faster than the number of ports supplied. Doubling the number of ports to the register file might require quadrupling the number of components devoted to access ports. Increasing the number of ports is also likely to increase the access time to the operand storage being ported. “For a register file, doubling the number of ports doubles the number of wordlines and bitlines (quadrupling the register file area in the limit . . . ” [Farkas, Keith I., Norman P. Jouppi and Paul Chow “Register file design considerations in dynamically scheduled processors”, p. 18 WRL Research Report 95/10, Digital Western Research Laboratory, Palo Alto]
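The arithmetic in the preceding paragraph can be restated as a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the quadratic area model is an assumption taken from the limiting case in the Farkas et al. quotation (doubling ports doubles both wordlines and bitlines), not a measured result.

```python
def operand_accesses(instructions_per_cycle, inputs=2, outputs=1):
    """Operand accesses needed per cycle when every instruction has the
    given number of input and output operands (two in, one out by default)."""
    return instructions_per_cycle * (inputs + outputs)


def relative_port_area(ports, baseline_ports):
    """Port area relative to a baseline design.

    ASSUMPTION: area grows with the square of the port count, which is the
    limiting case described in the quotation above. Real designs may differ.
    """
    return (ports / baseline_ports) ** 2


# Four instructions per cycle, each with two inputs and one output.
accesses = operand_accesses(4)

# Doubling the port count under the assumed quadratic model.
growth = relative_port_area(2 * accesses, accesses)

print(accesses, growth)
```

Under these assumptions, four instructions per cycle need twelve accesses, and doubling the ports quadruples the area devoted to them.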
My U.S. Pat. No. 5,974,538 explains how computer instruction operands can be annotated as to their source (value creating) instructions and operand flows to receiving (target) instructions can then be mapped. This is shown in FIG. 1 where an output 106 of an instruction 101 has been annotated with the address of instruction 101. Subsequent use of operand 106 (R1) by an instruction 103 causes creation of a flow mapping 104 indicating that output 106 of instruction 101 will flow to instruction 103. Subsequent executions of source instruction 101, whose flow has been mapped, can initiate forwarding of operands to target instructions so that they may make use of them as inputs and can trigger those receiving instructions to begin execution earlier than would occur in sequential execution of the same program code.
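The annotate-then-map idea described above can be sketched in a few lines. This is a minimal illustration, not the structure disclosed in U.S. Pat. No. 5,974,538: the class and method names are hypothetical, and the figure reference numerals 101 and 103 are reused as stand-in instruction addresses purely for readability.

```python
class FlowMapper:
    """Toy model of operand annotation and flow mapping (hypothetical names)."""

    def __init__(self):
        # operand name -> address of the instruction that last produced it
        self.producer_of = {}
        # source instruction address -> set of target instruction addresses
        self.flows = {}

    def write(self, instr_addr, operand):
        """Annotate an output operand with the address of its source instruction."""
        self.producer_of[operand] = instr_addr

    def read(self, instr_addr, operand):
        """Record a flow from the operand's annotated source to this reader.

        Returns the source address, or None if the operand is not annotated.
        """
        source = self.producer_of.get(operand)
        if source is not None:
            self.flows.setdefault(source, set()).add(instr_addr)
        return source

    def targets(self, source_addr):
        """Instructions to which a later execution of the source could forward."""
        return sorted(self.flows.get(source_addr, ()))


mapper = FlowMapper()
mapper.write(101, "R1")      # instruction 101 produces R1
mapper.read(103, "R1")       # instruction 103 consumes R1, creating a mapping
print(mapper.targets(101))
```

Once the mapping exists, a later execution of the source instruction can look up its targets and forward the new value directly, which is the behavior the paragraph describes.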
FIG. 2A and FIG. 2B show the mapping storage structure given in U.S. Pat. No. 5,974,538. FIG. 2A shows mapping information stored in a linked list data structure while FIG. 2B shows a hashed structure with linked overflow. A design might, in that disclosure, sometimes choose to omit mapping some flows from source instructions to operand target instructions and, where flows have been mapped, a machine might operate in speculative mode where operands are forwarded to target instructions before all intervening branch paths from source instruction to target instruction have been resolved.
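As a rough illustration of what a hashed structure with linked overflow could look like, the sketch below stores flow mappings in a fixed number of buckets and chains colliding entries. The actual layout of FIG. 2B is not reproduced here; the class names, the modulo hash, and the bucket count are all assumptions made for the example.

```python
class MappingEntry:
    """One source-to-target flow mapping, with a link to an overflow entry."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.next = None  # next entry in this bucket's overflow chain


class HashedFlowTable:
    """Fixed-size hash table whose buckets overflow into linked chains.

    ASSUMPTION: the hash is simply the source address modulo the bucket count.
    """

    def __init__(self, num_buckets=8):
        self.buckets = [None] * num_buckets

    def _index(self, source):
        return source % len(self.buckets)

    def add(self, source, target):
        """Append a mapping to the end of its bucket chain, skipping duplicates."""
        i = self._index(source)
        node = self.buckets[i]
        if node is None:
            self.buckets[i] = MappingEntry(source, target)
            return
        while True:
            if node.source == source and node.target == target:
                return  # mapping already recorded
            if node.next is None:
                break
            node = node.next
        node.next = MappingEntry(source, target)

    def targets(self, source):
        """Walk the bucket chain and collect targets for this source."""
        found = []
        node = self.buckets[self._index(source)]
        while node is not None:
            if node.source == source:
                found.append(node.target)
            node = node.next
        return found


table = HashedFlowTable(num_buckets=4)
table.add(101, 103)
table.add(105, 107)   # 105 and 101 share a bucket when there are 4 buckets
table.add(101, 109)
print(table.targets(101), table.targets(105))
```

A pure linked list, as in FIG. 2A, corresponds to the degenerate case of a single bucket; hashing shortens the chain that must be walked for any one source instruction.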
U.S. Pat. No. 5,974,538 also discusses the use of a Temporary Result Cache [C29 L40] to decrease traffic to architected storage locations for values that will soon be overwritten with newer values. This cache is, however, concerned with holding outputs of instructions until those values have been superseded, to avoid materializing them. It is not concerned with holding operand values to be forwarded to other instructions as discussed here.
A similar scheme was put forth for memory operands in a paper by Moshovos and Sohi [Moshovos, Andreas and Gurindar Sohi, “Streamlining inter-operation memory communication via data dependence prediction”, Proc. Micro-30, December, 1997, IEEE]. In that scheme dependences on memory operands are first detected by annotating memory operands in a small annotations file called a Dependence Detection Table that records the program counter (address) of the last instruction to touch each recorded address. Part of the system of Moshovos and Sohi is depicted in FIG. 3. A store instruction 302 stores a new value in a memory hierarchy 301 at a storage address 303. The value at that storage address is later used as input by a load instruction 304 that loads the value to a register from where it may be used by other subsequent instructions. The passing of a value from a store instruction to a load instruction causes creation of an association between those instructions that is stored in an association record 307 in a dependence prediction and naming table 312. Later execution of the store instruction will create an entry 311 in a synonym file 310. When dependent load instruction 304 is issued it can obtain its value from the synonym file instead of having to wait for a memory operand address calculation and cache or memory access to complete. Moshovos and Sohi also describe a transient value cache that can hold values likely to be killed (overlaid) soon with new values for the same address. The methods of Moshovos and Sohi are intended for use in a speculative superscalar machine in which instructions are speculatively issued before all preceding conditional branch instructions have been resolved. The machine will, at times, execute down the wrong path of instructions and have to abandon some of its results. Dependences recorded in the synonym associations of Moshovos and Sohi also include a Predictor that is used to determine whether forwarding should occur between a given pair of instructions. The Predictor described in Moshovos and Sohi is intended only to reflect the likelihood of operand dependence between the instructions. That paper proposed predictors not just of dependences from store instructions to load instructions but also between load instructions. An instruction that loads a value from a given memory location might be followed by another load instruction that loads the contents of that same memory location. There is then a read-after-read (RAR) dependence between the two load instructions and a system can take advantage of that dependence to bypass the expense of the second load. The Predictor is still concerned with predicting true depend
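The store-to-load forwarding path described above (Dependence Detection Table, association record, synonym file) can be approximated with three dictionaries. This is a simplified sketch and not the design of Moshovos and Sohi: it omits the Predictor, read-after-read handling, verification of forwarded values, and misspeculation recovery. All names are hypothetical, and the reference numerals 302 and 304 stand in for program counter values.

```python
class MemoryDependenceSketch:
    """Toy model of dependence detection and synonym-based forwarding."""

    def __init__(self):
        self.memory = {}        # address -> value (stands in for the memory hierarchy)
        self.ddt = {}           # address -> PC of the last instruction to touch it
        self.synonym_of = {}    # instruction PC -> synonym id (association record)
        self.synonym_file = {}  # synonym id -> most recently stored value
        self._next_synonym = 0

    def store(self, pc, address, value):
        """Execute a store; if it has an association, publish the value."""
        self.memory[address] = value
        self.ddt[address] = pc
        syn = self.synonym_of.get(pc)
        if syn is not None:
            self.synonym_file[syn] = value

    def load(self, pc, address):
        """Execute a load. Returns (value, forwarded_flag).

        ASSUMPTION: a forwarded value is always used. A real speculative
        machine would verify it against the memory access.
        """
        syn = self.synonym_of.get(pc)
        if syn is not None and syn in self.synonym_file:
            return self.synonym_file[syn], True
        value = self.memory.get(address)
        producer = self.ddt.get(address)
        if producer is not None and producer != pc:
            # First detection: associate producer and consumer via a synonym.
            syn = self.synonym_of.get(producer)
            if syn is None:
                syn = self._next_synonym
                self._next_synonym += 1
                self.synonym_of[producer] = syn
            self.synonym_of[pc] = syn
        self.ddt[address] = pc
        return value, False


m = MemoryDependenceSketch()
m.store(302, 0x40, 7)
first = m.load(304, 0x40)    # detects the dependence, reads memory
m.store(302, 0x40, 9)        # store now writes its synonym file entry
second = m.load(304, 0x40)   # load is satisfied from the synonym file
print(first, second)
```

In this sketch the first load reads memory and creates the association, while the second load obtains its value from the synonym file without consulting the address at all, which mirrors the benefit the text attributes to the scheme.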
