Method and apparatus for reducing encoding needs and ports...

Electrical computers and digital processing systems: processing – Instruction decoding – Decoding instruction to accommodate variable length...

Reexamination Certificate

active

06704855

ABSTRACT:

BACKGROUND OF RELATED ART
1. Field of the Invention
The present invention relates to a method and apparatus for reducing encoding needs and reducing the number of ports to shared resources in a multi-operation (wide-issue) processor, and more particularly to a mechanism based on a set of identifier fields which are shared among operations (the consumers of a shared resource).
2. Description of Related Art
Wide-issue processors are characterized by their ability to specify multiple “operations” that are carried out simultaneously and which may share certain resources in the processor. This set of operations, or “packet,” can be created either when the program is generated (static generation by a programmer, compiler or other means), or by some mechanism invoked while the operations are carried out (dynamic generation, for example, performed at the time instructions are fetched from main memory into an instruction cache or instruction buffer, or at the time when instructions are decoded, or in some other stage in the processor pipeline).
Typically, the format of the multiple operations specified in a packet 100 contains a separate field for identifying the arguments used by each of the operations, which are extracted from a collection of shared resources (for example, the various registers in a buffer or register file), as illustrated in FIG. 1. Furthermore, each of the identifier fields 111, 112 and 113 is associated with an independent port to access the shared resource, so that there is no conflict among the different operations 121-124 in accessing the shared resource. As a result, the number of ports to a shared resource needed in an implementation corresponds to the maximum number of identifiers that can be encoded in a packet 100. This format of a packet 100 is the approach used to specify the registers used by the primitive operations in Very-Long Instruction Word (VLIW) processors such as TRACE, CYDRA 5, ITANIUM and Philips TRIMEDIA, among others. This format is also the approach implicitly used in processors that dynamically construct long instructions, such as those described in U.S. Pat. No. 5,442,760 and in Franklin, M. & Smotherman, M., A Fill-unit Approach to Multiple Instruction Issue, Proceedings of the 27th International Conference on Microarchitecture, 1994, at 162-171.
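The port-count consequence of this format can be illustrated with a small Python sketch. It assumes a hypothetical four-operation packet in which every operation slot carries two source-register identifier fields and one destination-register identifier field; `OperationSlot` and `ports_required` are illustrative names, not part of any described design.

```python
from dataclasses import dataclass


# Hypothetical model of the conventional packet format: every operation
# slot has its own identifier fields, and every field is wired to its own
# port on the shared register file.
@dataclass(frozen=True)
class OperationSlot:
    read_fields: int   # source-register identifier fields in this slot
    write_fields: int  # destination-register identifier fields in this slot


def ports_required(slots):
    """Ports needed so that no two operations ever contend for a port.

    Because each identifier field has a dedicated port, the port count
    equals the maximum number of identifiers the packet can encode.
    """
    reads = sum(s.read_fields for s in slots)
    writes = sum(s.write_fields for s in slots)
    return reads, writes


# Assumed example: four slots, each sized for a two-source, one-destination
# register-to-register operation.
packet = [OperationSlot(read_fields=2, write_fields=1) for _ in range(4)]
print(ports_required(packet))  # (8, 4)
```

Under these assumptions the register file needs 8 read ports and 4 write ports, regardless of which operations actually occupy the slots.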
However, a disadvantage of the above packet format is that, for packets with many primitive operations, large shared structures result from having independent ports to a shared resource for each operation. Moreover, some primitive operations actually use fewer than the maximum possible number of arguments or results. For example, a register-to-register primitive operation such as add or subtract uses three register fields and consequently three ports in a register file: two read ports to access the operands, and one write port to save the result of the operation. On the other hand, a load operation specifying a base register and a displacement uses only one read and one write port in the register file, whereas a store operation does not use a write port.
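The mismatch between provisioned and used ports can be tallied in the same way. The sketch below is a hypothetical model: the port counts for add, subtract and load follow the text, while the two read ports for a store (base register and data register) are an assumption, since the text only states that a store uses no write port. Every slot is assumed to be provisioned for a two-read, one-write operation.

```python
# Read and write ports used per operation type. The store entry's two
# reads are an assumption (base and data registers); the text only says
# a store uses no write port.
PORT_USE = {
    "add": (2, 1),    # two operand reads, one result write
    "sub": (2, 1),
    "load": (1, 1),   # base-register read, loaded value written
    "store": (2, 0),  # assumed: base and data reads, no write
}

# Ports reserved for every slot in the fixed-field format (assumed).
FIXED_SLOT = (2, 1)


def port_utilization(ops):
    """Return (reads used, reads provisioned, writes used, writes provisioned)."""
    reads_used = sum(PORT_USE[op][0] for op in ops)
    writes_used = sum(PORT_USE[op][1] for op in ops)
    return (reads_used, FIXED_SLOT[0] * len(ops),
            writes_used, FIXED_SLOT[1] * len(ops))


print(port_utilization(["add", "load", "load", "store"]))  # (6, 8, 3, 4)
```

For the assumed mix of one add, two loads and one store, only 6 of the 8 provisioned read ports and 3 of the 4 provisioned write ports are used.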
Therefore, a need exists for a method and system that make efficient use of the identifier fields specifying the arguments accessed in the shared resource.
Attempts have been made to reduce the number of ports to the register file in a wide-issue processor. One such attempt is the POWER2 processor, commercially available from IBM, which provides the needed ports by replicating the register file. More specifically, the fixed-point execution unit contains two register files, each with four read ports and four write ports; each of the two functional units reads its operands from one of the register files, but the write ports are common to both register files. In other words, read ports are distributed across the register files, whereas write ports are replicated in both modules.
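The distributed-read, replicated-write organization can be modeled behaviorally as follows. This is a sketch only, assuming 32 registers per copy and two functional units; port counts and timing are not modeled, and `ReplicatedRegisterFile` is a hypothetical name.

```python
class ReplicatedRegisterFile:
    """Behavioral sketch of a register file replicated into two copies.

    Reads are distributed: functional unit k reads only from copy k, so
    each copy needs only the read ports of one unit. Writes are replicated:
    every result is written into all copies to keep them identical, so
    each copy needs the write ports of all units.
    """

    def __init__(self, num_regs=32, copies=2):  # sizes are assumptions
        self.copies = [[0] * num_regs for _ in range(copies)]

    def read(self, unit, reg):
        # Each unit is wired to its own copy (distributed read ports).
        return self.copies[unit][reg]

    def write(self, reg, value):
        # Every copy receives every write (replicated write ports).
        for copy in self.copies:
            copy[reg] = value


rf = ReplicatedRegisterFile()
rf.write(5, 42)        # e.g. a result produced by unit 0
print(rf.read(1, 5))   # unit 1 sees the same value: 42
```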
In the context of VLIW processors, providing the needed ports in the register file has been addressed by the use of partitioned register files. Registers and ports are distributed across different modules, and data are either moved or copied among the modules through the execution of specific instructions, as in TRACE and Cydra 5. A variation on this approach includes replicating registers throughout some of the modules so that read ports are distributed and write ports are replicated across the corresponding modules.
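A partitioned register file differs in that a value is visible only in the module where it was written until an explicit instruction moves or copies it. The sketch below is a hypothetical behavioral model (two modules of 16 registers each are assumed) and does not reflect the actual organization of TRACE or Cydra 5.

```python
class PartitionedRegisterFile:
    """Behavioral sketch of a register file split into modules.

    Each module has its own registers and ports; a functional unit can
    access only the module it is attached to. A value produced in one
    module must be transferred by an explicit copy operation before a
    unit attached to another module can read it.
    """

    def __init__(self, modules=2, regs_per_module=16):  # sizes assumed
        self.modules = [[0] * regs_per_module for _ in range(modules)]

    def read(self, module, reg):
        return self.modules[module][reg]

    def write(self, module, reg, value):
        self.modules[module][reg] = value

    def copy(self, src_module, src_reg, dst_module, dst_reg):
        # The explicit inter-module transfer that must be scheduled.
        self.write(dst_module, dst_reg, self.read(src_module, src_reg))


rf = PartitionedRegisterFile()
rf.write(0, 3, 7)      # value produced in module 0
rf.copy(0, 3, 1, 3)    # explicit copy so module 1 can use it
print(rf.read(1, 3))   # 7
```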
U.S. Pat. No. 5,129,067 considers a group of instructions (primitive operations) fetched from the cache memory, possibly in a predecoded state, and relies on arbitration logic to dynamically resolve contention for the ports to the register file. Generally, the patent provides (1) arbitration logic for arbitrating conflicts among the operations in accessing the register file, based on arbitration data corresponding to each of the operations, and (2) a multiplexing unit for selectively supplying the N register identifiers to the M available ports in response to control signals generated by the arbitration logic. More specifically, the patent addresses long instructions with N register-operand identifiers on a processor having M ports to the register file, where M<N; the embodiments described consider values of N from 4 to 8 and of M from 2 to 4. Such an approach is not adequate for executing many primitive operations simultaneously (N>8), as is the current trend, because of the exponentially increasing hardware complexity involved; in addition, the delay through the arbitration logic grows rapidly as the number of possible operands increases.
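The arbitration idea can be sketched as follows, assuming a fixed priority in program order and in-order issue: an operation is granted ports only if all of its register identifiers fit in the remaining ports, and once one operation stalls, all later ones stall too. This policy and the function name `arbitrate` are assumptions for illustration; the arbitration data and logic of U.S. Pat. No. 5,129,067 are not reproduced here.

```python
def arbitrate(operations, num_ports):
    """Grant register-file read ports to operations for one cycle.

    operations: list of (name, [register ids]) in priority (program) order.
    Returns (port_select, issued, stalled), where port_select[p] is the
    register identifier the multiplexer routes to port p this cycle.
    """
    port_select, issued, stalled = [], [], []
    blocked = False
    for name, regs in operations:
        # An operation issues only if all of its identifiers get a port
        # and no earlier operation has already stalled (in-order issue).
        if not blocked and len(port_select) + len(regs) <= num_ports:
            port_select.extend(regs)
            issued.append(name)
        else:
            blocked = True
            stalled.append(name)
    return port_select, issued, stalled


ops = [("op0", [1, 2]), ("op1", [3, 4]), ("op2", [5]), ("op3", [6, 7])]
print(arbitrate(ops, num_ports=4))
# ([1, 2, 3, 4], ['op0', 'op1'], ['op2', 'op3'])
```

Here N = 7 identifiers compete for M = 4 ports, so two operations are deferred. Even in this simplified form the arbitration must examine every requested identifier, and in a general implementation each port needs a multiplexer able to select among all N identifiers, which suggests why such logic scales poorly as N grows.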
A solution related to the one proposed in U.S. Pat. No. 5,129,067 is further developed by Johnson, M., Superscalar Microprocessor Design (Prentice Hall, 1991), which indicates that a four-operation decoder suffers only minor degradation when the register file has only four read ports. The publication considers a superscalar processor with a four-operation decoder. The scheme proposes a long-instruction format with a separate register access field that specifies the register identifiers for four source operands and four destination registers. Destination-register identifiers are assigned positionally to the operations, so the operations do not need to identify their corresponding destination registers. Each operation, on the other hand, identifies its source operands by selecting among the source-register and destination-register identifiers in the register access field. This scheme also allows the destination register of one operation to be identified as a source register of another operation (in left-to-right order).
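Operand selection in the register access field described by Johnson can be sketched as below. The encoding of selectors as `("src", k)` or `("dst", j)` tuples and the names `RegisterAccessField` and `decode_operands` are hypothetical; only the structure (four source identifiers, four positionally assigned destination identifiers, and left-to-right use of earlier results) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class RegisterAccessField:
    sources: list[int]       # four source-register identifiers
    destinations: list[int]  # four destination-register identifiers


def decode_operands(field, op_index, selectors):
    """Resolve the source registers and destination of operation op_index.

    selectors: ("src", k) selects field.sources[k]; ("dst", j) selects the
    destination register of operation j, which must lie to the left.
    """
    regs = []
    for kind, k in selectors:
        if kind == "src":
            regs.append(field.sources[k])
        elif kind == "dst":
            if k >= op_index:
                raise ValueError("only results of operations to the left may be used")
            regs.append(field.destinations[k])  # register written by operation k
        else:
            raise ValueError(f"unknown selector {kind!r}")
    # The destination is implied by position; the operation does not encode it.
    return regs, field.destinations[op_index]


field = RegisterAccessField(sources=[1, 2, 3, 4], destinations=[10, 11, 12, 13])
print(decode_operands(field, 2, [("src", 0), ("dst", 1)]))  # ([1, 11], 12)
```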
The solution proposed by Johnson, M., Superscalar Microprocessor Design, supra, has as many destination-register identifiers as primitive operations, so the associated ports and fields are used inefficiently whenever the long instruction contains an operation that does not generate a result to be placed in the register file (such as a store operation, or some forms of compare operations which place the result in a condition register instead of the register file). Moreover, any of the register identifiers in the register access field can be used as a source for any of the operations in the long instruction, leading to a rather complex network for routing operands from the register file to the functional units. The Johnson publication briefly mentions this aspect but describes no solution for it.
Partitioned register files have been addressed by Colwell, Robert P., et al., A VLIW Architecture for a TRACE Scheduling Compiler, Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems, 1987, at 180-192, and by Beck, G., Yen, D., & Anderson, T., The Cydra 5 Minisupercomputer: Architecture and Implementation, The Journal of Supercomputing, 1993, Vol. 7 at 143-180. The partitioned register file used by Colwell et al. and by Beck, Yen, & Anderson is a feasible solution for implementing a register file with many ports. However, su
