Superscalar RISC instruction scheduling

Electrical computers and digital processing systems: processing – Processing architecture – Superscalar

Reexamination Certificate


Details

C712S218000, C712S216000

Reexamination Certificate

active

06289433

ABSTRACT:

CROSS-REFERENCE TO RELATED APPLICATIONS
The following are commonly owned applications:
“Semiconductor Floor Plan and Method for a Register Renaming Circuit”, Ser. No. 07/860,718, concurrently filed with the present application, now U.S. Pat. No. 5,371,684;
“High Performance RISC Microprocessor Architecture”, Ser. No. 07/817,810, filed Jan. 8, 1992, now U.S. Pat. No. 5,539,911;
“Extensible RISC Microprocessor Architecture”, Ser. No. 07/817,809, filed Jan. 8, 1992, now abandoned.
The disclosures of the above applications are incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to superscalar reduced instruction set computers (RISC); more particularly, it relates to instruction scheduling, including register renaming and instruction issuing, for superscalar RISC computers.
2. Related Art
A more detailed description of some of the basic concepts discussed in this application is found in a number of references, including Mike Johnson, Superscalar Microprocessor Design (Prentice-Hall, Inc., Englewood Cliffs, N.J., 1991), and John L. Hennessy et al., Computer Architecture: A Quantitative Approach (Morgan Kaufmann Publishers, Inc., San Mateo, Calif., 1990). Johnson's text, particularly Chapters 2, 6, and 7, provides an excellent discussion of the register renaming issues addressed by the present invention.
A major consideration in a superscalar RISC processor is how to execute multiple instructions in parallel and out of order without incurring data errors due to dependencies inherent in such execution. Data dependency checking, register renaming, and instruction scheduling are integral aspects of the solution.
Storage Conflicts and Register Renaming
True dependencies (sometimes called “flow dependencies” or “write-read” dependencies) are often grouped with anti-dependencies (also called “read-write” dependencies) and output dependencies (also called “write-write” dependencies) into a single group of instruction dependencies. The reason for this grouping is that each of these dependencies manifests itself through use of registers or other storage locations. However, it is important to distinguish true dependencies from the other two. True dependencies represent the flow of data and information through a program. Anti- and output dependencies arise because, at different points in time, registers or other storage locations hold different values for different computations.
When instructions are issued in order and complete in order, there is a one-to-one correspondence between registers and values. At any given point in execution, a register identifier precisely identifies the value contained in the corresponding register. When instructions are issued out of order and complete out of order, the correspondence between registers and values breaks down, and values conflict for registers. This problem is especially severe when the goal of register allocation is to keep as many values as possible in as few registers as possible: keeping a large number of values in a small number of registers creates a large number of conflicts when the execution order is changed from the order assumed by the register allocator.
Anti- and output dependencies are more properly called “storage conflicts” because reusing storage locations (including registers) causes instructions to interfere with one another even though conflicting instructions are otherwise independent. Storage conflicts constrain instruction issue and reduce performance. But storage conflicts, like other resource conflicts, can be reduced or eliminated by duplicating the troublesome resource.
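To make the three kinds of dependency concrete, the following is a small illustrative sketch in Python (the two-source, one-destination instruction model and all names are assumptions made for this example, not part of the present disclosure) that classifies the dependencies between two instructions in program order:

from dataclasses import dataclass

@dataclass
class Instr:
    dest: int     # register written by the instruction
    srcs: tuple   # registers read by the instruction

def classify(earlier: Instr, later: Instr) -> list:
    """Dependencies of `later` on `earlier`, given program order earlier -> later."""
    deps = []
    if earlier.dest in later.srcs:
        deps.append("true (write-read)")      # later consumes a value earlier produces
    if later.dest in earlier.srcs:
        deps.append("anti (read-write)")      # later overwrites a register earlier still reads
    if later.dest == earlier.dest:
        deps.append("output (write-write)")   # both write the same register
    return deps

# R3 := R3 op R5, then R3 := R5 + 1: no data flows between them,
# yet they conflict through reuse of register R3.
print(classify(Instr(dest=3, srcs=(3, 5)), Instr(dest=3, srcs=(5,))))
# -> ['anti (read-write)', 'output (write-write)']

The printed pair mirrors the storage conflicts described above: the two instructions share no data, yet they interfere solely because a register is reused.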
Dependency Mechanisms
Johnson also discusses in detail various dependency mechanisms, including software enforcement, register renaming, register renaming with a reorder buffer, register renaming with a future file, interlocks, copying of operands in the instruction window to avoid dependencies, and partial renaming.
One conventional approach relies on software to enforce dependencies between instructions. A compiler or other code generator can arrange the order of instructions so that the hardware cannot possibly see an instruction until it is free of true dependencies and storage conflicts. Unfortunately, this approach runs into several problems. Software does not always know the latency of processor operations and thus cannot always know how to arrange instructions to avoid dependencies. There is also the question of how the software prevents the hardware from seeing an instruction until it is free of dependencies. In a scalar processor with low operation latencies, software can insert “no-ops” in the code to satisfy data dependencies without too much overhead. But if the processor attempts to fetch several instructions per cycle, or if some operations take several cycles to complete, the number of no-ops required to keep the processor from seeing dependent instructions rapidly becomes excessive, causing an unacceptable increase in code size. Moreover, the no-ops consume a precious resource, the instruction cache, merely to encode dependencies between instructions.
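As a rough sketch of this software-only approach, the following Python fragment (the instruction tuples and the latency table are invented for illustration) pads a code sequence with no-ops until each source operand is ready; its correctness depends entirely on the assumed latencies, which is exactly the weakness noted above:

# Assumed, illustrative latencies in cycles; real values are implementation-specific.
ASSUMED_LATENCY = {"load": 2, "mul": 3, "add": 1}

def insert_noops(program):
    """program: list of (opcode, dest_reg, src_regs). Returns a no-op padded list."""
    ready_at = {}            # register -> cycle at which its value becomes available
    out, cycle = [], 0
    for opcode, dest, srcs in program:
        # Pad with no-ops until every source operand is ready.
        while any(ready_at.get(r, 0) > cycle for r in srcs):
            out.append(("nop", None, ()))
            cycle += 1
        out.append((opcode, dest, srcs))
        ready_at[dest] = cycle + ASSUMED_LATENCY[opcode]
        cycle += 1
    return out

# A dependent add right after a 2-cycle load picks up one no-op of padding.
print(insert_noops([("load", 3, (1,)), ("add", 4, (3, 2))]))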
When a processor permits out-of-order issue, it is not at all clear what mechanism software should use to enforce dependencies. Software has little control over the behavior of the processor, so it is hard to see how software could prevent the processor from decoding dependent instructions. A second consideration is that no existing binary code for any scalar processor enforces the dependencies of a superscalar processor, because the mode of execution in a superscalar processor is very different; relying on software to enforce dependencies therefore requires that the code be regenerated for the superscalar processor. Finally, the dependencies in the code are determined directly by the latencies of the hardware, so the best code for each version of a superscalar processor depends on the implementation of that version.
On the other hand, there is some motivation against hardware dependency techniques, because they are inherently complex. Assuming instructions with two input operands and one output value, as holds for typical RISC instructions, there are five possible dependencies between any two instructions: two true dependencies, two anti-dependencies, and one output dependency. Furthermore, the number of dependencies within a group of instructions, such as the instructions in a window, grows with the square of the number of instructions in the group, because each instruction must be checked against every other instruction.
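A back-of-the-envelope illustration in Python (purely hypothetical numbers, assuming the two-source, one-destination instruction format mentioned above) of why the checking cost grows roughly with the square of the window size:

def comparisons_per_pair(num_srcs: int = 2, num_dests: int = 1) -> int:
    # Between two instructions: two possible true, two possible anti, one output dependency.
    true_deps = num_dests * num_srcs     # earlier destination vs. later sources
    anti_deps = num_srcs * num_dests     # earlier sources vs. later destination
    output_deps = num_dests * num_dests  # earlier destination vs. later destination
    return true_deps + anti_deps + output_deps   # 2 + 2 + 1 = 5

def window_comparisons(window_size: int) -> int:
    pairs = window_size * (window_size - 1) // 2   # every instruction against every other
    return pairs * comparisons_per_pair()

for n in (2, 4, 8, 16):
    print(n, window_comparisons(n))   # 5, 30, 140, 600 -- roughly quadratic growth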
Complexity is further multiplied by the number of instructions that the processor attempts to decode, issue, and complete in a single cycle, because these actions themselves introduce dependencies. The only mitigating factor is that the dependencies can be determined incrementally, over many cycles, which helps limit the scope and complexity of the dependency hardware.
One technique for removing storage conflicts is to provide additional registers that are used to reestablish the correspondence between registers and values. The additional registers are conventionally allocated dynamically by hardware, and they are associated with the values needed by the program through “register renaming.” To implement register renaming, processors typically allocate a new register for every new value produced (i.e., for every instruction that writes a register). An instruction identifying the original register, for the purpose of reading its value, instead obtains the value in the newly allocated register. Thus, hardware renames the original register identifier in the instruction to identify the new register and the correct value. The same register identifier in several different instructions may access different hardware registers, depending on the locations of register references with respect to register assignments.
Consider the following code sequence, where “op” is an operation, “Rn” represents a numbered register, “:=” represents assignment, and the letter suffixes (a, b, . . . ) denote the successive values, or instances, held by a register:

R3b := R3a op R5a   (1)
R4b := R3b + 1   (2)
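For illustration only, the following Python sketch (the rename-map structure and all names are assumptions, not the renaming circuit of the present invention) applies register renaming to instructions (1) and (2) above; each write allocates a fresh physical register, so the read of R3 in (2) automatically resolves to the instance produced by (1):

from itertools import count

class RenameMap:
    """Map architectural registers to dynamically allocated physical registers."""
    def __init__(self):
        self._free = count()   # endless supply of physical register ids (0, 1, 2, ...)
        self._map = {}         # architectural register -> current physical register

    def read(self, reg):
        # A source operand is redirected to the most recently allocated instance.
        return self._map.setdefault(reg, next(self._free))

    def write(self, reg):
        # Every newly produced value gets a fresh physical register.
        self._map[reg] = next(self._free)
        return self._map[reg]

rm = RenameMap()
# (1) R3b := R3a op R5a : read the old instances of R3 and R5, allocate a new R3
a3, a5, b3 = rm.read(3), rm.read(5), rm.write(3)
# (2) R4b := R3b + 1    : the read of R3 now resolves to the instance written by (1)
assert rm.read(3) == b3
b4 = rm.write(4)
print(a3, a5, b3, b4)    # e.g. 0 1 2 3 -- four distinct physical registers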
