System and method for high-speed register renaming by counting

Electrical computers and digital processing systems: processing – Processing architecture – Superscalar

Reexamination Certificate

C712S216000, C712S218000

active

06212619

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more specifically to an improved method of utilizing registers in a processing unit of a computer, particularly wherein the computer has a superscalar architecture.
2. Description of Related Art
The basic structure of a conventional computer system includes one or more processing units connected to various input/output devices for the user interface (such as a display monitor, keyboard and graphical pointing device), a permanent memory device (such as a hard disk, or a floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device (such as random access memory or RAM) that is used by the processor(s) in carrying out program instructions. The evolution of computer processor architectures has transitioned from the now widely-accepted reduced instruction set computing (RISC) configurations, to so-called superscalar computer architectures, wherein multiple and concurrently operable execution units within the processor are integrated through a plurality of registers and control mechanisms.
The objective of superscalar architecture is to employ parallelism to maximize or substantially increase the number of program instructions (or “micro-operations”) simultaneously processed by the multiple execution units during each interval of time (processor cycle), while ensuring that the order of instruction execution as defined by the programmer is reflected in the output. For example, the control mechanism must manage dependencies among the data being concurrently processed by the multiple execution units, and it must preserve sequential program order in the presence of precise interrupts and restarts. The control mechanism preferably also provides the ability to delete instructions, as is needed when execution proceeds speculatively past branch instructions, while still retaining the overall order of program execution. It is desirable to satisfy these objectives while also meeting the commercial objectives of minimizing electronic device count and complexity. The prevailing convention in superscalar architectures is to reduce the size and content of the registers and the bit size of the words used for control and data transmission among the circuits. Additional information on superscalar designs is disclosed in U.S. Pat. No. 5,481,683.
An illustrative embodiment of a conventional processing unit is shown in FIG. 1, which depicts the architecture for a PowerPC™ microprocessor 12 manufactured by International Business Machines Corp. Processor 12 operates according to reduced instruction set computing (RISC) techniques and is a single integrated circuit superscalar microprocessor. The system bus 20 is connected to a bus interface unit (BIU) 30 of processor 12. Bus 20, as well as the various other connections described, includes more than one line or wire; e.g., the bus could be a 32-bit bus. BIU 30 is connected to an instruction cache 32 and a data cache 34. The output of instruction cache 32 is connected to a sequencer unit 36. In response to the particular instructions received from instruction cache 32, sequencer unit 36 outputs instructions to other execution circuitry of processor 12, including six execution units, namely, a branch unit 38, a fixed-point unit A (FXUA) 40, a fixed-point unit B (FXUB) 42, a complex fixed-point unit (CFXU) 44, a load/store unit (LSU) 46, and a floating-point unit (FPU) 48.
The inputs of FXUA 40, FXUB 42, CFXU 44 and LSU 46 also receive source operand information from general-purpose registers (GPRs) 50 and fixed-point rename buffers 52. The outputs of FXUA 40, FXUB 42, CFXU 44 and LSU 46 send destination operand information for storage at selected entries in fixed-point rename buffers 52. CFXU 44 further has an input and an output connected to special-purpose registers (SPRs) 54 for receiving and sending source operand information and destination operand information, respectively. An input of FPU 48 receives source operand information from floating-point registers (FPRs) 56 and floating-point rename buffers 58. The output of FPU 48 sends destination operand information to selected entries in floating-point rename buffers 58. Processor 12 may include other registers, such as configuration registers, memory management registers, exception handling registers, and miscellaneous registers, which are not shown. Processor 12 carries out program instructions from a user application or the operating system by routing the instructions and data to the appropriate execution units, buffers and registers, and by sending the resulting output to the system memory device (RAM) or to some output device such as a display console.
Register sets such as those described above limit superscalar processing because only a limited number of registers is available to a particular execution unit at the beginning of instruction execution (i.e., the registers must be shared among the different execution units). Moreover, superscalar operations are typically “pipelined.” That is, a plurality of processing stages is provided for a given execution unit, with each stage able to operate on one instruction while a different stage operates on another instruction, so the registers must be shared even further. The problem is exacerbated when a long sequence of instructions requires access to the same register set. Furthermore, programmers often use the same registers as temporary storage rather than moving data to and from system memory (since the latter takes a large amount of time relative to processor speed), so a small register set can cause a “bottleneck” in the performance stream.
Techniques have been devised for expanding the effective number of available registers, such as register renaming. Register renaming provides a larger set of registers by assigning a new physical register every time an architected register is written. See “Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers,” IEEE Transactions on Computers, vol. 39, no. 3 (March 1990). A physical register is released for re-use when an instruction that overwrites the architected state maintained in that register completes.
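The renaming rule just described can be sketched in Python. This is a minimal, hypothetical model of conventional renaming, not the method of the present invention. It makes the following assumptions:

* A plain map table goes from architected register numbers to physical tags.
* A FIFO free list holds the unused physical tags.
* At reset, each architected register maps to the physical register with the same number.
* Renaming stalls when no tag is free.

The names `Renamer`, `rename` and `complete` are illustrative only.

```python
from __future__ import annotations

from collections import deque


class Renamer:
    """Hypothetical model of renaming with a map table and a free list."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architected register")
        # Assumed reset state: architected register r maps to physical register r.
        self.map_table = list(range(num_arch))
        # All remaining physical registers start out free.
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, dest: int, srcs: list[int]) -> tuple[int, list[int], int]:
        """Rename one instruction; return (new dest tag, source tags, previous dest tag)."""
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        # Read the source mappings before the destination mapping changes,
        # so an instruction like r1 = r1 + r2 reads the old r1.
        src_tags = [self.map_table[s] for s in srcs]
        new_tag = self.free_list.popleft()  # a new physical register for every write
        prev_tag = self.map_table[dest]
        self.map_table[dest] = new_tag
        return new_tag, src_tags, prev_tag

    def complete(self, prev_tag: int) -> None:
        """Call when the instruction that overwrote prev_tag completes; per the
        rule above, that physical register is then released for re-use."""
        self.free_list.append(prev_tag)


r = Renamer(num_arch=4, num_phys=6)
print(r.rename(1, [2, 3]))  # r1 = r2 + r3   → (4, [2, 3], 1)
print(r.rename(1, [1, 2]))  # r1 = r1 + r2   → (5, [4, 2], 4)
```

In this trace, the second instruction reads tag 4 for its `r1` source, which is the value produced by the first instruction. Physical register 1 returns to the free list only when the first instruction completes. Physical register 4 returns only when the second instruction completes.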
The mapping from architected to physical registers can be maintained either as a table with one entry per architected register or as a table with one entry per physical register. The former is often referred to as the “RAM approach,” whereas the latter is referred to as the “CAM” (content addressable memory) approach. The CAM scheme requires an associative lookup whenever an architected register tag (address) is to be mapped to a physical register tag. Both the CAM and RAM schemes are presently in use. See “Quantifying the Complexity of Superscalar Processors,” S. Palacharla, N. Jouppi, J. Smith, Technical Report CS-TR-96-1328, University of Wisconsin-Madison, November 1996. The remainder of this discussion focuses on the CAM scheme, because it provides the most natural context for the present invention.
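A small Python model can make the difference between the two table organizations concrete. It assumes that each CAM entry holds an architected register number plus a valid bit marking the current mapping. The field and function names (`CamEntry`, `ram_lookup`, `cam_lookup`, `ram_to_cam`) are hypothetical. The search in `cam_lookup` is sequential; it only models what CAM hardware does with parallel comparators.

```python
from __future__ import annotations

from dataclasses import dataclass


def ram_lookup(ram_table: list[int], arch_reg: int) -> int:
    """RAM approach: one entry per architected register, read by direct indexing."""
    return ram_table[arch_reg]


@dataclass
class CamEntry:
    arch_reg: int  # architected register whose value this physical register holds
    valid: bool    # True only for the current mapping of arch_reg


def cam_lookup(cam_table: list[CamEntry], arch_reg: int) -> int:
    """CAM approach: one entry per physical register; the physical tag is the
    index of the matching entry. Hardware compares every entry in parallel;
    this list comprehension models that associative search sequentially."""
    hits = [phys for phys, e in enumerate(cam_table) if e.valid and e.arch_reg == arch_reg]
    if len(hits) != 1:
        raise LookupError(f"expected one current mapping for r{arch_reg}, found {len(hits)}")
    return hits[0]


def ram_to_cam(ram_table: list[int], num_phys: int) -> list[CamEntry]:
    """Hypothetical helper: build the CAM table equivalent to a RAM table."""
    cam = [CamEntry(arch_reg=-1, valid=False) for _ in range(num_phys)]
    for arch, phys in enumerate(ram_table):
        cam[phys] = CamEntry(arch_reg=arch, valid=True)
    return cam


ram = [4, 1, 5, 3]        # r0->p4, r1->p1, r2->p5, r3->p3
cam = ram_to_cam(ram, 6)  # p0 and p2 hold no current mapping
print(ram_lookup(ram, 2), cam_lookup(cam, 2))  # → 5 5
```

The RAM table is indexed by architected register and grows with the architected register count. The CAM table is indexed by physical register and grows with the physical register count, and every lookup must search all of its entries.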
Presently available high-performance techniques for register renaming require maintaining a free list of physical register tags which address physical registers that are not in use. The free list is a separate structure, not very well matched to the rest of the register renaming circuitry. Also, renaming multiple instructions in a single cycle requires multiple read and write ports on the free list, which tends to make it both big and slow. For these reasons it would be desirable to devise a method of generating tags of physical registers that are not in use that better supports a high-performance implementation, as in a high-speed microprocessor. It would further be desirable if the mechanism could be combined with a high-performance area-efficient mechanism for checkpointing the state of a microprocessor.
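The free-list port pressure described above shows up in a hypothetical Python sketch of conventional renaming, which renames a group of instructions arriving in the same cycle. The sketch assumes:

* Every instruction writes exactly one destination register.
* The whole group stalls if too few free tags are available.

It counts free-list reads, which corresponds to the number of read ports a hardware free list would need to rename the group in a single cycle. Releases of old tags at completion, which use the write ports, are not modeled. The function name `rename_group` is illustrative.

```python
from __future__ import annotations

from collections import deque


def rename_group(map_table: list[int], free_list: deque,
                 group: list[tuple[int, list[int]]]) -> tuple[list[tuple[int, list[int], int]], int]:
    """Rename (dest, srcs) instructions entering rename in the same cycle.
    Returns per-instruction (dest tag, source tags, previous dest tag) and
    the number of free-list reads performed."""
    if len(free_list) < len(group):
        raise RuntimeError("not enough free physical registers: stall the whole group")
    renamed = []
    reads = 0
    for dest, srcs in group:
        # Processing in program order lets later sources see mappings made
        # earlier in the same group; a single-cycle hardware renamer needs
        # extra comparison logic to achieve the same effect.
        src_tags = [map_table[s] for s in srcs]
        new_tag = free_list.popleft()
        reads += 1
        prev_tag = map_table[dest]
        map_table[dest] = new_tag
        renamed.append((new_tag, src_tags, prev_tag))
    return renamed, reads


map_table = [0, 1, 2, 3]
free_list = deque([4, 5, 6, 7])
group = [(1, [2, 3]), (2, [1]), (1, [1, 2])]  # r1=r2+r3; r2=f(r1); r1=r1+r2
renamed, reads = rename_group(map_table, free_list, group)
print(renamed, reads)  # → [(4, [2, 3], 1), (5, [4], 2), (6, [4, 5], 4)] 3
```

Renaming three instructions takes three tags from the free list, so a four-wide renamer in this style would need four read ports on the free list.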
SUMMARY OF THE INVENTION
