Reexamination Certificate
2000-07-03
2003-09-30
Coleman, Eric (Department: 2183)
Electrical computers and digital processing systems: processing
Processing architecture
Distributed processing system
C712S023000
active
06629232
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
The present invention relates to electronic data processing, and more specifically concerns an organization for general-purpose register files in superscalar or very long instruction word (VLIW) processor architectures having a large number of execution units connected to the same registers.
BACKGROUND OF THE INVENTION
Semiconductor process trends indicate that transistor gate delays are decreasing at a rate significantly faster than signal-transmission delays through the conductors joining the transistors. As a result, the cycle time of the next generation of microprocessor chips will be increasingly limited by interconnection hardware structures, rather than by transistor structures as in the past.
One important structure required by all microprocessors is a file of general-purpose or architectural registers. Register files in modern processors, especially those in superscalar, VLIW, and other regularized architectures, are dominated both in timing and in chip area by the metal interconnections required for data and address lines. This situation worsens as the parallel-execution width of present and future designs increases, that is, as more instructions can be executed in parallel. The importance of interconnect area and delay in large regular structures such as register files has not been appreciated in the past.
Some approximations employed in the industry characterize chip real estate by a small set of parameters: the number of registers in a register file, the size (number of bits) of each register, and the number of ports in the register file, usually three or four times the execution width of the processor. Parallel-execution width depends upon the particular computer technology, but wider is better to exploit instruction parallelism. The number of bits in each register is dictated by architectural considerations. The area of a large, metal-limited register file increases roughly linearly with the number and size of the registers, but rises much faster with the number of ports. The latency or delay time of a register file is also roughly proportional to the number of ports. That is, the large register files required by modern architectures and allowed by new transistor technology reach a state of diminishing returns with respect to the number of ports in a register file.
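The scaling relationships described above can be put into a rough numeric sketch. The function names, the default of three ports per unit of execution width, and in particular the exponent of 2 on the port count are illustrative assumptions; the text says only that area rises "much faster" than linearly with the number of ports. The results are unitless and are meant for comparing configurations, not for estimating real silicon area.

```python
def ports_for_width(execution_width, ports_per_unit=3):
    """Ports on a single shared register file.

    The text gives three or four ports per unit of execution width;
    3 is used here as a default.
    """
    return ports_per_unit * execution_width


def relative_area(num_registers, bits_per_register, num_ports, port_exponent=2):
    """Unitless area estimate: linear in registers and bits, superlinear in ports.

    port_exponent=2 is an assumption for illustration, not a figure from the text.
    """
    return num_registers * bits_per_register * num_ports ** port_exponent


def relative_latency(num_ports):
    """Unitless latency estimate, roughly proportional to the port count."""
    return num_ports


if __name__ == "__main__":
    # Hypothetical file of 128 registers, 64 bits each, at several widths.
    for width in (2, 4, 8, 16):
        ports = ports_for_width(width)
        print(width, ports, relative_area(128, 64, ports), relative_latency(ports))
```

Under these assumptions, doubling the execution width doubles the port count and the latency estimate but quadruples the area estimate, while doubling the number of registers only doubles the area.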
In order to obtain maximum benefit from the latest semiconductor processes, which speed up transistors more than interconnects, microprocessor designers desire to limit performance by transistor-dominated structures rather than by metal-dominated ones. That is, the register file must be taken off the critical path that limits the performance of the entire processor. The desire for wider machines with increased parallelism, however, exacerbates the register-file problem by growing the register file much more than linearly. Thus, there is a pressing need for highly ported register files that are less dominated by their interconnection area and latency time.
SUMMARY OF THE INVENTION
The invention employs multiple copies of a register file in a processor having a number of execution units that access the register file. Each group of execution units can read from and write to its own copy of the file registers by a set of local read and write ports. In addition, all of the register-file copies are synchronized by writing data to remote write ports in the other copies of the register file. The interconnections between the execution units and the register-file copies thus grow less rapidly than they otherwise would, and the difference becomes greater as the execution width of the machine increases.
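A minimal behavioral sketch of the copied register file may help make the scheme concrete. The class and function names are hypothetical, the timing of remote writes is not modeled, and the port count assumes two reads and one write per execution unit; none of these details come from the text.

```python
class CopiedRegisterFile:
    """Behavioral model: one full register-file copy per cluster of execution units."""

    def __init__(self, num_clusters, num_registers):
        self.copies = [[0] * num_registers for _ in range(num_clusters)]

    def read(self, cluster, reg):
        # Reads use only the local read ports of the cluster's own copy.
        return self.copies[cluster][reg]

    def write(self, cluster, reg, value):
        # The local write port updates the writer's own copy ...
        self.copies[cluster][reg] = value
        # ... and remote write ports keep every other copy synchronized.
        for other, copy in enumerate(self.copies):
            if other != cluster:
                copy[reg] = value


def ports_single_file(execution_width, reads_per_unit=2, writes_per_unit=1):
    """Ports on one shared file serving every execution unit (assumed 2R + 1W per unit)."""
    return (reads_per_unit + writes_per_unit) * execution_width


def ports_per_copy(execution_width, num_clusters, reads_per_unit=2, writes_per_unit=1):
    """Ports on each copy: reads from local units only, writes from all units."""
    local_units = execution_width // num_clusters
    return reads_per_unit * local_units + writes_per_unit * execution_width


if __name__ == "__main__":
    rf = CopiedRegisterFile(num_clusters=2, num_registers=8)
    rf.write(cluster=0, reg=3, value=42)
    print(rf.read(cluster=1, reg=3))                    # value written by cluster 0
    print(ports_single_file(8), ports_per_copy(8, 2))   # shared file vs. each copy
```

With these assumed per-unit port counts, an 8-wide machine split into two clusters needs 16 ports on each copy instead of 24 on a single shared file, and the gap widens as the width and the number of clusters grow.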
In one embodiment, not all of the registers are writable by the remote write ports. Each file copy is divided into local and global registers. While all copies of the global registers continue to be written by the remote write ports, the local registers can be written only by a local cluster of execution units. Other embodiments divide the registers into global and local according to other criteria.
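The local/global split in this embodiment can be sketched the same way. Treating the lowest-numbered registers as global is purely an assumption for illustration, since the text leaves the dividing criterion open; the class name and its parameters are likewise hypothetical.

```python
class PartitionedRegisterFile:
    """Behavioral model: copies whose registers are split into global and local."""

    def __init__(self, num_clusters, num_registers, num_global):
        # Assumption: registers 0 .. num_global-1 are global, the rest are local.
        self.num_global = num_global
        self.copies = [[0] * num_registers for _ in range(num_clusters)]

    def is_global(self, reg):
        return reg < self.num_global

    def read(self, cluster, reg):
        # Every read is served by the cluster's own copy.
        return self.copies[cluster][reg]

    def write(self, cluster, reg, value):
        # The writing cluster always updates its own copy.
        self.copies[cluster][reg] = value
        # Only global registers are propagated through remote write ports.
        if self.is_global(reg):
            for other, copy in enumerate(self.copies):
                if other != cluster:
                    copy[reg] = value


if __name__ == "__main__":
    rf = PartitionedRegisterFile(num_clusters=2, num_registers=8, num_global=4)
    rf.write(cluster=0, reg=1, value=7)   # global: visible to both clusters
    rf.write(cluster=0, reg=5, value=9)   # local: visible to cluster 0 only
    print(rf.read(1, 1), rf.read(1, 5))
```

In this sketch a local register holds an independent value in each cluster, so only the global registers need remote write ports.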
REFERENCES:
patent: 5301340 (1994-04-01), Cook
patent: 5574939 (1996-11-01), Keckler et al.
patent: 5644780 (1997-07-01), Luick
patent: 5826096 (1998-10-01), Baxter
patent: 6219777 (2001-04-01), Inoue
patent: 6282585 (2001-08-01), Batten et al.
Arora Ken
Gupta Rajiv
Sharangpani Harshvardhan
Coleman Eric
Intel Corporation
Schwegman Lundberg Woessner & Kluth P.A.
Copied register files for data processors having many...