Register file having shared and local data word parts

Electrical computers and digital processing systems: processing – Processing architecture – Superscalar

Reexamination Certificate

Details

C712S010000, C712S011000, C712S015000, C712S019000

active

06219777

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to a register file used in a multiprocessor and the like.
BACKGROUND OF THE INVENTION
As integration density increases, more hardware, such as operation units, can be mounted in a processor. In a processor that can execute several operations in parallel, such as a superscalar processor or a VLIW (very long instruction word) processor, several operation units are driven in parallel to enhance processing performance. However, to sustain parallel processing performance in such processors, a multi-port register file is required that can simultaneously supply operands to, and simultaneously accept results from, as many operation units as are driven at once.
For example, the R10000, a superscalar processor made by MIPS, employs an integer register file with 10 ports (7 read ports and 3 write ports) to enable the parallel execution of four instructions (two integer-operation instructions, one load/store instruction and one branch instruction).
When a further increase in integration density allows several superscalar processor elements to be mounted, a mechanism that gives the processor elements high-speed access to common data is necessary to maintain parallel processing performance. In this regard, it is effective to keep common data in a register file, rather than storing them in a cache or main storage, so that several processor elements can access them. Such a system can be realized by increasing the number of ports of the register file, as in the case of the superscalar processor.
FIG. 1 shows an example of a processor with four superscalar processor elements, each of which can execute two operation instructions in parallel. Referring to FIG. 1, when all processor elements 601 to 604 share data stored in a register file 605, the register file 605 needs up to 24 ports (16 for reading and 8 for writing), because each of the two operation units in each processor element uses two read ports and one write port of the register file 605.
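The port count for a fully shared register file follows directly from the per-unit figures given above (two read ports and one write port per operation unit). The following is a minimal sketch of that arithmetic; the function name and its parameters are illustrative and do not appear in the patent.

```python
def shared_file_ports(num_elements, units_per_element,
                      reads_per_unit=2, writes_per_unit=1):
    """Ports needed when every operation unit accesses one common register file.

    Returns (read_ports, write_ports, total_ports).
    """
    units = num_elements * units_per_element
    reads = units * reads_per_unit
    writes = units * writes_per_unit
    return reads, writes, reads + writes


# Four processor elements with two operation units each, as in FIG. 1.
print(shared_file_ports(4, 2))  # prints (16, 8, 24)
```

Because the total grows linearly with the number of operation units, doubling the number of processor elements doubles the ports on every memory cell of the shared file.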
In contrast, by restricting the registers accessible from each of the instructions executed in parallel, the number of ports of a register file can be decreased while maintaining the number of instructions executed in parallel.
FIG. 2 shows an example of a VLIW machine. Referring to FIG. 2, an instruction group 701 of four instructions executable in parallel is divided into two instruction groups 702, 703, each consisting of two instructions, and register files 704, 705 are assigned separately to processor elements 712, 713 that process these instruction groups. The instruction group 702 executes its operations using operation units 706, 707 and accesses the register file 704. Similarly, the instruction group 703 executes its operations using operation units 709, 710 and accesses the register file 705.
When the processor element 713 uses data stored in the register file 704, the data are transferred from the register file 704 through a selector 711 to the register file 705. The selector 711 is controlled to select the output of the operation unit 710 for an ordinary operation instruction, and to select the output of the register file 704 when an inter-register transfer instruction is executed. In like manner, a selector 708 is controlled when an inter-register transfer instruction moves data from the register file 705 to the register file 704.
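The transfer path described above can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class name, the method names, and the 32-register size are hypothetical, and the selector is modeled simply as the choice of which value gets written into the destination file.

```python
class SplitRegisterFiles:
    """Toy model of the FIG. 2 arrangement: one private register file
    per processor element, keyed here by reference numerals 704 and 705."""

    def __init__(self, num_regs=32):  # register count is an assumption
        self.files = {704: [0] * num_regs, 705: [0] * num_regs}
        self.transfers = 0  # inter-register transfer instructions executed

    def write_result(self, file_id, reg, value):
        # Ordinary operation: the selector passes the operation unit's result.
        self.files[file_id][reg] = value

    def transfer(self, src_file, src_reg, dst_file, dst_reg):
        # Inter-register transfer: the selector passes the other file's output.
        self.files[dst_file][dst_reg] = self.files[src_file][src_reg]
        self.transfers += 1

    def read(self, file_id, reg):
        return self.files[file_id][reg]


m = SplitRegisterFiles()
m.write_result(704, 3, 42)     # a value is produced into register file 704
m.transfer(704, 3, 705, 5)     # extra instruction before file 705 holds it
print(m.read(705, 5), m.transfers)  # prints: 42 1
```

Every value that crosses between the two files costs one additional instruction in this model, which is the overhead the split arrangement trades for its smaller port count.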
In such a configuration, each instruction group (each processor element) needs only a register file with 6 ports (4 read ports and 2 write ports). That is, each register file needs only half of the 12 ports (8 read ports and 4 write ports) required when all four instructions share one register file.
For example, Japanese patent application laid-open No. 5-233281 (1993) discloses a high-performance calculator that uses such a technique to enhance the separation between processor elements and facilitate chip layout.
In the configuration shown in FIG. 1, a processor element can easily share data with the other processor elements and rapidly access data produced by another processor element. However, this configuration has the problem that performance does not scale, because the number of ports of the register file, and hence its delay and area, increases with the number of operation units mounted on the processor elements. Also, for a program, such as an image-processing program, that has high instruction independence and data locality and shares little data between processor elements, the extra ports are wasted because there are more of them than needed.
On the other hand, in the configuration shown in FIG. 2, the number of ports of each register file can be reduced, but data must be transferred between register files whenever the data to be used reside in a register file assigned to another processor element. This transfer is performed by an inter-register transfer instruction, which causes overhead and thereby impairs high-speed access.
SUMMARY OF THE INVENTION
Accordingly, it is an object of the invention to provide a register file in which an increase in the number of ports due to parallel processing can be prevented and the overhead of accessing common data between processor elements can be suppressed.
According to the invention, there is provided a register file used in a multiprocessor configuration composed of a plurality of processor elements, the register file having a plurality of words and being provided for each of the plurality of processor elements, wherein the plurality of words are divided into a word part that can be simultaneously accessed by some of the plurality of processor elements for use in common with another processor element, and a word part that can be accessed only by its own processor element.
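The division of words into a shared part and a local part can be sketched as a small software model. Everything specific here is an assumption made for illustration, not the patent's circuit: the class and method names, the 8/16/8 split of a 32-word file, the register numbering (low shared bank, then local words, then high shared bank), and the ring arrangement in which each element shares one bank with each neighbor.

```python
class SharedLocalRegisterFiles:
    """Toy model: each processor element has local words plus banks of
    words that are physically shared with its adjacent elements."""

    def __init__(self, num_elements=4, local_words=16, shared_words=8):
        self.num_elements = num_elements
        self.local_words = local_words
        self.shared_words = shared_words
        self.local = [[0] * local_words for _ in range(num_elements)]
        # shared[i] sits between element i and element (i + 1) % num_elements
        # (ring topology is an assumption of this sketch).
        self.shared = [[0] * shared_words for _ in range(num_elements)]

    def _locate(self, element, reg):
        # Map an element's register number to (storage bank, index).
        if reg < 0:
            raise IndexError("register number out of range")
        if reg < self.shared_words:
            return self.shared[(element - 1) % self.num_elements], reg
        reg -= self.shared_words
        if reg < self.local_words:
            return self.local[element], reg
        reg -= self.local_words
        if reg < self.shared_words:
            return self.shared[element], reg
        raise IndexError("register number out of range")

    def read(self, element, reg):
        bank, index = self._locate(element, reg)
        return bank[index]

    def write(self, element, reg, value):
        bank, index = self._locate(element, reg)
        bank[index] = value


rf = SharedLocalRegisterFiles()
rf.write(0, 24, 99)   # element 0 writes a word in its high shared bank
rf.write(0, 8, 5)     # element 0 writes one of its local words
print(rf.read(1, 0))  # prints 99: element 1 sees the shared word directly
print(rf.read(1, 8))  # prints 0: element 0's local word is not visible
```

In this model a shared word is visible to the neighboring element without any copy step, while a local word is reachable only from its owner.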
This invention provides the following effects.
First, when several processor elements share a register file, the register file does not need to be provided with the ports that would be required for simultaneous access from all the processor elements. Therefore, the increase in area and delay that accompanies an increase in the number of ports can be prevented.
The reason is as follows. For example, when four processor elements, each of which includes two operation units with two inputs and one output, share one register file for their operations, the number of ports required for reading and writing is 24. In contrast, a register file of this invention, in which only two adjacent processor elements share part of each other's register files, needs only 12 ports for the common part and 6 ports for the non-common part. For example, in FIG. 3, 6 ports (4 read ports and 2 write ports) are needed for R8 to R23, and 12 ports (8 read ports and 4 write ports) are needed for R0 to R7 and R24 to R31. Thus, the number of ports can be significantly decreased, and the area and delay can be reduced accordingly. This effect can be obtained regardless of the number of operation units and processor elements.
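The comparison above can be checked with a short calculation. The sketch below assumes two read ports and one write port per operation unit and two operation units per processor element, as in the example; the function names are hypothetical, and the register ranges are those quoted for FIG. 3.

```python
READS_PER_UNIT = 2   # two inputs per operation unit
WRITES_PER_UNIT = 1  # one output per operation unit


def cell_ports(accessing_elements, units_per_element=2):
    """Ports on a memory cell reachable from the given number of elements."""
    units = accessing_elements * units_per_element
    return units * (READS_PER_UNIT + WRITES_PER_UNIT)


def fig3_ports(reg):
    """Ports for register `reg` under the FIG. 3 ranges quoted in the text."""
    if not 0 <= reg <= 31:
        raise ValueError("register number must be in 0..31")
    # R8..R23 are local (one element); R0..R7 and R24..R31 are shared by two.
    return cell_ports(1) if 8 <= reg <= 23 else cell_ports(2)


# Fully shared file versus the partitioned file as the element count grows.
for n in (2, 4, 8):
    print(n, cell_ports(n), fig3_ports(0), fig3_ports(8))
# prints:
# 2 12 12 6
# 4 24 12 6
# 8 48 12 6
```

The fully shared count grows with the number of processor elements, while the per-cell counts of the partitioned file stay at 12 and 6, since no cell is ever reached by more than two elements.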
Second, when several processor elements access common data, it is not necessary for software to execute a specific data transfer operation between the register files assigned to the processor elements. That is, the overhead of accessing common data can be removed.
This is because part of the register file owned by one processor element is shared with part of the register file owned by another processor element, and the memory cells of the common part are provided with ports that can be accessed simultaneously by the two processor elements.
Third, when a processor element accesses the local register part of its register file, deterioration in performance due to a decrease in the number of registers can be prevented.
This is because false dependence relations (reverse-dependence or output-dependence) caused by using the register in common can be redu
