Decoding of a register file

Static information storage and retrieval – Addressing – Particular decoder or driver circuit

Reexamination Certificate

Details

C365S230030, C365S230040, C712S208000, C712S212000, C712S001000

active

06320813

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to parallel processing. More specifically, the present invention relates to the improved decoding of a configurable register file for faster initiation stages of parallel processing.
2. The Background
Parallel processing involves the execution of multiple processes simultaneously. Numerous types of parallel processing schemes have been utilized, but a common one is the Very Long Instruction Word (VLIW) scheme. VLIW processors use multiple independent functional units to execute instructions in parallel. Generally, the multiple operations are combined into a single very long instruction. The multiple operations are determined by sub-instructions that are applied to the independent functional units.
A VLIW processor usually uses a technique known as trace scheduling to maintain a code sequence with sufficient operations to keep instructions scheduled by unrolling loops and scheduling code across basic function blocks. Trace scheduling may also improve efficiency by allowing instructions to move across branch points.
FIG. 1 is a schematic diagram illustrating a parallel processor. The processor 50 contains multiple media processor units 52, 54. Each media processor unit 52, 54 includes an instruction cache 56, an instruction aligner 58, an instruction buffer 60, a pipeline control unit 62, a split register file 64, a plurality of execution units 66, 68, 70, 72, and a load/store unit 74. The media processing units 52, 54 may use a plurality of execution units for executing instructions. The execution units 66, 68, 70, 72 may include three media functional units (MFU) 66, 68, 70 and one general functional unit (GFU) 72. The MFUs 66, 68, 70 may be multiple single-instruction-multiple-datapath (MSIMD) media functional units. Each of the MFUs 66, 68, 70 may be capable of processing 16-bit components. Various parallel 16-bit operations supply the single-instruction-multiple-datapath capability including add, multiply-add, shift, compare, and others. The MFUs 66, 68, 70 operate in combination as tightly-coupled digital signal processors (DSPs).
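As an illustration of the parallel 16-bit operations mentioned above, the following minimal sketch adds two 32-bit words as two independent 16-bit lanes. The function name `packed_add16` and the wraparound overflow behavior are assumptions made for illustration; the text does not say how an MFU handles lane overflow or how lanes are laid out in a register.

```python
MASK16 = 0xFFFF  # one 16-bit component (lane)


def packed_add16(a: int, b: int) -> int:
    """Add two 32-bit words lane by lane, treating each as two 16-bit components.

    Assumption: each lane wraps modulo 2**16 and no carry crosses
    from the low lane into the high lane.
    """
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo


if __name__ == "__main__":
    # The low lane overflows and wraps to 0; the high lane is unaffected.
    print(hex(packed_add16(0x0001FFFF, 0x00010001)))
```

The point of the sketch is only that a single operation acts on several 16-bit components at once, which is what gives each MFU its single-instruction-multiple-datapath character.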
Each MFU 66, 68, 70 may have a separate and individual sub-instruction stream, but all the MFUs 66, 68, 70 execute synchronously so that the sub-instructions lock-step through the pipeline stages.
The GFU may be a processor capable of executing arithmetic logic unit (ALU) operations, reciprocal square, and others. The GFU also may support less common parallel operations such as the parallel reciprocal square root instruction.
The instruction cache 56 may have a 16 Kbyte capacity and include hardware support to maintain coherence, allowing dynamic optimizations through self-modifying code. Software may be used to indicate that the instruction storage is being altered when modifications are made. Coherency may be maintained by hardware that supports write-through, non-allocating caching.
The pipeline control unit 62 may be connected between the instruction buffer 60 and the functional units 66, 68, 70, 72. The pipeline control unit 62 schedules the transfer of instructions to the functional units 66, 68, 70, 72. The pipeline control unit 62 also receives status signals from the functional units 66, 68, 70, 72 and the load/store unit 74 and uses the status signals to perform several control functions. The pipeline control unit 62 maintains a scoreboard and generates stalls and bypass controls. The pipeline control unit 62 also may generate traps and maintain special registers.
Each media processing unit 52, 54 includes a split register file 64, a single logical register file. The split register file 64 is split into a plurality of register file segments 76, 78, 80, 82 to form a multi-ported structure that is replicated to reduce the integrated circuit die area and to reduce access time. A separate register file segment 76, 78, 80, 82 is allocated to each of the media functional units 66, 68, 70 and the general functional unit 72. In the illustrative embodiment, each register file segment 76, 78, 80, 82 has 128 32-bit registers. The first 96 registers (0-95) in the register file segment 76, 78, 80, 82 are global registers. All the functional units 66, 68, 70, 72 may write to the 96 global registers. The global registers are coherent across all functional units (MFUs and GFU) 66, 68, 70, 72 so that any write operation to a global register by any functional unit is broadcast to all register file segments 76, 78, 80, 82. Registers 96-127 in the register file segments 76, 78, 80, 82 are local registers. Local registers allocated to a functional unit 66, 68, 70, 72 are not accessible or “visible” to other functional units 66, 68, 70, 72.
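The global/local split described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class name `SplitRegisterFile`, the unit labels, and the zero-initialized registers are hypothetical, and the model ignores ports, timing, and the decoding logic entirely.

```python
NUM_REGISTERS = 128   # registers per segment (from the text)
NUM_GLOBAL = 96       # registers 0-95 are global, 96-127 are local
WORD_MASK = 0xFFFFFFFF  # 32-bit registers

# Hypothetical labels for the three MFUs and the GFU.
UNITS = ("MFU66", "MFU68", "MFU70", "GFU72")


class SplitRegisterFile:
    """Behavioral sketch: one register file segment per functional unit."""

    def __init__(self):
        # Assumption: all registers start at zero.
        self.segments = {unit: [0] * NUM_REGISTERS for unit in UNITS}

    @staticmethod
    def _check(reg: int) -> None:
        if not 0 <= reg < NUM_REGISTERS:
            raise IndexError(f"register {reg} out of range")

    def write(self, unit: str, reg: int, value: int) -> None:
        self._check(reg)
        value &= WORD_MASK
        if reg < NUM_GLOBAL:
            # Global register: the write is broadcast to every segment.
            for segment in self.segments.values():
                segment[reg] = value
        else:
            # Local register: only the writing unit's segment changes.
            self.segments[unit][reg] = value

    def read(self, unit: str, reg: int) -> int:
        self._check(reg)
        # A unit only ever reads from its own segment.
        return self.segments[unit][reg]


if __name__ == "__main__":
    rf = SplitRegisterFile()
    rf.write("MFU66", 10, 0xABCD)   # global: visible to all units
    rf.write("MFU66", 100, 7)       # local: visible to MFU66 only
    print(hex(rf.read("GFU72", 10)), rf.read("GFU72", 100))
```

Because every unit reads only from its own segment, the global registers stay coherent purely through broadcast writes, while a local register with the same index can hold a different value in each segment.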
The media processing units 52, 54 are highly structured computation blocks that execute software-scheduled data computation operations with fixed, deterministic and relatively short instruction latencies, operational characteristics yielding simplification in both function and cycle time. The operational characteristics support multiple instruction issue through a very long instruction word (VLIW) approach that avoids hardware interlocks to account for software that does not schedule operations properly. Such hardware interlocks are typically complex, error-prone, and create multiple critical paths. A VLIW instruction word includes one instruction that executes in the general functional unit (GFU) 72 and from zero to three instructions that execute in the media functional units (MFU) 66, 68, 70. An MFU instruction field within the VLIW instruction word may include an operation code (opcode) field, three source register (or immediate) fields, and one destination register field.
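The shape of such an instruction word can be sketched with two small data classes. The names `SubInstruction` and `VLIWWord` are hypothetical, bit widths and encodings are deliberately omitted because the text does not give them, immediates are not modeled, and reusing the MFU field layout for the GFU instruction is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NUM_REGISTERS = 128  # register indices 0-127, per the register file description


@dataclass(frozen=True)
class SubInstruction:
    """One sub-instruction: an opcode, three sources, one destination.

    Assumption: sources are register indices only (immediates not modeled).
    """
    opcode: str
    sources: Tuple[int, int, int]
    dest: int

    def __post_init__(self):
        for reg in (*self.sources, self.dest):
            if not 0 <= reg < NUM_REGISTERS:
                raise ValueError(f"register {reg} out of range")


@dataclass
class VLIWWord:
    """One GFU instruction plus zero to three MFU instructions."""
    gfu: SubInstruction  # assumption: same field layout as an MFU field
    mfu: List[SubInstruction] = field(default_factory=list)

    def __post_init__(self):
        if len(self.mfu) > 3:
            raise ValueError("a VLIW word holds at most three MFU instructions")

    def width(self) -> int:
        """Number of sub-instructions issued together by this word."""
        return 1 + len(self.mfu)


if __name__ == "__main__":
    add = SubInstruction("add", (1, 2, 3), 4)
    word = VLIWWord(gfu=add, mfu=[add, add])
    print(word.width())
```

A word therefore issues between one and four sub-instructions, one per functional unit.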
Speed and ease of access are often problems encountered when dealing with register files. In order to solve these problems, register files are often split.
FIG. 2 is a schematic block diagram illustrating a split register file 64. The split register file 64 supplies all operands of processor instructions that execute in the media functional units 66, 68, 70 and the general functional unit 72 and receives results of the instruction execution from the execution units. The split register file 64 is the source and destination of store and load operations, respectively.
Large, multiple-ported register files are typically metal-limited so that the register area is proportional to the square of the number of ports. A sixteen-port file is roughly proportional in size and speed to a value of 256. The split register file 64 is divided into four register file segments 100, 102, 104, and 106, each having three read ports and four write ports so that each register file segment has a size and speed proportional to 49, for a total area for the four segments that is proportional to 196. The split register file is therefore potentially smaller and faster than a single central register file. Write operations are fully broadcast so that all files are maintained coherent. Logically, the split register file 64 is no different from a single central register file; however, from the perspective of layout efficiency, the split register file 64 is smaller and has better performance.
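The port-count arithmetic above can be checked with a few lines of code. The helper name `port_squared_cost` is hypothetical, and the model is only the rough rule stated in the text (cost proportional to the square of the port count), not a real area or timing estimate.

```python
def port_squared_cost(ports: int, segments: int = 1) -> int:
    """Relative cost under the rule 'area grows with the square of the ports'.

    The result is a unitless proportionality figure, summed over segments.
    """
    return segments * ports ** 2


if __name__ == "__main__":
    central = port_squared_cost(16)             # one sixteen-port file
    split = port_squared_cost(3 + 4, segments=4)  # four segments, 3 read + 4 write ports each
    print(central, split, split / central)
```

Under this rule the four seven-port segments together come to 196 against 256 for the single sixteen-port file, roughly a quarter less.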
Splitting the register file into multiple segments in the split register file 64, in combination with the character of data accesses in which multiple bytes are transferred to the plurality of execution units concurrently, results in a high utilization rate of the data supplied to the integrated circuit chip and effectively leads to a much higher data bandwidth than is supported on normal processors.
Normal applications often fail to exploit the large register file 64 because compilers do not effectively use the large number of registers in the split register file 64. However, aggressive in-lining techniques that have traditionally been restricted due
