Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Reexamination Certificate
2001-04-03
2004-08-31
Padmanabhan, Mano (Department: 2188)
C711S150000, C711S168000, C711S169000, C365S189040, C365S230050
active
06785781
ABSTRACT:
PRIOR FOREIGN APPLICATION
This application claims priority from European patent application number 00108699.0, filed Apr. 20, 2000, which is hereby incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to improvement of storage devices in computer systems and in particular, it relates to an improved method and system for efficiently accessing multi-port cell array circuitry.
BACKGROUND ART
In modern computer processor architecture development, an increasing portion of processor work is being parallelized. Parallelization requires that an increasing number of processing sub-units be able to access one and the same storage location in order to compute as quickly as possible. Thus, such a storage location requires multiple read/write accessibility.
An example is out-of-order processing. Writing data into arrays of such storage locations in parallel from multiple sources, or reading data from arrays in parallel to multiple targets then requires multi-port cells.
The area and performance of such an array is mainly determined by the number of ports per cell and not by the data size to be stored. More precisely, the area consumption of such an array is nearly proportional to the square of the number of ports implemented.
A storage cell needs m read ports in order to be readable concurrently by m different reading targets, and n write ports for n write sources to write into the cell. Each port comprises a data line and a select line that are orthogonal to each other, so the area consumption increases remarkably with increasing m or n. For example, if an array with m=n=1 (two ports) has an area consumption of X and is replaced by a multiple-access array with m=n=4 (eight ports), the resulting area consumption is about (8×8)/(2×2)=16 times higher, i.e., 16X. Thus, increasing parallelization requires a large additional area consumption on any processor chip.
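The scaling argument above can be checked with a short calculation. This is a minimal sketch that assumes area is exactly proportional to the square of the total port count (the text says "nearly proportional"); the function name `relative_area` and its parameters are illustrative and do not come from the patent.

```python
def relative_area(read_ports: int, write_ports: int,
                  base_read: int = 1, base_write: int = 1) -> float:
    """Area of a multi-port array relative to a baseline array.

    Assumption: area scales with (read_ports + write_ports) ** 2.
    """
    total = read_ports + write_ports
    base_total = base_read + base_write
    return (total * total) / (base_total * base_total)


# m = n = 4 (eight ports) compared with m = n = 1 (two ports)
print(relative_area(4, 4))  # 16.0
```

Under the same assumption, reducing the port count has a correspondingly quadratic effect on area, which is the motivation for the port reduction pursued here.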
Although the present invention has a broad field of application as improving or optimizing storage strategies is a very general purpose in computer technology, it will be described and discussed with prior art technology in a special field of application, namely in context of utilizing a so-called instruction window buffer, further abbreviated as IWB, which is usually present in most modern computer systems in order to enable a parallel program processing of instructions by a plurality of processing units. Such processors are referred to herein as out-of-order processors.
In many modern out-of-order processors such a buffer is used to contain all the instructions and/or register contents before the calculated results can be committed and removed from the buffer. When results were calculated speculatively beyond the outcome of a branch instruction, they can be rejected once the branch prediction turns out to be wrong, simply by clearing these entries from the buffer and overwriting them with new, correct instructions. This is one prerequisite for out-of-order processing. One main parameter influencing the performance of the processors is the buffer size: a big buffer can contain many more instructions and results and therefore allows more out-of-order processing. One design objective therefore is to have a big buffer. This, however, conflicts with other design requirements such as cycle time, buffer area, etc. When, for example, the buffer is dimensioned too large, the effort required to manage such a large plurality of storage locations decreases the performance of the buffer. Furthermore, increased buffer size implies an increased signal propagation delay. Thus, generally, any improved storage method has to find a good compromise between the parameters buffer size, storage management and, with it, storage access speed.
The present invention primarily covers the buffer size and the associated signal propagation delay.
A prior art instruction window buffer as it is disclosed in U.S. Pat. No. 5,923,900, “Circular Buffer With N Sequential Real And Virtual Entry Positions For Selectively Inhibiting N Adjacent Entry Positions Including The Virtual Entry Position”, which is hereby incorporated herein by reference in its entirety, is operated according to the following write/read schemes:
With reference to FIG. 1 (prior art), in order to write a package of instructions as depicted in the upper portion of the figure, for example a package of 4 unresolved instructions uip(0:3), into an array in one cycle during the dispatch process, a cell is needed with as many write ports as the maximum package size, i.e., a number of k1=4 in this case.
A write decode block 22 translates the write address in(0:5) via control line 16 into input pointers wsel0 . . . wsel3(0:3) selecting a block of four entries to be written, namely the array entries i, i+1, i+2, i+3. This is depicted schematically in FIG. 1. The first instruction uip0 is written into cell(i) by activating wsel0 on input port di0, the next instruction uip1 is written into cell(i+1) by activating wsel1 on input port di1, and so on, see the filled circles.
This scheme guarantees that the data is written consecutively into the array. As buffer memories in general are often used in a wrap-around way of operation some special care is required to cover this case, too.
The wrap-around case is handled by the write decoder 22, as well. If, for example, the window buffer has a total size of 64 entries and a block of four subsequent entries is intended to be written starting at 62, then wsel(0:3) point to entries (62, 63, 0, 1).
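The prior-art write scheme, including the wrap-around case, can be modeled in a few lines. This is a behavioral sketch only, not the decoder circuit: the names `write_select` and `dispatch` are hypothetical, and the modulo operation stands in for whatever logic write decoder 22 actually uses.

```python
from typing import List

BUFFER_SIZE = 64   # total entries, as in the example above
PACKAGE_SIZE = 4   # k1, the maximum number of instructions written per cycle


def write_select(start: int, size: int = BUFFER_SIZE,
                 count: int = PACKAGE_SIZE) -> List[int]:
    """Return the entry targeted by each write port, wrapping at the buffer end."""
    return [(start + port) % size for port in range(count)]


def dispatch(buffer: list, start: int, package: list) -> None:
    """Write package[p] into the entry selected for write port p."""
    for port, entry in enumerate(write_select(start, len(buffer), len(package))):
        buffer[entry] = package[port]


print(write_select(62))  # [62, 63, 0, 1]
```

Because the start entry i is arbitrary, any cell can be the target of any of the four write ports, which is why every cell carries all four write ports in this scheme.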
The read case is similar, as shown in FIG. 2, which depicts the prior art issue filters if0 to if3 controlling an array of 4-read-port cells by read select lines rsel0(0 . . . 63), rsel1(0 . . . 63), rsel2(0 . . . 63), rsel3(0 . . . 63). The data is read to several data output ports, i.e., Do(0:3), not explicitly depicted. As many read ports are needed as execution units exist, i.e., instruction execution units (ieu) ieu(0:3), in order to get full parallelism and provide data for all execution units every cycle for the issue process. A routing network can connect each output port of the buffer with each execution unit. An arbitration logic is provided for connecting a particular port with the desired execution unit.
In particular, the instructions ready for execution are identified by valid bits depicted in the upper line of FIG. 2 which are passed to the four different issue filters if(0:3). if0 selects the oldest of all instructions 0 . . . 63 ready for execution, activates rsel0 and thereby sends the data to the execution units. Filter if1 ignores the entry detected by if0 and selects the second oldest, activates rsel1 and sends it to the execution units, and so on.
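The cascaded selection performed by the issue filters can be sketched as follows. The text does not say how instruction age is determined, so this sketch assumes that age follows entry order starting from a hypothetical `oldest` pointer into the circular buffer; the function name `issue_select` is likewise illustrative.

```python
from typing import List


def issue_select(valid: List[bool], oldest: int = 0,
                 num_filters: int = 4) -> List[int]:
    """Model the cascaded issue filters if0 .. if3.

    The result at index j is the entry chosen by filter j, i.e. the
    (j+1)-th oldest entry whose valid bit is set.
    Assumption: age order is entry order starting at `oldest`, with wrap-around.
    """
    size = len(valid)
    selected: List[int] = []
    for offset in range(size):
        entry = (oldest + offset) % size
        if valid[entry]:
            selected.append(entry)
            if len(selected) == num_filters:
                break
    return selected


valid = [False] * 64
for ready in (5, 9, 10, 20, 30):
    valid[ready] = True

print(issue_select(valid))             # [5, 9, 10, 20]
print(issue_select(valid, oldest=10))  # [10, 20, 30, 5]
```

The second call shows that the same entry (here entry 5) may be picked by the first filter in one cycle and by the fourth in another, so each cell must be reachable from all four read ports.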
Since any entry of the 64 total entries of the buffer can be first, second, third or fourth selected, any entry and therefore any cell needs 4 read ports. This results in an extremely high area consumption and an associated large signal propagation delay.
SUMMARY OF THE INVENTION
It is thus an objective of the present invention to decrease area consumption and thus increase the efficiency of storage area utilization.
This objective of the invention is achieved by the features stated in enclosed independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective subclaims.
A considerable amount of area can be saved according to the present invention by reducing the number of write ports to the number k1 of concurrently intended write accesses and the number of output ports to the number k2 of concurrently intended read accesses to the array. This remarkable reduction of ports and thus an extraordinary associated area saving can be achieved when the intended array ‘natural’ operation
Leenstra Jens
Pille Juergen
Sautter Rolf
Wendel Dieter
Augspurger, Esq. Lynn L.
Heslin Rothenberg Farley & Mesiti P.C.
International Business Machines - Corporation
Padmanabhan Mano
Schiller, Esq. Blanche E.