Electrical computers and digital processing systems: memory – Address formation – Combining two or more values to create address
Reexamination Certificate
2002-03-07
2003-10-28
Peikari, B. James (Department: 2186)
Electrical computers and digital processing systems: memory
Address formation
Combining two or more values to create address
C711S216000, C711S221000, C711S005000, C711S211000, C711S202000, C711S205000, C711S207000, C711S209000
Reexamination Certificate
active
06640296
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to data processing in a device having a plurality of memory modules, which are accessed in parallel in order to retrieve and store data elements in the memory modules, and, more particularly, to a power-of-two stride access, and a dynamic storage scheme for such a device.
BACKGROUND OF THE INVENTION
In hardware architectures containing parallel processing resources, several operands from the memory need to be transferred simultaneously for matching the processing power with the memory bandwidth. Multiple operand access can be realized by allocating an independent memory module for each operand access. With this approach, however, problems arise from the fact that the parallel data accesses are not always linear. That introduces access conflicts in that the data operands to be accessed are in the same memory modules and, therefore, they cannot be accessed simultaneously.
In digital signal processing (DSP), data elements are often accessed by taking every Kth element from the data array of N elements, wherein N=2
n
, K=2
k
with n=1, 2, . . . and k=0, 1, 2, . . . , n−1. In particular, every Kth element is taken but when the access exceeds N−1, the access pattern wraps back to the beginning. The elements are accessed in the order as shown below, with p being the new address and i being the linear ordering index:
p
=mod(
Ki,N
)+└
Ki/N┘, i
=0, 1, . . . , N−1
where └z┘ denotes the floor function of z, and is the greatest integer less than or equal to z; mod is a modulus operation. With such access ordering, an N-point array (x
1
, X
1
, X
2
, . . . X
N−1
) will be accessed as (x
0
, X
K
, X
2K
, . . . , X
N−K
, X
1
, X
K+1
, X
2K+1
, . . . ). Such an access pattern is related to stride by K permutation. Here K is a number defined by a power of two, and thus the access is referred to as a power-of-two stride access. Such interconnection topologies are present, for example, in discrete trigonometric transforms and trellis coding. A mapping method to enhance parallel power-of-two stride access is disclosed in Melton et al. (EP 0 313 788 B1). When an N-element data block has N memory locations distributed over Q memory modules, Q parallel accesses of every Kth data element can be carried out such that N=2
n
, Q=2
q
, K=2
k
with n=1, 2, . . . , q=1, 2, . . . , n−1 and k=0, 1, 2, . . . , n−1.
In the past, access conflicts were avoided by using a technique called double buffering, where 2N memory locations are allocated for an N-point data array. With double buffering, while the data at one location is being written, the data at another location is read. This approach has been used in Viterbi-decoding in some DSP processors. The major drawback for this approach is that it increases the size of memory and increases the latency as well. Memory accesses can also be hard-coded into the software code. This method increases the code size. Alternatively, look-up tables can be used. However, look-up tables also increase the memory consumption and the number of memory accesses. Furthermore, multi-port memories can be used, but they are expensive in terms of area.
Specialized address generation units have been proposed, but they support only one specific stride-by−2
k
(a single access pattern) along with a stride-by−1, which is the linear access, or they are limited to certain data array sizes. Harper (U.S. Pat. No. 4,918,600) discloses a method of stride access wherein only two strides 1 and 2
k
are supported at a time. However, Harper does not support stride-by-1 and stride-by-N/2 when more that 2 memories are used, i.e., 4, 8, etc. In such a case, we need to access elements indices (0, N/2, 1, N/2+1, 2, . . . ). Especially parallel access of N/2 and 1 presents the conflict. The specialized address generation units may also support a certain number of parallel memories, as disclosed in Cohen (IEEE Trans Acoustics, Speech, and Signal Processing, December 1976, pp.577-579).
Currently there are also parallel access schemes, which require an odd number of memory modules, and the number of memory modules being greater than the number of operands accessed simultaneously, as disclosed in D. T. Harper, “Block, multistride vector, and FFT accesses in parallel memory systems,” (IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 1, pp. 43-51, Jan. 1991). With these schemes, the memory consumption is also high and the memory is fragmented.
Thus, it is advantageous and desirable to provide a method and device for efficiently generating the addresses for Q parallel memory accesses according to stride permutation.
SUMMARY OF THE INVENTION
It is a primary object of the invention to provide a method of conflict-free access to parallel memory modules in a data processing device.
It is another object of the invention to provide an apparatus and system for mapping address locations to physical memory locations in parallel memory modules for conflict-free access for several strides.
These objects can be achieved by pre-selecting a storage scheme for each data block based on the number of elements in that data block and number of memory modules.
Thus, according to the first aspect of the present invention, there is provided a method of accessing data elements of an N-element data block on N memory locations distributed over Q memory modules via Q parallel accesses, wherein
said N memory locations are addressable in an n-bit index address, and
said Q memory modules are addressable in a q-bit module address and an (n−q) bit row address. The method is characterized by:
selecting (n−q) bits from the index address for providing the row address;
grouping the index address into a plurality of fields comprising a number of first fields, wherein each of the first fields contains q bits; and
performing a logical function operation on bits obtained from corresponding positions in said plurality of fields for providing the module address for accessing the data element in one of said Q memory modules.
According to the present invention, said plurality of fields may comprise a second field, wherein the second field contains e bits, e being a positive integer smaller than q, and e depends on the relationship between n and q.
Preferably, the method is further characterized by performing a rotation operation on one of said plurality of fields for providing a third field, wherein said logical function is also performed on the third field in addition to said plurality of fields.
Preferably, the method is further characterized by concatenating (q−e) zeros to the second field.
According to the present invention, the method is further characterized by performing a rotation operation on one of said plurality of fields for providing a third field, wherein the one field on which the rotation operation is performed contains the least significant bits, and wherein said logical function is also performed on the third field in addition to said plurality of fields.
Preferably, said logical function is a bitwise exclusive-OR operation.
According to the present invention the n-bit index address can be arranged into an arbitrary order. In addition, the q-bit memory module address can also be reordered into arbitrary order.
According to the second aspect of the present invention, there is provided an apparatus for accessing a data element of an N-element data block on N memory locations distributed over Q memory modules via Q parallel accesses, wherein
said N memory locations are addressable in an n-bit index address, and
said Q memory modules are addressable in a q-bit module address and an (n−q) bit row address. The apparatus is characterized by:
means (
12
), operatively connected to the index address, for selecting (n−q) bits from the index address for providing the row address; and
means (
40
), operatively connected to the index address, for grouping the index addre
Nokia Corporation
Peikari B. James
Ware Fressola Van Der Sluys & Adolphson LLP
LandOfFree
Data processing method and device for parallel stride access does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Data processing method and device for parallel stride access, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Data processing method and device for parallel stride access will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3160132