Registers and methods for accessing registers for use in a...

Electrical computers and digital processing systems: memory – Storage accessing and control

Reexamination Certificate



C341S100000

active

06175892

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to methods and apparatus, including, e.g., registers and register arrays, for implementing single instruction multiple data (SIMD) signal processing operations.
BACKGROUND OF THE INVENTION
The processing of two-dimensional sets of data is growing in importance as the use of computers continues to grow. Two-dimensional sets of data are frequently used to represent, e.g., images.
In the digital processing of two-dimensional signals, e.g., data sets, it is possible, for example when performing a two-dimensional filtering operation such as low pass filtering or a two-dimensional transformation such as an inverse discrete cosine transform (IDCT), to treat the two-dimensional operation as a series of two one-dimensional operations. This is possible due to a mathematical property called separability, which allows a complex two-dimensional process to be implemented as two simpler one-dimensional processes applied one after the other.
Sequential one-dimensional processes tend to be far simpler to implement than a corresponding two-dimensional process. For this reason, the property of separability is frequently used to implement two-dimensional data processing operations. In implementing a two-dimensional operation as two one-dimensional operations, the one-dimensional operations are applied sequentially in the horizontal and vertical directions of the data being processed. This is illustrated in FIG. 1, where the two-dimensional operation HV is implemented as two sequential processing operations H, V on the data set A 100 to produce the two-dimensional data set HV(A) 104. The intermediate data set H(A) 102 is produced as the result of the application of the horizontal function H to the data set A 100.
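As a minimal sketch of the H-then-V decomposition described above, the following Python code applies a one-dimensional kernel first along rows and then along columns, and compares the result with a direct two-dimensional filter. The kernel values, the zero padding at the borders, and all function names are assumptions made for illustration; they do not come from the patent.

```python
def filter_rows(data, kernel):
    """Horizontal pass H: apply a 1-D kernel along every row (zero padding)."""
    half = len(kernel) // 2
    width = len(data[0])
    result = []
    for row in data:
        out_row = []
        for x in range(width):
            acc = 0
            for k, coeff in enumerate(kernel):
                src = x + k - half
                if 0 <= src < width:  # samples outside the array count as zero
                    acc += coeff * row[src]
            out_row.append(acc)
        result.append(out_row)
    return result


def transpose(data):
    return [list(column) for column in zip(*data)]


def filter_cols(data, kernel):
    """Vertical pass V: the same 1-D kernel applied along every column."""
    return transpose(filter_rows(transpose(data), kernel))


def filter_2d_direct(data, kernel):
    """Direct 2-D filter whose taps are kernel[j] * kernel[i] (zero padding)."""
    half = len(kernel) // 2
    height, width = len(data), len(data[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for j, cj in enumerate(kernel):
                for i, ci in enumerate(kernel):
                    sy, sx = y + j - half, x + i - half
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += cj * ci * data[sy][sx]
            result[y][x] = acc
    return result


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # data set A (illustrative values)
kernel = [1, 2, 1]                      # hypothetical low-pass kernel
H_A = filter_rows(A, kernel)            # intermediate data set H(A)
HV_A = filter_cols(H_A, kernel)         # final data set HV(A)
```

For a separable kernel such as this one, `HV_A` matches `filter_2d_direct(A, kernel)`, while the two one-dimensional passes need only 2k multiplications per sample instead of k squared for a kernel of length k.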
Suppose that data words, each represented by a separate box, are arranged in a memory in “raster-scan” order as illustrated in FIG. 2. In such an arrangement, data words beginning at the top left of a two-dimensional data array 200, proceeding to the right and down to the bottom right data element, are stored at sequential locations in memory, as illustrated by the row of blocks 202 representing sequential memory locations. In processing the two-dimensional data in the horizontal direction, the arrangement of the samples in the one-dimensional structure is convenient because consecutive samples of a row occupy adjacent memory locations. In order to process the data in the vertical direction, it is clear from the first two shaded squares in FIG. 2 that access to the data is not as straightforward, because there is a jump between consecutive samples of a column, as represented by the arrow 203.
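The address arithmetic behind raster-scan storage can be sketched as follows. The array dimensions and the helper name `raster_index` are hypothetical and chosen only to make the jump between vertically adjacent samples visible.

```python
def raster_index(row, col, width):
    """Linear memory location of element (row, col) in raster-scan order."""
    return row * width + col


WIDTH, HEIGHT = 4, 3  # illustrative array size, not taken from FIG. 2

# Horizontal neighbours sit in adjacent memory locations.
horizontal_step = raster_index(0, 1, WIDTH) - raster_index(0, 0, WIDTH)

# Vertical neighbours are a full row apart: this is the jump in memory.
vertical_step = raster_index(1, 0, WIDTH) - raster_index(0, 0, WIDTH)

# The bottom right element lands in the last memory location.
last_location = raster_index(HEIGHT - 1, WIDTH - 1, WIDTH)
```

Here `horizontal_step` is 1 and `vertical_step` equals `WIDTH`, which is why a vertical pass cannot simply read consecutive locations.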
One known method of solving the problem of accessing the vertical columns of data for performing the vertical processing operation is to store the results from the horizontal processing operation in transposed order. This is shown in FIG. 3, wherein the shaded blocks representing a vertical column of data are now arranged horizontally.
As a result of the mathematical transpose, accessing the vertical information is simple. At the end of the processing for the vertical direction, the transpose of the resulting data must normally be performed to restore the arrangement to the natural order for use in subsequent operations, e.g., the generation of video images for display.
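The transposed-storage technique can be illustrated with a flat list standing in for memory. This is a sketch under assumed dimensions; `transpose_flat` is a hypothetical helper, not a routine named in the patent.

```python
def transpose_flat(flat, rows, cols):
    """Return the raster-scan storage of the transpose of a rows x cols array."""
    return [flat[r * cols + c] for c in range(cols) for r in range(rows)]


ROWS, COLS = 3, 4                      # illustrative size
memory = list(range(ROWS * COLS))      # stands in for the horizontal results

# Store the horizontal results in transposed order.
transposed = transpose_flat(memory, ROWS, COLS)

# Each original column is now a contiguous run of ROWS samples.
first_column = transposed[0:ROWS]

# After the vertical pass, a second transpose restores the natural order.
restored = transpose_flat(transposed, COLS, ROWS)
```

The second call is the extra step the text refers to: without it, later stages would see the data in column-major rather than natural order.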
Another method of accessing data to perform sequential horizontal and vertical data processing operations involves addressing the data that is stored in memory using a pointer that jumps to the next desired data sample. This method has the advantage, as compared to the transpose technique discussed above, that it does not require that the data undergo an additional transposition step in order to restore the natural data ordering for use in subsequent operations.
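A sketch of the pointer-jumping alternative is shown below: the data stays in natural raster-scan order and a column is visited by advancing an index by one full row width per sample. The function name and the in-place doubling used as a stand-in for a vertical operation are assumptions for illustration only.

```python
def column_locations(col, width, height):
    """Memory locations visited when walking down one column with a stride."""
    return list(range(col, width * height, width))


def scale_column_in_place(memory, col, width, height, factor):
    """Hypothetical vertical operation: scale one column without transposing."""
    for location in column_locations(col, width, height):
        memory[location] *= factor


WIDTH, HEIGHT = 4, 3
memory = list(range(WIDTH * HEIGHT))   # natural raster-scan order
visited = column_locations(1, WIDTH, HEIGHT)
scale_column_in_place(memory, 1, WIDTH, HEIGHT, 2)
```

Because the result is written back to the same strided locations, the data remains in natural order and no restoring transpose is needed.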
In high-performance implementations of digital signal processing algorithms, which may include various real time image processing applications, it is good practice to keep data that is being processed in hardware registers close to the main computational unit in order to minimize processing delays due to data transfer operations. The computational unit may be, e.g., a programmable signal processing core or some fixed function hardware. As a result of the “closeness” of the data registers to the computational unit, the computational unit can operate directly on the registers.
In cases where the data is not located in registers coupled closely to the computational unit, the data has to be fetched from cache or other memory and this results in reduced system performance. By keeping data which is frequently used in data registers which are directly accessible to a computational unit, a high level of computational speed can be maintained throughout the lifetime of a computation without having the computational unit stall due to data being in lower speed storage such as a cache or main memory.
Single-Instruction Multiple Data (SIMD) architecture systems allow multiple data elements to be processed simultaneously in response to a single instruction. The multiple data units may be stored in a single register. Well designed SIMD architectures can provide considerable performance advantages over more traditional Single-Instruction Single Data (SISD) architecture systems because of the simultaneous processing of multiple pieces of data made possible by the SIMD architecture. MMX technology from Intel Corporation currently in use in computer CPUs is one example of a SIMD architecture.
Unfortunately, the above-described techniques of performing sequential horizontal and vertical processing operations are not straightforward when the data is stored in registers in a format that is used by SIMD architectures. In such a situation, the manipulations that are required to obtain the desired data arrangement are relatively difficult to implement.
Consider, for example, a SIMD architecture that operates on two data samples at the same time. In such a SIMD architecture the data samples have to be presented to the processing unit in the arrangement shown in the diagram of FIG. 4A. Here, one word 400 that is n bits in length contains two sub-words 402, 404, each n/2 bits in length. Even though one n-bit word 400 is presented to the processor, there are actually two pieces of data, sub-words b, a, 402, 404, that are embedded in that word 400. When presented to the SIMD processing unit, each of these halves is handled separately. This is one of the primary features of SIMD processing.
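The packing of two sub-words into one word can be sketched with ordinary integer shifts and masks. The choice of n = 32, the placement of sub-word a in the lower half and b in the upper half, and the helper names are all assumptions made for this illustration.

```python
N_BITS = 32                       # assumed word length n
HALF_BITS = N_BITS // 2           # each sub-word is n/2 bits
HALF_MASK = (1 << HALF_BITS) - 1  # 0xFFFF for n = 32


def pack(a, b):
    """Build one n-bit word with a in the lower half and b in the upper half."""
    return ((b & HALF_MASK) << HALF_BITS) | (a & HALF_MASK)


def unpack(word):
    """Split an n-bit word back into its sub-words (a, b)."""
    return word & HALF_MASK, (word >> HALF_BITS) & HALF_MASK


word = pack(0x1234, 0xABCD)
a, b = unpack(word)
```

With these assumptions `word` is `0xABCD1234`, and `unpack` recovers the two sub-words unchanged.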
As an example of a SIMD processing operation, suppose that it is desired to add two sets of numbers, {a, b} and {c, d}, to produce {a+c} and {b+d}. In the SIMD architecture, it is possible to set up two data elements 406, 408 similar to the one shown in FIG. 4A. One of these, 406, would contain the set {a, b} and the other, 408, would contain the set {c, d}. They may be presented to the SIMD processing unit for the desired addition. The processing unit treats the two halves of the input data words as independent quantities during the computation. An important consequence of this is that if the addition for the lower half overflows, the overflow will not affect the upper half. It can be seen from this example that the SIMD architecture is extremely beneficial for processing multiple pieces of data in parallel.
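The independence of the two halves can be demonstrated by emulating a two-lane SIMD addition in Python and comparing it with an ordinary full-width addition. The 32-bit word with 16-bit lanes, the wrap-around behaviour on overflow (rather than saturation), and the function name `simd_add` are assumptions for this sketch.

```python
HALF_BITS = 16                    # assumed sub-word width (n/2 with n = 32)
HALF_MASK = (1 << HALF_BITS) - 1


def simd_add(x, y):
    """Add the lower and upper halves independently, with no carry between them."""
    low = ((x & HALF_MASK) + (y & HALF_MASK)) & HALF_MASK
    high = ((x >> HALF_BITS) + (y >> HALF_BITS)) & HALF_MASK
    return (high << HALF_BITS) | low


# Word holding {a, b} with a = 0xFFFF (lower half) and b = 1 (upper half).
word_ab = 0x0001FFFF
# Word holding {c, d} with c = 1 (lower half) and d = 2 (upper half).
word_cd = 0x00020001

packed_sum = simd_add(word_ab, word_cd)           # lower half wraps, upper is b + d
scalar_sum = (word_ab + word_cd) & 0xFFFFFFFF     # carry leaks into the upper half
```

In this example the lower half overflows: `packed_sum` is `0x00030000` (upper half 1 + 2 = 3, lower half wrapped to 0), whereas the plain addition gives `0x00040000` because the carry corrupts the upper half.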
The inventors of the present application have discovered that various problems are encountered when one attempts to implement two-dimensional signal processing algorithms on SIMD architecture using local registers to provide high-performance signal processing implementations. For example, when processing two-dimensional signals, the SIMD architecture poses the following problem when data is to be transposed. Suppose that it is desired to obtain the transpose of the matrix:

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

where the data is arranged in registers 0 and 1 as shown in FIG. 5. Note that the little-endian data scheme is used for the examples in this application, howe
