Electrical computers and digital processing systems: processing – Processing architecture – Vector processor
Reexamination Certificate
1998-03-31
2001-04-03
Treat, William M. (Department: 2183)
C712S222000, C708S603000, C708S607000, C345S522000
active
06212618
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to the field of computer systems, and in particular, to an apparatus and method for performing multi-dimensional computations based on an intra-add operation.
2. Description of the Related Art
To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, a Single Instruction, Multiple Data (SIMD) architecture has been implemented in computer systems to enable one instruction to operate on several operands simultaneously, rather than on a single operand. In particular, SIMD architectures take advantage of packing many data elements within one register or memory location. With parallel hardware execution, multiple operations can be performed on separate data elements with one instruction, resulting in significant performance improvement.
Currently, the SIMD addition operation performs only “vertical” or inter-register addition, in which pairs of data elements, for example a first element Xn (where n is an integer) from one operand and a second element Yn from a second operand, are added together. An example of such a vertical addition operation is shown in FIG. 1, where the instruction is performed on the sets of data elements (X3, X2, X1, and X0) and (Y3, Y2, Y1, and Y0), accessed as Source1 and Source2, respectively, to obtain the result (X3+Y3, X2+Y2, X1+Y1, and X0+Y0).
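The vertical add described above can be modeled in a few lines of Python. This is only an illustrative sketch, not code from the patent: the function name `vertical_add` and the sample values are assumptions, and each packed operand is represented as a plain list written high element first, matching the (X3, X2, X1, X0) ordering in the text.

```python
def vertical_add(source1, source2):
    """Inter-register ("vertical") add: element i of one operand
    is added to element i of the other operand."""
    if len(source1) != len(source2):
        raise ValueError("packed operands must have the same number of elements")
    return [x + y for x, y in zip(source1, source2)]


# Hypothetical sample data, written as (X3, X2, X1, X0) and (Y3, Y2, Y1, Y0).
source1 = [4.0, 3.0, 2.0, 1.0]
source2 = [40.0, 30.0, 20.0, 10.0]

# Each result element depends only on the same position in both sources.
result = vertical_add(source1, source2)
print(result)
```

No result element ever combines two elements of the same operand, which is why sums across a single register need extra data rearrangement.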
Although many applications currently in use can take advantage of such a vertical add operation, a number of important applications require the data elements to be rearranged before the vertical add operation can be applied.
For example, a matrix multiplication operation is shown below.
\[
\text{MATRIX } A \times \text{VECTOR } X = \text{VECTOR } Y
\]
\[
\begin{pmatrix}
A_{14} & A_{13} & A_{12} & A_{11} \\
A_{24} & A_{23} & A_{22} & A_{21} \\
A_{34} & A_{33} & A_{32} & A_{31} \\
A_{44} & A_{43} & A_{42} & A_{41}
\end{pmatrix}
\times
\begin{pmatrix}
X_4 \\ X_3 \\ X_2 \\ X_1
\end{pmatrix}
=
\begin{pmatrix}
A_{14}X_4 + A_{13}X_3 + A_{12}X_2 + A_{11}X_1 \\
A_{24}X_4 + A_{23}X_3 + A_{22}X_2 + A_{21}X_1 \\
A_{34}X_4 + A_{33}X_3 + A_{32}X_2 + A_{31}X_1 \\
A_{44}X_4 + A_{43}X_3 + A_{42}X_2 + A_{41}X_1
\end{pmatrix}
\]
To compute the product of the matrix A and the vector X, yielding the vector Y, instructions are used to: 1) store the columns of the matrix A as packed operands (this typically requires rearrangement of data because the coefficients of the matrix A are stored so that its rows, not its columns, are accessed as packed data operands); 2) store a set of operands that each have a different one of the vector X coefficients in every data element; and 3) use vertical multiplication, in which each data element in the vector X (i.e., X4, X3, X2, X1) is first multiplied with the data elements in the corresponding column of the matrix A (for example, X4 with [A14, A24, A34, A44]). The results of the multiplication operations are then added together through three vertical add operations, such as that shown in FIG. 1, to obtain the final result. Such a matrix multiplication operation based on the use of vertical add operations typically requires 20 instructions to implement, an example of which is shown below in Table 1.
TABLE 1
Exemplary Code Based on Vertical-Add Operations

Assumptions:
1/ X stored with X1 first, X4 last
2/ transpose of A stored with A11 first, A21 second, A31 third, etc.
3/ availability of:
   DUPLS: duplicate once
   DUPLD: duplicate twice

MOVD    mm0, <mem_X>        // [0,0,0,X1]
DUPLS   mm0, mm0            // [0,0,X1,X1]
DUPLD   mm0, mm0            // [X1,X1,X1,X1]
PFMUL   mm0, <mem_A>        // [A41*X1,A31*X1,A21*X1,A11*X1]
MOVD    mm1, <mem_X+4>      // [0,0,0,X2]
DUPLS   mm1, mm1            // [0,0,X2,X2]
DUPLD   mm1, mm1            // [X2,X2,X2,X2]
PFMUL   mm1, <mem_A+16>     // [A42*X2,A32*X2,A22*X2,A12*X2]
MOVD    mm2, <mem_X+8>      // [0,0,0,X3]
DUPLS   mm2, mm2            // [0,0,X3,X3]
DUPLD   mm2, mm2            // [X3,X3,X3,X3]
PFMUL   mm2, <mem_A+32>     // [A43*X3,A33*X3,A23*X3,A13*X3]
MOVD    mm3, <mem_X+12>     // [0,0,0,X4]
DUPLS   mm3, mm3            // [0,0,X4,X4]
DUPLD   mm3, mm3            // [X4,X4,X4,X4]
PFMUL   mm3, <mem_A+48>     // [A44*X4,A34*X4,A24*X4,A14*X4]
PFADD   mm0, mm1            // [A42*X2+A41*X1,A32*X2+A31*X1,
                            //  A22*X2+A21*X1,A12*X2+A11*X1]
PFADD   mm2, mm3            // [A44*X4+A43*X3,A34*X4+A33*X3,
                            //  A24*X4+A23*X3,A14*X4+A13*X3]
PFADD   mm0, mm2            // [A44*X4+A43*X3+A42*X2+A41*X1,
                            //  A34*X4+A33*X3+A32*X2+A31*X1,
                            //  A24*X4+A23*X3+A22*X2+A21*X1,
                            //  A14*X4+A13*X3+A12*X2+A11*X1]
MOVDQ   <mem_Y>, mm0        // store [Y4,Y3,Y2,Y1]
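The data flow of Table 1 can be traced with a small Python emulation. This is a sketch under stated assumptions rather than the patent's implementation: the helper names (`broadcast`, `packed_mul`, `packed_add`, `matvec_vertical`) and the sample matrix are hypothetical, and lists are written in memory order (X1 first), as in the Table 1 assumptions.

```python
def broadcast(value, width=4):
    # Stands in for MOVD + DUPLS + DUPLD: one scalar copied into every element.
    return [value] * width


def packed_mul(a, b):
    # Stands in for PFMUL: element-wise (vertical) multiply.
    return [p * q for p, q in zip(a, b)]


def packed_add(a, b):
    # Stands in for PFADD: element-wise (vertical) add.
    return [p + q for p, q in zip(a, b)]


def matvec_vertical(columns, x):
    """columns[j] holds column j+1 of A ([A1j, A2j, A3j, A4j]), i.e. the
    transposed matrix; x is [X1, X2, X3, X4]. Returns [Y1, Y2, Y3, Y4]."""
    # 4 columns x (MOVD, DUPLS, DUPLD, PFMUL) = 16 instructions in Table 1.
    products = [packed_mul(broadcast(x[j]), columns[j]) for j in range(4)]
    # Three vertical adds, then one store: 16 + 3 + 1 = 20 instructions.
    low = packed_add(products[0], products[1])
    high = packed_add(products[2], products[3])
    return packed_add(low, high)


# Hypothetical sample data: a[i][j] is A(i+1)(j+1), stored by rows.
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, 0, 2, 1]

# The rearrangement step: rows must be turned into columns before use.
columns = [list(col) for col in zip(*a)]
y = matvec_vertical(columns, x)
print(y)
```

The `zip(*a)` transposition corresponds to the data rearrangement that the vertical approach requires, and the broadcasts correspond to the duplicate instructions that account for most of the instruction count.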
Accordingly, there is a need in the technology for providing an apparatus and method which efficiently performs multi-dimensional computations based on a “horizontal” or intra-add operation. There is also a need in the technology for a method and operation for increasing code density by eliminating the need for the rearrangement of data elements and the corresponding rearrangement operations.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for including in a processor instructions for performing multiply-intra-add operations on packed data are described. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first and a second packed data. The processor performs operations on data elements in the first packed data and the second packed data to generate a plurality of data elements in a third packed data in response to receiving an instruction. At least two of the plurality of data elements in the third packed data store the result of multiply-intra-add operations.
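One possible reading of a multiply-intra-add operation is sketched below in Python. The summary does not define the exact semantics, so everything here is an assumption made for illustration: the function names `multiply_intra_add` and `matvec_horizontal`, the choice to sum adjacent pairs of products, and the sample data are all hypothetical.

```python
def multiply_intra_add(first, second):
    """Hypothetical semantics: multiply the two packed operands element by
    element, then add adjacent pairs of products within the result."""
    if len(first) != len(second) or len(first) % 2 != 0:
        raise ValueError("operands must have equal, even element counts")
    products = [p * q for p, q in zip(first, second)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]


def matvec_horizontal(rows, x):
    """Matrix-vector product using rows of A directly, with no transposition
    and no broadcast of the X coefficients."""
    y = []
    for row in rows:
        partial = multiply_intra_add(row, x)  # two partial sums per row
        y.append(partial[0] + partial[1])     # final horizontal add
    return y


# Hypothetical sample data: a[i][j] is A(i+1)(j+1), stored by rows.
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, 0, 2, 1]

pair_sums = multiply_intra_add([1, 2, 3, 4], [5, 6, 7, 8])
y = matvec_horizontal(a, x)
print(pair_sums, y)
```

Under these assumptions the rows of A are consumed as stored, which is the kind of saving in rearrangement operations that the background section calls for.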
Blakely, Sokoloff, Taylor & Zafman LLP
Intel Corporation
Treat William M.