Apparatus and method for performing intra-add operation

Electrical computers and digital processing systems: processing – Processing control – Arithmetic operation instruction processing

Reexamination Certificate


Filing date: 1998-03-31
Issue date: 2002-07-09
Examiner: Eng, David Y. (Department: 2155)
U.S. classification: C712S222000
Document type: Reexamination Certificate
Status: active
Patent number: 06418529
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to the field of computer systems, and in particular, to an apparatus and method for performing multi-dimensional computations based on an intra-add operation.
2. Description of the Related Art
To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, a Single Instruction, Multiple Data (SIMD) architecture has been implemented in computer systems to enable one instruction to operate on several operands simultaneously, rather than on a single operand. In particular, SIMD architectures take advantage of packing many data elements within one register or memory location. With parallel hardware execution, multiple operations can be performed on separate data elements with one instruction, resulting in significant performance improvement.
Currently, the SIMD addition operation performs only "vertical" or inter-register addition, where pairs of data elements, for example, a first element Xn (where n is an integer) from one operand and a second element Yn from a second operand, are added together. An example of such a vertical addition operation is shown in FIG. 1, where the instruction is performed on the sets of data elements (X3, X2, X1, and X0) and (Y3, Y2, Y1, and Y0), accessed as Source1 and Source2, respectively, to obtain the result (X3+Y3, X2+Y2, X1+Y1, and X0+Y0).
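The vertical add described above can be modeled in a few lines of Python. This is only an illustrative sketch: the function name `vertical_add` and the use of Python lists to stand in for packed registers are assumptions made here, not part of the patent text.

```python
def vertical_add(source1, source2):
    """Inter-register ("vertical") add: element n of one packed operand
    is added to element n of the other packed operand."""
    if len(source1) != len(source2):
        raise ValueError("packed operands must have the same number of elements")
    return [x + y for x, y in zip(source1, source2)]


# Operands written in the same order as the text: (X3, X2, X1, X0) and (Y3, Y2, Y1, Y0).
source1 = [4, 3, 2, 1]
source2 = [40, 30, 20, 10]
result = vertical_add(source1, source2)  # (X3+Y3, X2+Y2, X1+Y1, X0+Y0)
```

Note that no element of the result ever combines two elements of the same operand, which is the limitation the following paragraphs discuss.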
Although many applications currently in use can take advantage of such a vertical add operation, a number of important applications require the data elements to be rearranged before the vertical add operation can be applied.
For example, a matrix multiplication operation is shown below.
MATRIX A * VECTOR X = VECTOR Y

\[
\begin{vmatrix}
A_{14} & A_{13} & A_{12} & A_{11} \\
A_{24} & A_{23} & A_{22} & A_{21} \\
A_{34} & A_{33} & A_{32} & A_{31} \\
A_{44} & A_{43} & A_{42} & A_{41}
\end{vmatrix}
\times
\begin{vmatrix}
X_4 \\ X_3 \\ X_2 \\ X_1
\end{vmatrix}
=
\begin{vmatrix}
A_{14}X_4 + A_{13}X_3 + A_{12}X_2 + A_{11}X_1 \\
A_{24}X_4 + A_{23}X_3 + A_{22}X_2 + A_{21}X_1 \\
A_{34}X_4 + A_{33}X_3 + A_{32}X_2 + A_{31}X_1 \\
A_{44}X_4 + A_{43}X_3 + A_{42}X_2 + A_{41}X_1
\end{vmatrix}
\]
To multiply the matrix A by a vector X and obtain the resulting vector Y, instructions are used to: 1) store the columns of the matrix A as packed operands (this typically requires rearrangement of data because the rows of the matrix A coefficients are stored to be accessed as packed data operands, not the columns); 2) store a set of operands that each have a different one of the vector X coefficients in every data element; 3) use vertical multiplication, where each data element in the vector X (i.e., X4, X3, X2, X1) is first multiplied with the data elements in the corresponding column of the matrix A (for example, X4 with [A14, A24, A34, A44]). The results of the multiplication operations are then added together through three vertical add operations, such as that shown in FIG. 1, to obtain the final result. Such a matrix multiplication operation based on the use of vertical add operations typically requires 20 instructions to implement, an example of which is shown below in Table 1.
Exemplary Code Based on Vertical-Add Operations

Assumptions:
1/ X stored with X1 first, X4 last
2/ transpose of A stored with A11 first, A21 second, A31 third, etc.
3/ availability of:
   DUPLS: duplicate once
   DUPLD: duplicate twice

TABLE 1

MOVD    mm0, <mem_X>         // [0, 0, 0, X1]
DUPLS   mm0, mm0             // [0, 0, X1, X1]
DUPLD   mm0, mm0             // [X1, X1, X1, X1]
PFMUL   mm0, <mem_A>         // [A41*X1, A31*X1, A21*X1, A11*X1]
MOVD    mm1, <mem_X + 4>     // [0, 0, 0, X2]
DUPLS   mm1, mm1             // [0, 0, X2, X2]
DUPLD   mm1, mm1             // [X2, X2, X2, X2]
PFMUL   mm1, <mem_A + 16>    // [A42*X2, A32*X2, A22*X2, A12*X2]
MOVD    mm2, <mem_X + 8>     // [0, 0, 0, X3]
DUPLS   mm2, mm2             // [0, 0, X3, X3]
DUPLD   mm2, mm2             // [X3, X3, X3, X3]
PFMUL   mm2, <mem_A + 32>    // [A43*X3, A33*X3, A23*X3, A13*X3]
MOVD    mm3, <mem_X + 12>    // [0, 0, 0, X4]
DUPLS   mm3, mm3             // [0, 0, X4, X4]
DUPLD   mm3, mm3             // [X4, X4, X4, X4]
PFMUL   mm3, <mem_A + 48>    // [A44*X4, A34*X4, A24*X4, A14*X4]
PFADD   mm0, mm1             // [A42*X2 + A41*X1, A32*X2 + A31*X1,
                             //  A22*X2 + A21*X1, A12*X2 + A11*X1]
PFADD   mm2, mm3             // [A44*X4 + A43*X3, A34*X4 + A33*X3,
                             //  A24*X4 + A23*X3, A14*X4 + A13*X3]
PFADD   mm0, mm2             // [A44*X4 + A43*X3 + A42*X2 + A41*X1,
                             //  A34*X4 + A33*X3 + A32*X2 + A31*X1,
                             //  A24*X4 + A23*X3 + A22*X2 + A21*X1,
                             //  A14*X4 + A13*X3 + A12*X2 + A11*X1]
MOVDQ   <mem_Y>, mm0         // store [Y4, Y3, Y2, Y1]
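To make the data flow of the vertical-add approach easier to follow, the sketch below mirrors the sequence of Table 1 in Python for the 4x4 case. It is a rough model under stated assumptions: the helper names (`broadcast`, `packed_mul`, `packed_add`, `matvec_vertical`) are hypothetical, Python lists stand in for packed registers, and list index 0 holds element 1 (the table comments list the highest element first).

```python
def broadcast(value, width=4):
    """Models MOVD followed by DUPLS and DUPLD: one scalar in every element."""
    return [value] * width


def packed_mul(a, b):
    """Models PFMUL: element-wise multiply of two packed operands."""
    return [p * q for p, q in zip(a, b)]


def packed_add(a, b):
    """Models PFADD: element-wise (vertical) add of two packed operands."""
    return [p + q for p, q in zip(a, b)]


def matvec_vertical(a_rows, x):
    """Compute Y = A * X for a 4x4 matrix using only vertical operations.

    a_rows[i][j] holds A(i+1)(j+1) and x[j] holds X(j+1).
    """
    if len(x) != 4 or len(a_rows) != 4 or any(len(row) != 4 for row in a_rows):
        raise ValueError("this sketch handles the 4x4 case only")
    # Rearrangement step: the columns of A must become the packed operands.
    columns = [[a_rows[i][j] for i in range(4)] for j in range(4)]
    # One broadcast and one multiply per column (four instructions each in Table 1).
    products = [packed_mul(broadcast(x[j]), columns[j]) for j in range(4)]
    # Three vertical adds combine the four partial products.
    low = packed_add(products[0], products[1])
    high = packed_add(products[2], products[3])
    return packed_add(low, high)  # [Y1, Y2, Y3, Y4]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
y = matvec_vertical(a, [1, 0, 2, 0])
```

The transposition into `columns` and the four `broadcast` calls correspond to the rearrangement work that the text identifies as overhead.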
Accordingly, there is a need in the technology for an apparatus and method that efficiently performs multi-dimensional computations based on a "horizontal" or intra-add operation. There is also a need in the technology for a method and operation for increasing code density by eliminating the need for the rearrangement of data elements and the corresponding rearrangement operations.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for including in a processor instructions for performing intra-add operations on packed data is described. In one embodiment, an execution unit is coupled to a storage area. The storage area has stored therein first and second packed data operands. In response to receiving a single instruction, the execution unit performs operations on data elements in the first and second packed data operands to generate a plurality of data elements in a packed data result. At least two of the plurality of data elements in the packed data result store the result of an intra-add operation upon the first and second packed data operands.
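As a rough illustration of how an intra-add could remove the rearrangement step, the Python sketch below sums adjacent pairs of elements within each operand and uses that to multiply a 4x4 matrix by a vector directly from its rows. The element pairing and the ordering of the result are assumptions chosen for illustration, since this summary does not specify them, and the names `intra_add` and `matvec_intra` are hypothetical.

```python
def intra_add(source1, source2):
    """Horizontal ("intra") add: adjacent elements inside each operand are summed.

    Assumed layout: pair sums from source1 come first, then those from source2.
    """
    if len(source1) != len(source2) or len(source1) % 2 != 0:
        raise ValueError("operands must have the same, even number of elements")

    def pair_sums(operand):
        return [operand[i] + operand[i + 1] for i in range(0, len(operand), 2)]

    return pair_sums(source1) + pair_sums(source2)


def matvec_intra(a_rows, x):
    """Compute Y = A * X for a 4x4 matrix from the rows of A, with no transposition."""
    if len(x) != 4 or len(a_rows) != 4 or any(len(row) != 4 for row in a_rows):
        raise ValueError("this sketch handles the 4x4 case only")
    # Multiply each row of A element-wise by X (rows are already packed operands).
    products = [[a * b for a, b in zip(row, x)] for row in a_rows]
    # Two rounds of intra-adds reduce each row's four products to one sum.
    first = intra_add(products[0], products[1])
    second = intra_add(products[2], products[3])
    return intra_add(first, second)  # [Y1, Y2, Y3, Y4]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
y = matvec_intra(a, [1, 0, 2, 0])
```

Under these assumptions the multiply needs four packed multiplies and three intra-adds, with no broadcast or transpose of the inputs.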


Inventor: Roussel, Patrice
Law Firm: Blakely, Sokoloff, Taylor & Zafman LLP
Examiner: Eng, David Y.
Corporate Assignee: Intel Corporation
