Reexamination Certificate
1999-03-05
2001-07-24
An, Meng-Ai T. (Department: 2783)
Electrical computers and digital processing systems: processing
Processing architecture
Vector processor
C712S002000, C712S004000, C712S020000
active
06266758
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to the field of single instruction multiple data (SIMD) vector processing. More particularly, the present claimed invention relates to the alignment and ordering of vector elements for SIMD processing.
BACKGROUND ART
Today, most processors in microcomputer systems provide a 64-bit wide datapath architecture. The 64-bit datapath allows operations such as read, write, add, subtract, and multiply on the entire 64 bits of data at once. However, for many applications the types of data involved simply do not require the full 64 bits. In media signal processing (MDMX) applications, for example, light and sound values are usually represented as 8-, 12-, 16-, or 24-bit numbers. This is because people typically cannot distinguish levels of light and sound beyond those represented by these numbers of bits. Hence, data types in MDMX applications typically require fewer than the full 64 bits provided in the datapath of most computer systems.
To efficiently utilize the entire datapath, the current generation of processors typically utilizes a single instruction multiple data (SIMD) method. According to this method, a multitude of smaller numbers are packed into the 64-bit doubleword as elements, each of which is then operated on independently and in parallel.
Prior Art FIG. 1 illustrates an exemplary single instruction multiple data (SIMD) method. Registers vs and vt in a processor are of 64-bit width. Each register is packed with four 16-bit data elements fetched from memory: register vs contains vs[0], vs[1], vs[2], and vs[3], and register vt contains vt[0], vt[1], vt[2], and vt[3]. The registers in essence contain a vector of N elements. To add elements of matching index, an add instruction adds, independently, each of the element pairs of matching index from vs and vt. A third register, vd, of 64-bit width may be used to store the result. For example, vs[0] is added to vt[0] and the result is stored into vd[0]. Similarly, vd[1], vd[2], and vd[3] store the sums of the vs and vt elements of corresponding indexes. Hence, a single add operation on the 64-bit vector results in four simultaneous additions, one for each pair of 16-bit elements. On the other hand, if 8-bit elements were packed into the registers, one add operation would perform eight independent additions in parallel. Consequently, when a SIMD arithmetic instruction such as addition, subtraction, or multiplication is performed on the data in the 64-bit datapath, the operation actually performs multiple operations independently and in parallel on each of the smaller elements comprising the 64-bit datapath. In SIMD vector operation, processors typically require a load to be aligned to the 64-bit doubleword data size. This alignment ensures that SIMD vector operations occur on 64-bit doubleword boundaries.
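The packed addition described above can be sketched in software. The following is a minimal illustration only, not the processor's implementation: the helper names `pack`, `unpack`, and `simd_add` are hypothetical, placing element 0 in the lowest-order bits is an assumption, and wrap-around (modular) arithmetic is assumed because the text does not state an overflow rule.

```python
def pack(elements, width=16):
    """Pack unsigned elements into one 64-bit integer.

    Assumption: element 0 occupies the lowest-order bits.
    """
    assert len(elements) * width == 64
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(elements):
        word |= (value & mask) << (i * width)
    return word


def unpack(word, width=16):
    """Split a 64-bit integer back into its elements."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(64 // width)]


def simd_add(vs, vt, width=16):
    """Add elements of matching index; carries never cross element boundaries.

    Assumption: each element wraps around on overflow.
    """
    mask = (1 << width) - 1
    vd = 0
    for i in range(64 // width):
        shift = i * width
        total = ((vs >> shift) & mask) + ((vt >> shift) & mask)
        vd |= (total & mask) << shift
    return vd


vs = pack([1, 2, 3, 0xFFFF])
vt = pack([10, 20, 30, 1])
vd = simd_add(vs, vt)
print(unpack(vd))  # [11, 22, 33, 0]: the last element wraps instead of carrying
```

Passing `width=8` to all three helpers models the eight-element case mentioned in the text. The loop here is sequential; in hardware the per-element additions happen at the same time.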
Unfortunately, the elements within application data vectors are frequently not 64-bit doubleword aligned for SIMD operations. For example, data elements stored in a memory unit are loaded into registers in a chunk such as a 64-bit doubleword. To operate on the individual elements, the elements are loaded into a register. The order of the elements in the register remains the same as their order in memory. Accordingly, the elements may not be properly aligned for a SIMD operation.
Traditionally, when elements are not aligned with a proper boundary as required for a SIMD vector operation, non-aligned vector processing has typically been reduced to scalar processing. That is, operations took place one element at a time instead of as simultaneous multiple operations. Consequently, SIMD vector operations lost their parallelism and performance advantages when the vector elements were not properly aligned.
Furthermore, many media applications require a specific ordering for the elements within a SIMD vector. Since the elements necessary for SIMD processing are commonly stored in multiple 64-bit doublewords together with other elements, these elements need to be selected and assembled into a vector of the desired order. For example, multiple-channel data are commonly stored in separate arrays or interleaved in a single array. Processing the data requires interleaving or deinterleaving the multiple channels. Other applications require SIMD vector operations on transposed two-dimensional arrays of data. Yet other applications reverse the order of elements in an array, as in FFTs, DCTs, and convolution algorithms.
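The reorderings mentioned above (interleaving, deinterleaving, reversal) all amount to selecting elements from two source vectors and placing them in a chosen order. The sketch below illustrates that idea with plain Python lists standing in for registers; the function `shuffle` and its `(source, index)` selector format are hypothetical and are not taken from the patent or from any instruction set.

```python
def shuffle(vs, vt, selectors):
    """Build a result vector by picking elements from vs and vt.

    selectors is a list of (source, index) pairs, where source 0
    means vs and source 1 means vt (a hypothetical encoding).
    """
    sources = (vs, vt)
    return [sources[src][idx] for src, idx in selectors]


# Two channels stored in separate arrays.
left = ["L0", "L1", "L2", "L3"]
right = ["R0", "R1", "R2", "R3"]

# Interleave the low halves of both channels.
interleaved = shuffle(left, right, [(0, 0), (1, 0), (0, 1), (1, 1)])

# Deinterleave: recover one channel from two interleaved doublewords.
a = ["L0", "R0", "L1", "R1"]
b = ["L2", "R2", "L3", "R3"]
left_only = shuffle(a, b, [(0, 0), (0, 2), (1, 0), (1, 2)])

# Reverse the element order of a single vector.
reversed_left = shuffle(left, left, [(0, 3), (0, 2), (0, 1), (0, 0)])

print(interleaved)    # ['L0', 'R0', 'L1', 'R1']
print(left_only)      # ['L0', 'L1', 'L2', 'L3']
print(reversed_left)  # ['L3', 'L2', 'L1', 'L0']
```

Expressing each pattern as a selector table keeps the three cases uniform: only the table changes, not the operation.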
Thus, what is needed is a method for aligning and ordering elements for more efficient SIMD vector operations by providing computational parallelism.
SUMMARY OF THE INVENTION
The present invention provides alignment and ordering of vector elements for SIMD processing. The present invention is implemented in a computer system including a processor having a plurality of registers. In the alignment of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. The first vector contains the first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of the aligned vector is determined. Next, a vector is extracted from the first register and the second register, beginning from the first bit in the starting byte of the first register and continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing.
In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements is selected from the first register and the second register. The elements from the subset are then replicated into the elements of a third register in a particular order suitable for subsequent SIMD vector processing.
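The alignment steps in the summary can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: registers are modeled as 8-byte `bytes` objects in memory order, and the name `align_extract` is hypothetical.

```python
def align_extract(first, second, start_byte):
    """Extract 8 bytes that straddle two consecutive doublewords.

    first, second: 8-byte values modeling the first and second registers
                   (assumption: bytes are kept in memory order).
    start_byte:    offset in `first` of the first byte of the aligned vector.
    """
    if len(first) != 8 or len(second) != 8:
        raise ValueError("each register must hold exactly 8 bytes")
    if not 0 <= start_byte < 8:
        raise ValueError("start_byte must be in the range 0..7")
    # Begin at the starting byte of the first register and continue
    # into the second register until 8 bytes have been collected.
    return (first + second)[start_byte:start_byte + 8]


memory = bytes(range(32))       # toy memory: byte i holds the value i
address = 11                    # unaligned address of the wanted vector
base = address & ~7             # aligned doubleword containing that byte
first = memory[base:base + 8]           # first register
second = memory[base + 8:base + 16]     # second register
start_byte = address & 7                # starting byte within the first register

third = align_extract(first, second, start_byte)
print(list(third))  # [11, 12, 13, 14, 15, 16, 17, 18]
```

Both loads in the example are doubleword aligned; only the extraction step depends on the unaligned address, so the result in the third register can be fed to a SIMD operation without falling back to element-at-a-time processing.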
Jeff
Hsu Perter
Huffman William A.
Killian Earl A.
Moreton Henry P.
Van Hook Timothy J.
An Meng-Ai T.
MIPS Technologies Inc.
Sterne Kessler Goldstein & Fox P.L.L.C.
Whitmore Stacy