Electrical computers and digital processing systems: processing – Processing control – Arithmetic operation instruction processing
Reexamination Certificate
2002-06-06
2004-02-03
Coleman, Eric (Department: 2183)
C212S227000, C212S246000, C708S490000, C708S495000, C708S501000, C708S523000, C708S603000, C708S625000
active
06687810
ABSTRACT:
FIELD
The present invention relates to the field of computer systems. Specifically, the present invention relates to a method and apparatus for staggering execution of an instruction.
BACKGROUND
Multimedia applications such as 2D/3D graphics, image processing, video compression/decompression, voice recognition algorithms and audio manipulation require performing the same operation on a large number of data items (referred to as “data parallelism”), each of which may be represented in a small number of bits. For example, graphical and sound data are typically represented by 32 bits in floating point format and 8 or 16 bits in integer format. Floating point numbers are represented in a computer system in the form of a digit string including three components: a sign, an exponent (indicating the magnitude of the number) and a significand or mantissa (indicating the value of the fractional portion of the number). Each type of multimedia application implements one or more algorithms, where each algorithm may require a number of floating point or integer operations, such as ADD or MULTIPLY (hereafter MUL).
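The three-component floating point representation described above can be illustrated with a minimal sketch. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 stored significand bits), which the text does not name explicitly; the function name decompose_float32 is hypothetical.

```python
import struct


def decompose_float32(value):
    """Split a 32-bit float into (sign, biased exponent, stored significand).

    Assumption: IEEE 754 single precision, which the text does not specify.
    """
    # Reinterpret the 32-bit float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    significand = bits & 0x7FFFFF     # 23 stored fraction bits
    return sign, exponent, significand


if __name__ == "__main__":
    print(decompose_float32(-6.5))
```

Under that assumed layout, -6.5 equals -1.625 × 2^2, so the sign bit is set, the biased exponent is 129, and the stored fraction encodes 0.625.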
Single Instruction Multiple Data (SIMD) technology has enabled a significant improvement in multimedia application performance. SIMD technology provides for a single macro instruction, the execution of which causes a processor to perform the same operation on multiple data items in parallel. This technology is especially suited to systems that provide packed data formats. A packed data format is one in which the bits in a register are logically divided into a number of fixed-sized data elements, each of which represents a separate value. For example, a 64-bit register may be broken into four 16-bit elements, each of which represents a separate 16-bit value. SIMD instructions then separately manipulate each element in these packed data types in parallel. For example, a SIMD packed ADD instruction adds together corresponding data elements from a first packed data operand and a second packed data operand, as illustrated in FIG. 1. More specifically, the corresponding data elements for X and Y are added to result in Z, i.e. X0 + Y0 = Z0, X1 + Y1 = Z1, X2 + Y2 = Z2 and X3 + Y3 = Z3.
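The packed ADD described above can be sketched in a few lines. This is an illustration only: the helper names (pack, unpack, packed_add), the placement of element 0 in the low-order bits, and wraparound on overflow are assumptions not stated in the text.

```python
ELEMENT_BITS = 16
ELEMENTS = 4
MASK = (1 << ELEMENT_BITS) - 1


def pack(elements):
    """Pack four 16-bit values into one 64-bit integer (element 0 in the low bits)."""
    reg = 0
    for i, e in enumerate(elements):
        reg |= (e & MASK) << (i * ELEMENT_BITS)
    return reg


def unpack(reg):
    """Split a 64-bit integer back into its four 16-bit elements."""
    return [(reg >> (i * ELEMENT_BITS)) & MASK for i in range(ELEMENTS)]


def packed_add(x, y):
    """Add corresponding elements; carries never cross element boundaries.

    Assumption: results wrap modulo 2**16 rather than saturate.
    """
    return pack([(a + b) & MASK for a, b in zip(unpack(x), unpack(y))])


if __name__ == "__main__":
    X = pack([1, 2, 3, 0xFFFF])
    Y = pack([10, 20, 30, 1])
    print(unpack(packed_add(X, Y)))
```

Each element of Z depends only on the matching elements of X and Y, which is what allows the hardware to process the elements independently.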
FIGS. 2A-2B illustrate a current processor implementation of an arithmetic logic unit (ALU) that can be used to execute SIMD instructions. The ALU of FIG. 2A includes the circuitry necessary to perform operations on the full width of the operands (i.e. all of the data elements). FIG. 2A also shows that the ALU contains two different types of execution units for respectively performing different types of operations (e.g. certain ALUs use separate units for performing ADD and MUL operations). The four ADD execution units and four MUL execution units are respectively capable of operating as four separate ADD execution units and four separate MUL execution units. Alternatively, the ALU may contain multiple Floating Point Multiply Accumulate (FMAC) units, each capable of performing more than a single type of operation. The following examples assume the use of ADD and MUL execution units, but other execution units such as FMAC may also be used.
Thus, as illustrated in FIG. 2B, if at time T, an “ADD X, Y” instruction is issued via issue port 105, each of the four ADD execution units performs an ADD on the separate packed data elements. The four MUL units remain idle during time T. At time T+1, assuming an “ADD A, B” instruction is issued, each of the four ADD execution units once again performs an ADD on the separate packed data elements, while the four MUL units once again remain idle. At time T+2, if a “MUL X, Y” instruction is issued, then each of the four MUL units separately performs a MUL on one of the four packed data elements, while the four ADD execution units remain idle. Finally, at time T+3, if an “ADD S, T” instruction is issued, then each of the four ADD execution units performs an ADD while the four MUL units remain idle.
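The issue sequence above can be tallied with a short sketch that counts busy units per cycle. The function name simulate and the utilization figure it derives are illustrative; only the four-cycle schedule and the four-plus-four unit count come from the text.

```python
def simulate(schedule, lanes=4):
    """Return (busy ADD units, busy MUL units) for each cycle of a full-width ALU."""
    timeline = []
    for op in schedule:
        busy_add = lanes if op == "ADD" else 0
        busy_mul = lanes if op == "MUL" else 0
        timeline.append((busy_add, busy_mul))
    return timeline


# One instruction issued per cycle at times T, T+1, T+2 and T+3.
schedule = ["ADD", "ADD", "MUL", "ADD"]
timeline = simulate(schedule)

total_unit_cycles = 2 * 4 * len(schedule)   # 4 ADD units + 4 MUL units per cycle
busy_unit_cycles = sum(a + m for a, m in timeline)
utilization = busy_unit_cycles / total_unit_cycles

if __name__ == "__main__":
    print(timeline, utilization)
```

With one instruction issued per cycle, only one of the two unit types is ever busy, so half of the unit-cycles in this schedule go unused.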
The implementation described above can require a significant amount of duplicated hardware components and is inefficient in utilizing the hardware components (namely the ADD and MUL execution units). At any given time, one set of execution units remains idle while the other set is active.
SUMMARY
The present invention discloses a method and apparatus for staggering execution of an instruction. According to one embodiment of the invention, a single macro instruction is received, wherein the single macro instruction specifies at least two logical registers and wherein the two logical registers respectively store first and second packed data operands having corresponding data elements. An operation specified by the single macro instruction is then performed independently on a first and a second plurality of the corresponding data elements from said first and second packed data operands at different times, using the same circuit, to independently generate a first and a second plurality of resulting data elements. The first and second pluralities of resulting data elements are stored in a single logical register as a third packed data operand.
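The staggering idea in this summary can be sketched as two passes through one half-width unit. This is an illustrative model, not the patented circuit: the split into low and high halves, the order of the passes, the 16-bit wraparound, and the names half_width_add and staggered_add are all assumptions.

```python
def half_width_add(xs, ys):
    """Stand-in for a single half-width circuit: handles two elements per pass."""
    assert len(xs) == len(ys) == 2
    # Assumption: 16-bit elements that wrap on overflow.
    return [(a + b) & 0xFFFF for a, b in zip(xs, ys)]


def staggered_add(x, y):
    """Execute one four-element packed ADD as two passes through the same circuit."""
    result = [None] * 4
    # First pass: first plurality of elements (assumed here to be elements 0 and 1).
    result[0:2] = half_width_add(x[0:2], y[0:2])
    # Second pass, at a later time: second plurality (elements 2 and 3),
    # reusing the same circuit.
    result[2:4] = half_width_add(x[2:4], y[2:4])
    # Both pluralities of results land in one packed result operand.
    return result


if __name__ == "__main__":
    print(staggered_add([1, 2, 3, 4], [10, 20, 30, 40]))
```

The result is the same as a full-width packed ADD, but only half as many adders are needed, at the cost of a second pass.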
Other features and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description.
Boswell Brent R.
Hinton Glenn J.
Menezes Karol F.
Roussel Patrice
Thakkar Shreekant S.
Blakely, Sokoloff, Taylor & Zafman LLP
Coleman Eric