Electrical computers and digital processing systems: processing – Processing architecture – Vector processor
Reexamination Certificate
1998-06-16
2001-03-13
Kim, Kenneth S. (Department: 2783)
C712S005000, C712S008000
active
06202141
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to computer processing systems, and, in particular, vector multiplication operations performed by computer processing systems.
2. Related Art
Partial multiplication of an integer x-bits wide with another integer y-bits wide generates a result less than (x+y) bits wide, which typically represents the high order half (or low order half) of the result of the full multiplication operation of the two integers. A vector multiplication operation may utilize partial multiplication by defining multiplication primitives that perform a partial multiplication operation on elements of partitioned source vectors to produce a resultant vector. Prior art implementations have used such partial multiplication operations. For example, the VIS™ instruction set extension to the SPARC-V9™ architecture developed by Sun Microsystems, Inc. includes a series of vis_fmul8×16 instructions that perform a partial multiplication operation on elements of partitioned source vectors to produce a resultant vector. The elements of the partitioned source vectors that are multiplied together vary based upon the particular vis_fmul8×16 instruction. A more detailed description of the series of vis_fmul8×16 vector multiplication operations that are part of the VIS™ instruction set is set forth in “VIS™ Instruction Set User's Manual”, Sun Microsystems, Inc., 1997, pp. 54-64.
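The partial multiplication described above can be illustrated with a minimal sketch. It assumes unsigned elements of equal width and is not a model of any vis_fmul8×16 instruction; the function names and the `keep` parameter are hypothetical and chosen only for illustration.

```python
def partial_multiply(a, b, bits=16, keep="high"):
    """Multiply two unsigned `bits`-wide integers, keeping only one half
    of the (2 * bits)-wide product. Illustrative helper, not a real ISA op."""
    mask = (1 << bits) - 1
    full = (a & mask) * (b & mask)      # the full product is 2 * bits wide
    if keep == "high":
        return (full >> bits) & mask    # high order half
    return full & mask                  # low order half


def vector_partial_multiply(va, vb, bits=16, keep="high"):
    """Apply partial_multiply element by element to two source vectors."""
    return [partial_multiply(a, b, bits, keep) for a, b in zip(va, vb)]


# 0xFFFF * 0xFFFF = 0xFFFE0001, so the two halves are 0xFFFE and 0x0001.
high = vector_partial_multiply([0xFFFF], [0xFFFF], keep="high")
low = vector_partial_multiply([0xFFFF], [0xFFFF], keep="low")
```

Because each result element is no wider than a source element, the resultant vector fits in a register of the same width as the sources, at the cost of discarding half of every product.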
Full multiplication of an integer x-bits wide with another integer y-bits wide generates a result (x+y) bits wide. A vector multiplication operation may utilize full multiplication by defining multiplication primitives that perform a full multiplication operation on elements of partitioned source vectors to produce a resultant vector. Prior art implementations that use such full multiplication operations may be placed into one of two categories: Accumulator-based, and Register-pair Destination.
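As a minimal sketch of the width arithmetic involved, assuming unsigned elements of equal width (the function names are hypothetical), a full multiplication of two vectors of n elements, each w bits wide, yields n products of 2w bits each, that is, twice as many bits as either source vector holds:

```python
def vector_full_multiply(va, vb, bits=16):
    """Full multiplication of corresponding unsigned `bits`-wide elements.
    Each product needs up to 2 * bits bits. Illustrative helper only."""
    mask = (1 << bits) - 1
    return [(a & mask) * (b & mask) for a, b in zip(va, vb)]


def result_width_bits(num_elements, bits):
    """Total bits needed to hold every full product of the vector."""
    return num_elements * 2 * bits


# Four 16-bit elements occupy a 64-bit source vector, while the four
# full products together occupy 128 bits.
products = vector_full_multiply([0xFFFF, 2, 3, 4], [0xFFFF, 3, 4, 5])
source_bits = 4 * 16
needed_bits = result_width_bits(4, 16)
```

This doubling is why the categories that follow differ in where the result is written.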
In an Accumulator-based implementation, the result of the full multiplication operation on the elements of the partitioned source vectors is written to a non-general purpose register, called an accumulator, that is wider than the general purpose registers. For example, the Digital Media Extension (MDMX) to the MIPS architecture developed by Silicon Graphics Inc. includes an MULA instruction that multiplies together elements of two source vectors and writes the result to a private 192-bit Accumulator register (which cannot be directly loaded from or stored to main memory, but must be staged through an FP register file). A more detailed description of the MULA vector multiplication operation in the MDMX extension set is set forth in “MIPS Digital Media Extension”, Silicon Graphics, Inc., pp. C-18.
In a Register-pair Destination implementation, the result of the full multiplication operation on the elements of the partitioned source vectors is written to a pair of general purpose registers. For example, the instruction set architecture of the broadband processor developed by MicroUnity Systems Engineering, Inc. includes a g.mult.32 instruction that multiplies together the corresponding symbols in two 64-bit registers and writes the result to two 64-bit registers. A more detailed description of the g.mult.32 instruction is set forth in “Architecture of a Broadband Mediaprocessor”, MicroUnity Systems Engineering, Inc., 1996, which was presented at COMPCON96, Feb. 25-29, 1996.
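The Register-pair Destination approach can be sketched as follows. This is an illustration under stated assumptions, not a model of the g.mult.32 instruction: it assumes unsigned 32-bit elements packed into 64-bit registers with element 0 in the low order bits, and the helper names are hypothetical.

```python
def unpack(reg, elem_bits=32, reg_bits=64):
    """Split a register value into unsigned elements, element 0 lowest.
    The element ordering is an assumption made for this sketch."""
    mask = (1 << elem_bits) - 1
    count = reg_bits // elem_bits
    return [(reg >> (i * elem_bits)) & mask for i in range(count)]


def multiply_to_register_pair(ra, rb, elem_bits=32, reg_bits=64):
    """Full multiply of corresponding elements; the double-width result
    is returned as a (low register, high register) pair."""
    products = [a * b for a, b in zip(unpack(ra, elem_bits, reg_bits),
                                      unpack(rb, elem_bits, reg_bits))]
    wide = 0
    for i, p in enumerate(products):
        wide |= p << (i * 2 * elem_bits)   # each product is 2 * elem_bits wide
    reg_mask = (1 << reg_bits) - 1
    return wide & reg_mask, wide >> reg_bits


# Elements (2, 3) times elements (7, 5): the products 14 and 15 each
# fill one whole 64-bit destination register.
lo, hi = multiply_to_register_pair((3 << 32) | 2, (5 << 32) | 7)
```

A single instruction in this style has two destination registers, which is the property at issue when instructions are dispatched and registers are renamed.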
There are significant limitations that pertain to each of the prior art implementations discussed above in performing a full multiplication operation on elements of two or more source vectors. First, implementations that perform a vector multiplication operation utilizing partial multiplication require significant computational overhead to piece together the partial multiplication results to generate the results of the full multiplication operation. Second, the Accumulator-based implementations are inflexible due to the fact that there are a very limited number (typically one or two) of non-general purpose accumulator registers that may be used to store the results of the vector multiplication operation, which restricts the number of vector multiplication operations that can be concurrently performed by the processor. Finally, Register-pair Destination implementations complicate the run-time dispatch operation (i.e., the register renaming operation) of instructions due to the fact that such implementations write the results of the vector multiplication operations to two or more registers.
Thus, there is a need in the art to provide an efficient and flexible mechanism for performing a full multiplication operation on the elements of two or more source vectors.
SUMMARY OF THE INVENTION
The above-stated problems and related problems of the prior art are solved with the principles of the present invention, a method and apparatus for performing vector multiplication by splitting the multiplication operation among odd and even data elements. The present invention partitions a vector multiplication operation into an even path and an odd path. In the odd path, the odd data elements of the source vectors are selected, and a full multiplication operation is performed on the selected odd data elements. In the even path, the even data elements of the source vectors are selected, and a full multiplication operation is performed on the selected even data elements. The results of the two paths are merged together with a merge operation. The vector multiplication mechanism of the present invention preferably uses a single general purpose register to store each result of the odd and even vector multiplication operations. In addition, the computational overhead of the merge operation may be amortized over a series of vector operations.
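The even and odd paths and the merge operation summarized above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: it assumes unsigned elements, treats index 0 as the first even element, and uses hypothetical function names. The last function shows one possible reading of amortization, in which each path accumulates its products over several vector pairs and the merge is performed once at the end.

```python
def multiply_even(va, vb):
    """Even path: full products of the even-indexed elements."""
    return [a * b for a, b in zip(va[0::2], vb[0::2])]


def multiply_odd(va, vb):
    """Odd path: full products of the odd-indexed elements."""
    return [a * b for a, b in zip(va[1::2], vb[1::2])]


def merge(even, odd):
    """Interleave the two path results back into element order."""
    merged = []
    for e, o in zip(even, odd):
        merged.extend((e, o))
    return merged


def accumulate_then_merge(pairs):
    """Sum each path's products over several vector pairs, merge once.
    Assumes every vector has the same even number of elements."""
    half = len(pairs[0][0]) // 2
    even_acc, odd_acc = [0] * half, [0] * half
    for va, vb in pairs:
        even_acc = [s + p for s, p in zip(even_acc, multiply_even(va, vb))]
        odd_acc = [s + p for s, p in zip(odd_acc, multiply_odd(va, vb))]
    return merge(even_acc, odd_acc)


va, vb = [1, 2, 3, 4], [5, 6, 7, 8]
even = multiply_even(va, vb)    # [5, 21]
odd = multiply_odd(va, vb)      # [12, 32]
full = merge(even, odd)         # [5, 12, 21, 32]
```

With n source elements of w bits each, either path produces n/2 products of 2w bits, which is n times w bits in total, so each path's result occupies no more bits than one source vector.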
REFERENCES:
patent: 5251323 (1993-10-01), Isobe
patent: 5678058 (1997-10-01), Sato
patent: 5933650 (1999-08-01), van Hook et al.
“VIS Instruction Set User's Manual”, Sun Microsystems, Inc., Jul. 1997.
“MIPS Digital Media Extension”, Silicon Graphics, Inc.
Craig Hansen, “Architecture of a Broadband Mediaprocessor”, COMPCON96, Feb. 25-29, 1996.
“Appendix A, IA MMX Instruction Set Summary” and “Intel Architecture MMX Instruction Set” Intel Corporation, May 20, 1998.
“64-bit and Multimedia Extensions in the PA-RISC 2.0 Architecture”, Hewlett Packard, May 20, 1998.
Millind Mittal et al., “MMX Technology Architecture Overview”, Intel Technology Journal Q3 '97.
Diefendorff Keith Everett
Dubey Pradeep Kumar
Hochsprung Ronald Ray
Olsson Brett
Scales, III Hunter Ledbetter
F. Chau & Associates LLP
International Business Machines - Corporation
Kim Kenneth S.