Reexamination Certificate
1999-11-30
2001-08-28
Malzahn, David H. (Department: 2121)
Electrical computers: arithmetic processing and calculating
Electrical digital calculating computer
Particular function performed
C708S201000, C708S620000
active
06282556
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of electronic hardware used for processing multimedia content such as digitally encoded signals. More specifically, the present invention relates to a data path architecture that can be used for a multimedia processor and is capable of performing high speed operations on operands of various data types.
2. Related Art
Multimedia processors (often called “coprocessors”) are increasingly becoming indispensable components of computer systems and electronic devices that process multimedia content. Multimedia content can be audio/visual material that is digitally encoded using any of a number of encoding standards, such as MPEG (Moving Picture Experts Group) or DV (Digital Video). Multimedia processors are used to digitally encode multimedia content in order to reduce the amount of computer resources required to store and transmit it. Multimedia processors are also used to digitally decode the encoded multimedia content for rendering on a display screen and/or a speaker system so that the content can be interpreted by a user or viewer. In addition to being used in a computer system, a multimedia processor can also be used in an embedded system within an electronic device, such as a digital video disk (DVD) player, a compact disk (CD) player or another consumer electronic device that can process audio/visual content.
Multimedia processors, in addition to being useful for processing multimedia content, can also be used to support other processes such as real-time applications (e.g., flight simulators, speech recognition, video teleconferencing, computer games, streaming audio/video, etc.). It is appreciated that the overall system performance of the multimedia processor is heavily dependent on the speed and architecture of the internal data path of the processor. Typically, the faster the data path can process instructions, and thereby process data, the more desirable the multimedia processor. For instance, processing digital images at 30 frames/second requires the processor to perform nearly 2.2 million multiply operations per second. Therefore, it would be advantageous to design a fast data path architecture that occupies less area on the integrated circuit (IC) chip and that consumes less power.
To achieve real-time processing of media signals, architectural enhancements are necessary to relieve the performance pressure placed on modern systems and technology. Enhancements to existing instruction sets first arose from the performance demands of specific computer applications such as graphics applications. Soon after, the enhancements appeared in general purpose processors such as the Intel MMX processor, reflecting a change in the computational environment and, specifically, a shift towards media processing. These extensions operate on multiple data values under the control of a single instruction (single instruction, multiple data, or SIMD). In most of these processors, data is packed into 64-bit registers in one of the general-purpose register files, in keeping with the 64-bit architecture of these processors. However, this 64-bit architecture is limited in data width and therefore not well suited for high performance graphics processing environments.
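As a rough illustration of what packing data into a 64-bit register means, the following Python sketch models a lane-wise add on four 16-bit values held in one 64-bit word. The helper names (`pack_lanes`, `unpack_lanes`, `packed_add`) and the wraparound behavior are assumptions made for illustration; they do not correspond to any particular instruction of the processors named above.

```python
MASK64 = (1 << 64) - 1  # model of a 64-bit register


def pack_lanes(values, lane_bits=16):
    """Pack unsigned lane values into one 64-bit word, lane 0 in the low bits."""
    lane_mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & lane_mask) << (i * lane_bits)
    return word & MASK64


def unpack_lanes(word, lane_bits=16):
    """Split a 64-bit word back into its unsigned lanes."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & lane_mask
            for i in range(64 // lane_bits)]


def packed_add(a, b, lane_bits=16):
    """Lane-wise add with wraparound; carries never cross a lane boundary."""
    sums = [x + y for x, y in zip(unpack_lanes(a, lane_bits),
                                  unpack_lanes(b, lane_bits))]
    return pack_lanes(sums, lane_bits)  # pack_lanes masks each lane


a = pack_lanes([1, 2, 0xFFFF, 4])
b = pack_lanes([10, 20, 1, 40])
result = unpack_lanes(packed_add(a, b))
```

A single call to `packed_add` stands in for one SIMD instruction: all four lanes are processed together, and the third lane wraps to zero without disturbing its neighbor.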
In multimedia applications, processor data paths use multiplier circuits to perform a wide range of functions such as Inverse Discrete Cosine Transforms (IDCT), Fast Fourier Transforms (FFT), and Multiply Accumulate (MAC) operations on 8-bit, 16-bit, and 32-bit signed and unsigned operands. However, multipliers that are able to process wide data formats typically consume extra processing cycles to perform the multiplication operation. Therefore, prior art data paths that include multiplier circuits typically have more pipestages in their execution phase to accommodate the wide data format multiply operations. Multiply instructions of these prior art processors require additional execution time to complete, thereby consuming valuable processing time. The longer execution phase also reduces the efficiency of other operations that require only one or two execution pipestages for completion. It would be advantageous to provide a more efficient data path that is also able to efficiently perform wide data format multiply operations.
One particular prior art multiplication circuit exists within the Intel MMX processor. This multiplication circuit performs 32-bit multiplication using a 16-bit multiplication circuit that is required to perform two iterations. If wider multiplication operations are required, then more iterations are performed. The tradeoff selected in this multiplier design requires that 8-bit multiplication not be supported; otherwise, too many iterations would be required to support wider operations. Since two iterations are required for 32-bit multiplication, this circuit is not able to accept new operands each clock cycle, but rather accepts new operands only every other cycle, thereby drastically reducing its data throughput capacity. In another particular example, the Motorola AltiVec processor provides two separate multiplier circuits for large-bit multiply operations, e.g., one circuit for 8-bit and a second circuit for 16-bit. However, this approach is disadvantageous because it includes redundant hardware that increases the area and power requirements of the processor. It would be advantageous to provide a circuit capable of large-bit multiply operations having high data throughput that does not have substantial hardware redundancy.
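The arithmetic behind building a 32-bit multiply out of 16-bit multiplies can be sketched as follows. This is only the standard decomposition into 16-bit halves, written in Python for clarity; it is not a description of the MMX circuit, whose scheduling of partial products across its two iterations is not given here. The function names `mul16` and `mul32_from_16` are hypothetical.

```python
def mul16(x, y):
    """Stand-in for a 16x16 unsigned multiplier producing a 32-bit product."""
    assert 0 <= x < (1 << 16) and 0 <= y < (1 << 16)
    return x * y


def mul32_from_16(a, b):
    """Unsigned 32x32 multiply assembled from four 16x16 partial products."""
    assert 0 <= a < (1 << 32) and 0 <= b < (1 << 32)
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16
    low = mul16(a_lo, b_lo)                        # weight 2**0
    mid = mul16(a_lo, b_hi) + mul16(a_hi, b_lo)    # weight 2**16
    high = mul16(a_hi, b_hi)                       # weight 2**32
    return low + (mid << 16) + (high << 32)


product = mul32_from_16(123456789, 987654321)
```

Because every wider product has to be built from several narrow ones, a design with fewer narrow multipliers than partial products must reuse them over several cycles, which is the throughput cost described above.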
Moreover, in multimedia applications there are several specially adapted multimedia instructions that are useful for processing packed data types, such as those that represent encoded pixels or encoded audio data. Like the multiply operations, these specially adapted multimedia instructions often require the data path of a media processor to have extra pipestages to accommodate the instruction execution. It would be advantageous to provide a more efficient data path that is also able to efficiently process these specially adapted multimedia instructions.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides a pipelined data path architecture for a multimedia processor that is very efficient, consumes less integrated circuit area and dissipates less power compared to conventional media coprocessors. The data path architecture of the present invention is also able to perform wide data format multiply operations within two execution pipestages. The data path architecture of the present invention is also able to perform specially adapted multimedia instructions within the two execution pipestages reserved for the execution phase of the overall pipeline. The data path architecture is also pipelined, so that an instruction has a latency of two execution pipestages while new operands can be accepted every clock cycle.
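The difference between latency and throughput can be made concrete with a small timing model. The Python sketch below counts execution cycles only and assumes an idealized pipeline with no stalls; the function name `cycles_to_finish` and its parameters are hypothetical.

```python
def cycles_to_finish(n_instructions, latency, issue_interval):
    """Execution cycles needed to complete a stream of identical instructions.

    latency        -- execution pipestages per instruction
    issue_interval -- cycles between accepting successive operand sets
    Assumes an idealized pipeline with no stalls or dependencies.
    """
    if n_instructions == 0:
        return 0
    return (n_instructions - 1) * issue_interval + latency


# Two-stage pipeline accepting new operands every cycle.
pipelined = cycles_to_finish(100, latency=2, issue_interval=1)

# Two-iteration unit accepting new operands only every other cycle.
iterative = cycles_to_finish(100, latency=2, issue_interval=2)
```

Under these assumptions, both units have the same two-cycle latency, but over a long instruction stream the fully pipelined unit completes nearly twice as many operations in the same time.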
What is disclosed is a pipelined data path architecture for use, in one embodiment, in a multimedia processor. The data path architecture requires a maximum of two execution pipestages to perform all instructions, including wide data format multiply instructions and specially adapted multimedia instructions such as the sum of absolute differences (SABD) instruction and multiply with add (MADD) instructions. Most other instructions require only a single execution pipestage. The data path architecture includes two wide data format operand registers that supply four partitioned 32×32 multiplier circuits. In one embodiment, each operand register is 128 bits wide. Within two pipestages, the multiply circuit can perform one 128×128 multiply operation, four 32×32 multiply operations, eight 16×16 multiply operations or sixteen 8×8 multiply operations in parallel using a SIMD architecture.
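The lane counts above follow directly from dividing a 128-bit operand register into 32-bit, 16-bit, or 8-bit lanes. The Python sketch below models that partitioning for unsigned operands, together with one plausible reading of a sum of absolute differences over 8-bit lanes. The helper names and the exact semantics shown are assumptions for illustration; signed operands and the actual behavior of the SABD and MADD instructions are not modeled.

```python
REG_BITS = 128  # operand register width in the described embodiment


def lanes(word, lane_bits):
    """Split a 128-bit word into unsigned lanes, lane 0 in the low bits."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask
            for i in range(REG_BITS // lane_bits)]


def pack(values, lane_bits):
    """Pack unsigned lane values into a single integer word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * lane_bits)
    return word


def packed_multiply(a, b, lane_bits):
    """Unsigned lane-wise multiply; each product is 2 * lane_bits wide."""
    return [x * y for x, y in zip(lanes(a, lane_bits), lanes(b, lane_bits))]


def sum_abs_diff(a, b, lane_bits=8):
    """Sum of absolute differences across lanes (assumed semantics)."""
    return sum(abs(x - y)
               for x, y in zip(lanes(a, lane_bits), lanes(b, lane_bits)))


a = pack(range(16), 8)   # bytes 0, 1, ..., 15
b = pack([2] * 16, 8)    # sixteen bytes of value 2
byte_products = packed_multiply(a, b, 8)
sad = sum_abs_diff(a, b)
```

In each partitioning the products occupy 256 bits in total (sixteen 16-bit, eight 32-bit, or four 64-bit results), which is the same result width as a single 128×128 product.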
The multiply circuit contains a compressor tree which generates a 256-bit sum vector and a 256-bit carry vector. These vectors are stored in pipelined registers and are supplied to
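A sum vector paired with a carry vector is the usual output of carry-save reduction, in which partial products are compressed without propagating carries until a final addition. The Python sketch below shows a generic 3:2 compression step and a simple sequential reduction; it is an assumption-laden illustration of the general technique, not the structure of the compressor tree in this data path, and it does not model the 256-bit vector width. The names `carry_save_add` and `compress` are hypothetical.

```python
def carry_save_add(a, b, c):
    """Bitwise 3:2 compressor: returns (sum_vector, carry_vector).

    The two outputs always satisfy sum_vector + carry_vector == a + b + c.
    """
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


def compress(operands):
    """Reduce a list of addends to two vectors using repeated 3:2 steps.

    Hardware would arrange these steps as a tree; this loop applies them
    one after another, which gives the same final total.
    """
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


sum_vec, carry_vec = compress([5, 9, 12, 7, 3])
total = sum_vec + carry_vec  # the single carry-propagating addition
```

Only the last line performs a carry-propagating addition, which is why keeping the result in sum and carry form until a later stage is attractive in a pipelined multiplier.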
Chehrazi Farzad
Oklobdzija Vojin G.
Malzahn David H.
Sony Corporation of Japan
Wagner , Murabito & Hao LLP