Methods and apparatus for efficient cosine transform...

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Details

C707S793000

active

06754687

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to improvements in parallel processing, and more particularly to methods and apparatus for efficient cosine transform implementations on the manifold array (“ManArray”) processing architecture.
BACKGROUND OF THE INVENTION
Many video processing applications, such as the moving picture experts group (MPEG) decoding and encoding standards, use a discrete cosine transform (DCT) and its inverse, the inverse discrete cosine transform (IDCT), in their compression algorithms. The compression standards are typically complex and specify that a high data rate must be handled. For example, MPEG at Main Profile and Main Level specifies 720×576 picture elements (pels) per frame at 30 frames per second and up to 15 Mbits per second. The MPEG Main Profile at High Level specifies 1920×1152 pels per frame at 60 frames per second and up to 80 Mbits per second. Video processing is a time-constrained application with multiple complex, compute-intensive algorithms such as the two-dimensional (2D) 8×8 IDCT. Consequently, processors with high clock rates, fixed-function application specific integrated circuits (ASICs), or combinations of fast processors and ASICs are typically used to meet the high processing load. Efficient 2D 8×8 DCT and IDCT implementations are therefore of great advantage in providing a low-cost solution.
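For reference, the 2D 8×8 DCT and IDCT pair discussed above can be written directly from the textbook definition. The Python sketch below is an illustration only and is not taken from the patent: the names `dct_2d`, `idct_2d`, `c`, and `COS` are hypothetical, the common 1/4 · C(u) · C(v) normalization is assumed, and no attempt is made at efficiency.

```python
import math

N = 8  # block size used by the 8x8 transforms

# COS[x][u] = cos((2x + 1) * u * pi / 16), the DCT basis samples
COS = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for u in range(N)]
       for x in range(N)]


def c(k):
    """Normalization factor: 1/sqrt(2) for the DC term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_2d(block):
    """Forward 2D 8x8 DCT, computed straight from the definition."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x][y] * COS[x][u] * COS[y][v]
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def idct_2d(coeffs):
    """Inverse 2D 8x8 DCT, computed straight from the definition."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += c(u) * c(v) * coeffs[u][v] * COS[x][u] * COS[y][v]
            out[x][y] = 0.25 * s
    return out
```

In this direct form each of the 64 outputs sums 64 terms, so one block costs 4096 multiply-accumulate steps, which gives a sense of why the transform dominates the processing load at video data rates.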
Prior art approaches, such as Pechanek et al. U.S. Pat. No. 5,546,336, used a specialized folded memory array with embedded arithmetic elements to achieve high performance with 16 processing elements. The folded memory array and large number of processing elements do not map well to a low-cost regular silicon implementation. The present invention shows that high-performance cosine transforms can be achieved in a regular array structure with one quarter of the processing elements of the 16-PE Mfast design and without the need for a folded memory array. In addition, the unique instructions, indirect VLIW capability, and ManArray network communication instructions allow a general programmable solution of very high performance.
SUMMARY OF THE INVENTION
To this end, the ManArray processor, adapted as further described herein, provides efficient software implementations of the IDCT using the ManArray indirect very long instruction word (iVLIW) architecture and a unique data placement that supports software pipelining between processor elements (PEs) in the 2×2 ManArray processor. For example, a two-dimensional (2D) 8×8 IDCT, used in many video compression algorithms such as MPEG, can be processed in 34 cycles on a 2×2 ManArray processor and meets IEEE Standard 1180-1990 for precision of the IDCT. The 2D 8×8 DCT algorithm, using the same distribution principles as the distributed 2D 8×8 IDCT, can be processed in 35 cycles on the same 2×2 ManArray processor. With this level of performance, the clock rate can be much lower than is typically used in MPEG processing chips, thereby lowering overall power usage.
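The link between the 34-cycle figure and a lower clock rate can be illustrated with a rough cycle budget. The Python sketch below is not from the patent and rests on loudly stated assumptions: 4:2:0 chroma sampling (one chroma block for every two luma blocks), an IDCT performed on every 8×8 block of every frame as a worst case, and no accounting for any other decoding work. The function names are hypothetical.

```python
def blocks_per_second(width, height, fps):
    """8x8 blocks per second, ASSUMING 4:2:0 sampling and every block coded."""
    luma_blocks = (width // 8) * (height // 8)
    chroma_blocks = luma_blocks // 2  # assumption: 4:2:0, two quarter-size planes
    return (luma_blocks + chroma_blocks) * fps


def required_cycles_per_second(width, height, fps, cycles_per_block):
    """Cycles per second spent on the IDCT alone under the assumptions above."""
    return blocks_per_second(width, height, fps) * cycles_per_block


if __name__ == "__main__":
    # Frame sizes and rates are the ones quoted in the background section;
    # 34 cycles per block is the figure quoted for the 2x2 array.
    print(required_cycles_per_second(720, 576, 30, 34))
    print(required_cycles_per_second(1920, 1152, 60, 34))
```

Under these assumptions the IDCT alone would need roughly 9.9 million cycles per second at Main Level and roughly 105.8 million cycles per second at High Level. These are illustrative estimates, not figures stated in the source.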
An alternative software process for implementing the cosine transforms on the ManArray processor provides a scalable algorithm that can be executed on various arrays, such as a 1×1, a 1×2, a 2×2, a 2×3, a 2×4, and so on, allowing scalable performance. Among its other aspects, this new software process makes use of the scalable characteristics of the ManArray architecture, unique ManArray instructions, and a data placement optimized for the MPEG application. In addition, due to the symmetry of the algorithm, the number of VLIWs is minimized by reusing the same VLIWs to process both dimensions of the 2D computation.
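The symmetry that permits reuse across both dimensions comes from the separability of the 2D transform: a 2D IDCT equals a 1D IDCT applied along one dimension followed by the same 1D IDCT applied along the other. The Python sketch below illustrates only that general row-column property; it is not the ManArray VLIW code or its data placement, and the names `idct_1d`, `transpose`, and `idct_2d_separable` are hypothetical.

```python
import math

N = 8  # block size used by the 8x8 transforms


def idct_1d(coeffs):
    """1D 8-point IDCT of a list of 8 coefficients."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            s += cu * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(0.5 * s)
    return out


def transpose(matrix):
    """Swap rows and columns of a square list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def idct_2d_separable(coeffs):
    """2D 8x8 IDCT built from two passes of the same 1D routine."""
    first_pass = [idct_1d(row) for row in coeffs]                  # first dimension
    second_pass = [idct_1d(row) for row in transpose(first_pass)]  # same routine again
    return transpose(second_pass)
```

Because both passes call the same `idct_1d`, only one 1D kernel has to exist; the second dimension is reached by rearranging the data rather than by writing new code. Each pass runs 8 transforms of 64 multiply-accumulate steps, so the block costs 1024 steps instead of the 4096 of the direct double sum.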
The present invention defines a collection of eight hardware ManArray instructions that use the ManArray iVLIW architecture and communications network to efficiently calculate the distributed two-dimensional 8×8 IDCT. In one aspect of the present invention, appropriate data distribution and software pipelining techniques are provided to achieve a 34-cycle distributed two-dimensional 8×8 IDCT on a 2×2 ManArray processor that meets IEEE Standard 1180-1990 for precision of the IDCT. In another aspect of the present invention, appropriate data distribution patterns are used in local processor element memory, in conjunction with a scalable algorithm, to effectively and efficiently reuse VLIW instructions in the processing of both dimensions of the two-dimensional algorithm.


REFERENCES:
patent: 4829465 (1989-05-01), Knauer
patent: 5285402 (1994-02-01), Keith
patent: 5546336 (1996-08-01), Pechanek et al.
patent: 5854757 (1998-12-01), Dierke
patent: 5870497 (1999-02-01), Galbi et al.
patent: 5978508 (1999-11-01), Tsuboi
patent: 0-720103 (1996-07-01), None
Pechanek, G.G. et al., “M.f.a.s.t.: a Single Chip Highly Parallel Image Processing Architecture”, Proceedings International Conference on Image Processing, Oct. 1995, vol. 1, pp. 69-72.
Wang, C-L et al., “Highly Parallel VLSI Architectures for the 2-D DCT and IDCT Computations”, IEEE Region 10's Ninth Annual International Conference, Aug. 1994, vol. 1, pp. 295-299.
G.G. Pechanek, C.W. Kurak, C.J. Glossner, C.H.L. Moller, and S.J. Walsh, “M.f.a.s.t.: a Highly Parallel Single Chip DSP with a 2D IDCT Example”, The Sixth International Conference on Signal Processing Applications & Technology, Boston, MA, Oct. 24-26, 1995.
CAS Standards Committee, “IEEE Standard Specifications for the Implementations of 8×8 Inverse Discrete Cosine Transform”, IEEE Std., Mar. 18, 1991, pp. 1-13.
