Fast DCT apparatus

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C708S401000

Reexamination Certificate

active

06195674

ABSTRACT:

FIELD OF THE INVENTION
Microfiche Appendix: There are 2 microfiche in total, and 105 frames in total.
The present invention relates to a discrete cosine transform (DCT) apparatus utilizing a data path, which contains no pipelining or storage means, and is able to operate at high speeds.
BACKGROUND OF THE INVENTION
Typically, a discrete cosine transform (DCT) apparatus as shown in
FIG. 77
performs a full two-dimensional (2-D) transformation of a block of 8×8 pixels by first performing a 1-D DCT on the rows of the 8×8 pixel block. It then performs another 1-D DCT on the columns of the 8×8 pixel block. Such an apparatus typically consists of an input circuit
1096
, an arithmetic circuit
1104
, a control circuit
1098
, a transpose memory circuit
1090
, and an output circuit
1092
.
The input circuit
1096
accepts 8-bit pixels from the 8×8 block. The input circuit
1096
is coupled by intermediate multiplexers
1100
,
1102
to the arithmetic circuit
1004
. The arithmetic circuit
1104
performs mathematical operations on either a complete row or column of the 8×8 block. The control circuit
1098
controls all the other circuits, and thus implements the DCT algorithm. The output of the arithmetic circuit is coupled to the transpose memory
1090
, register
1095
and output circuit
1092
. The transpose memory is in turn connected to multiplexer
1100
, which provides output to the next multiplexer
1102
. The multiplexer
1102
also receives input from the register
1094
. The transpose circuit
1090
accepts 8×8 block data in rows and produces that data in columns. The output circuit
1092
provides the coefficients of the DCT performed on a 8×8 block of pixel data.
In a typical DCT apparatus, it is the speed of the arithmetic circuit
1104
that basically determines the overall speed of the apparatus, since the arithmetic circuit
1104
is the most complex.
The arithmetic circuit
1104
of
FIG. 77
is typically implemented by breaking the arithmetic process down into several stages as described hereinafter with reference to
FIG. 78. A
single circuit is then built that implements each of these stages
1114
,
1148
,
1152
,
1156
using a pool of common resources, such as adders and multipliers. Such a circuit
1104
is mainly disadvantageous due to it being slower than optimal, because a single, common circuit is used to implement the various stages of circuit
1104
. This includes a storage means used to store intermediate results. Since the time allocated for the clock cycle of such a circuit must be greater or equal to the time of the slowest stage of the circuit, the overall time is potentially longer than the sum of all the stages.
FIG. 78
depicts a typical arithmetic data path, in accordance with the apparatus of
FIG. 77
, as part of a DCT with four stages. The drawing does not reflect the actual implementation, but instead reflects the functionality. Each of the four stages
1144
,
1148
,
1152
, and
1156
is implemented using a single, reconfigurable circuit. It is reconfigured on a cycle-by-cycle basis to implement each of the four arithmetic stages
1144
,
1148
,
1152
, and
1156
of the 1-D DCT. In this circuit, each of the four stages
1144
,
1148
,
1152
, and
1156
uses pool of common resources (e.g. adders and multipliers) and thus minimises hardware.
However, the disadvantage of this circuit is that it is slower than optimal. The four stages
1144
,
1148
,
1152
, and
1156
are each implemented from the same pool of adders and multipliers. The period of the clock is determined by the speed of the slowest stage, which in this example is 20 ns (for block
1144
). Adding in the delay (2 ns each) of the input and output multiplexers
1146
and
1154
and the delay (3 ns) of the flip-flop
1150
, the total time is 27 ns. Thus, the fastest this DCT implementation can run at is 27 ns.
Pipelined DCT implementations are also well known. The drawback with such implementations is that they require large amounts of hardware to implement. Whilst the present invention does not offer the same performance in terms of throughput, it offers an extremely good performance/size compromise, and good speed advantages over most of the current DCT implementations.
Therefore, a need clearly exists for an improved DCT/inverse-DCT method and apparatus that is able to overcome one or more disadvantages of conventional techniques. In particular, a need clearly exists for a method and apparatus that is able to reduce the time taken for the main arithmetic circuit in a DCT/inverse-DCT apparatus to calculate required results, thereby improving the overall performance of the DCT or inverse DCT.
SUMMARY OF THE INVENTION
In accordance with a first aspect of the invention, there is provided a discrete cosine transform (DCT) apparatus, comprising: a transpose memory means; and an arithmetic circuit interconnected with the transpose memory means, the arithmetic circuit consisting of a combinatorial circuit for calculating a DCT without clocked storage means.
Preferably, the combinatorial circuit comprises a predetermined number of stages for implementing the DCT, the stages arranged sequentially.
Preferably, the DCT apparatus further comprises means for multiplexing input data provided to the apparatus and data output by the transpose memory means. It may also comprise means for controlling operation of the DCT apparatus.
In accordance with a second aspect of the invention, there is provided an inverse discrete cosine transform (DCT) apparatus, comprising: a transpose memory means; and an arithmetic circuit interconnected with the transpose memory means, the arithmetic circuit consisting essentially of a combinatorial circuit for calculating an inverse DCT without clocked storage means.
In accordance with a third aspect of the invention, there is provided a method of discrete cosine transforming (DCT) data, the method comprising the steps of:
calculating a DCT of input data in accordance with a first orientation of the data using an arithmetic circuit consisting essentially of a combinatorial circuit for calculating the DCT without clocked storage means;
storing the transformed input data in accordance with the first orientation in a transpose memory means interconnected with the combinatorial circuit; and
calculating a DCT of the transformed input data stored in the transpose memory means in accordance with a second orientation of the data using the arithmetic circuit to provide transformed data.
Preferably, the DCT is calculated in a predetermined number of stages, the stages arranged sequentially.
The method may also comprise the step of multiplexing input data provided to the apparatus and data output by the transpose memory means.
In accordance with a fourth aspect of the invention, there is provided a method of inverse discrete-cosine transforming (DCT) data, the method comprising the steps of:
calculating an inverse DCT of input coefficients in accordance with a first orientation of the coefficients using an arithmetic circuit consisting essentially of a combinatorial circuit for calculating the inverse DCT without clocked storage means;
storing the inverse transformed input coefficients in accordance with the first orientation in a transpose memory means interconnected with the combinatorial circuit; and
calculating an inverse DCT of the transformed input coefficients stored in the transpose memory means in accordance with a second orientation using the arithmetic circuit to provide output inverse transformed data.
In the following detailed description, the reader's attention is directed, in particular, to
FIGS. 79
,
80
and
81
and their associated description without intending to detract from the disclosure of the remainder of the description.
TABLE OF CONTENTS
1.0 Brief Description of the Drawings
2.0 List of Tables
3.0 Description of the Preferred and Other Embodiments
3.1 General Arrangement of Plural Stream Architecture
3.2 Host/Co-processor Queuing
3.3 Register Description of Co-processor
3.4 Format of Plural Streams

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Fast DCT apparatus does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Fast DCT apparatus, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Fast DCT apparatus will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2587955

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.