Dual-block discrete consine transform method

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C708S402000

Reexamination Certificate

active

06742010

ABSTRACT:

BACKGROUND
1. Field of the Invention
The present invention relates generally to systems and methods for performing discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) operations. The invention also relates to digital video compression and decompression, and more particularly to a video encoder and decoder for performing the discrete cosine transform and/or inverse discrete cosine transform with improved efficiency and reduced computational requirements.
2. Description of the Related Art
DSP theory provides a host of tools for the analysis and representation of signal data. The discrete cosine transform and its inverse are among the more ubiquitous of these tools in multimedia applications. The discrete cosine transform (DCT) of a discrete function f (j) , j=0, 1, . . . , N−1 is defined as
F

(
k
)
=
2

c

(
k
)
N


j
=
0
N
-
1



f

(
j
)
·
cos

[
(
2

j
+
1
)

k



π
2

N
]
,
where k =0, 1, . . . , N−1, and
c

(
k
)
=
{
1
2
for



k
=
0
1
for



k

0
}
.
The inverse discrete cosine transform (IDCT) is defined by
f

(
j
)
=

k
=
0
N
-
1

c

(
k
)

F

(
k
)

cos

[
(
2

j
+
1
)

k



π
2

N
]
,
where j=0, 1, . . . , N−1.
The discrete cosine transform may be used in a wide variety of applications and allows an arbitrary input array size. However, the straightforward DCT algorithm is often prohibitively time-consuming especially when executed on general purpose processors. In 1977, Chen et al. disclosed an efficient algorithm for performing the DCT in an article entitled “A Fast Computational Algorithm for the Discrete Cosine Transform”, published in IEEE Transactions on Communications, Vol. COM-25, No. 9, September 1977, authored by Wen-Hsiung Chen, C. Harrison Smith and S. C. Fralick, which is hereby incorporated by reference. Fast DCT algorithms such as that disclosed by Chen et al. are significantly more efficient that the straightforward DCT algorithm. Nevertheless, there remains room for improvement, particularly when the algorithm is employed in specific circumstances.
Traditional x86 processors are not well adapted for the types of calculations used in signal processing. Thus, signal processing software applications on traditional x86 processors have lagged behind what was realizable on other processor architectures. There have been various attempts to improve the signal processing performance of x86-based systems. For example, microcontrollers optimized for digital signal processing computations (DSPs) have been provided on plug-in cards or the motherboard. These microcontrollers operated essentially as hardwired coprocessors enabling the system to perform signal processing functions.
As multimedia applications become more sophisticated, the demands placed on computers are redoubled. Microprocessors are now routinely provided with enhanced support for these applications. For example, many processors now support single-instruction multiple-data (SIMD) commands such as MMX instructions. Advanced Micro Devices, Inc. (hereinafter referred to as AMD) has proposed and implemented 3DNow!™, a set of floating point SIMD instructions on x86 processors starting with the AMD-K6®-2. The AMD-K6®-2 is highly optimized to execute the 3DNow!™ instructions with minimum latency. Software applications written for execution on the AMD-K6®-2 may use these instructions to accomplish signal processing functions and the traditional x86 instructions to accomplish other desired functions.
The 3DNow! instructions, being SIMD commands, are “vectored” instructions in which a single operation is performed on multiple data operands. Such instructions are very efficient for graphics and audio applications where simple operations are repeated on each sample in a stream of data. SIMD commands invoke parallel execution in superscalar microprocessors where pipelining and/or multiple execution units are provided.
Vectored instructions typically have operands that are partitioned into separate sections, each of which is independently operated upon. For example, a vectored multiply instruction may operate upon a pair of 32-bit operands, each of which is partitioned into two 16-bit sections or four 8-bit sections. Upon execution of a vectored multiply instruction, corresponding sections of each operand are independently multiplied.
FIG. 1
illustrates the differences between a scalar (i.e., non-vectored) multiplication and a vector multiplication. To quickly execute vectored multiply instructions, microprocessors such as the AMD-K6®-2 use a number of multipliers in parallel.
FIG. 2
illustrates one embodiment of a representative computer system
100
such as the AMD-K6®-2 which is configured to support the execution of general-purpose instructions and parallel floating-point instructions. Computer system
100
may comprise a microprocessor
110
, memory
112
, bus bridge
114
, peripheral bus
116
, and a plurality of peripheral devices P
1
-PN. Bus bridge
114
couples to microprocessor
110
, memory
112
and peripheral bus
116
. Bus bridge
114
mediates the exchange of data between microprocessor
110
, memory
112
and peripheral devices P
1
-PN.
Microprocessor
110
is a superscalar microprocessor configured to execute instructions in a variable length instruction set. A subset of the variable length instruction set is the set of SIMD (simultaneous-instruction multiple-data) floating-point instructions. Microprocessor
110
is optimized to execute the SIMI floating-point instructions in a single clock cycle. In addition, the variable length instruction set includes a set of x86 instructions (e.g. the instructions defined by the 80486 processor architecture).
Memory
112
stores program instructions which control the operation of microprocessor
110
. Memory
112
additionally stores input data to be operated on by microprocessor
110
, and output data generated by microprocessor
110
, in response to the program instructions. Peripheral devices P
1
-PN are representative of devices such as network interface cards (e.g. Ethernet cards), modems, sound cards, video acquisition boards, data acquisition cards, external storage media, etc. Computer system
100
may be a personal computer, a laptop computer, a portable computer, a television, a radio receiver and/or transmitter, etc.
FIG. 3
illustrates one embodiment for microprocessor
110
. Microprocessor
110
may be configured with 3DNow!™ and MMX® technologies. Microprocessor
110
may comprise bus interface unit
224
, predecode unit
212
, instruction cache
214
, decode unit
220
, execution engine
230
, and data cache
226
. Microprocessor
110
may also include store queue
238
and an L
2
cache
240
. Additionally, microprocessor
110
may include a branch prediction unit and a branch resolution unit (not shown) to allow efficient speculative execution.
Predecode unit
212
may be coupled to instruction cache
214
, which stores instructions received from memory
112
via bus interface unit
224
and predecode unit
212
. Instruction cache
214
may also contain a predecode cache (not shown) for storing predecode information. Decode unit
220
may receive instructions and predecode information from instruction cache
214
and decode the instructions into component pieces. The component pieces may be forwarded to execution engine
230
. The component pieces may be RISC operands. (Microprocessor
110
may be RISC-based superscalar microprocessor). RISC ops are fixed-format internal instructions, jug most of which are executable by microprocessor
10
in a single clock cycle. RISC operations may be combined to form every function of the x86 instruction set.
Execution engine
230
may execute the decoded instructions in response to the component pieces received from decode unit
220
. As shown in
FIG. 4
, execution engine
230
may include a scheduler buffer
232
coupled to recei

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Dual-block discrete consine transform method does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Dual-block discrete consine transform method, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Dual-block discrete consine transform method will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3231145

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.