Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
Reexamination Certificate (active)
Filed: 1999-01-28
Issued: 2001-05-01
Examiner: Korzuch, William R. (Department: 2741)
U.S. Classes: C704S500000, C704S503000, C704S211000, C704S201000, C704S200100, C386S349000, C345S519000
Patent Number: 06226608
ABSTRACT:
TECHNICAL FIELD
The present invention is related to audio signal processing in which audio information streams are encoded and assembled into frames of encoded information. In particular, the present invention is related to improving the quality of audio information streams conveyed by and recovered from the frames of encoded information.
BACKGROUND ART
In many video/audio systems, video/audio information is conveyed in information streams comprising frames of encoded audio information that are aligned with frames of video information. This alignment means that the sound content encoded into a given audio frame is related to the picture content of a video frame that either substantially coincides with the given audio frame or leads or lags it by some specified amount. Typically, the audio information is conveyed in an encoded form with reduced information capacity requirements so that some desired number of channels of audio information, say between three and eight, can be conveyed in the available bandwidth.
These video/audio information streams are frequently subjected to a variety of editing and signal processing operations. A common editing operation cuts one or more streams of video/audio information into sections and joins or splices the ends of two sections to form a new information stream. Typically, the cuts are made at points that are aligned with the video information so that video synchronization is maintained in the new information stream. A simple editing paradigm is the process of cutting and splicing motion picture film. The two sections of material to be spliced may originate from different sources, e.g., different channels of information, or they may originate from the same source. In either case, the splice generally creates a discontinuity in the audio information that may or may not be perceptible.
A. Audio Coding
The growing use of digital audio has tended to make it more difficult to edit audio information without creating audible artifacts in the processed information. This difficulty has arisen in part because digital audio is frequently processed or encoded in segments or blocks of digital samples that must be processed as a complete entity. Many perceptual or psychoacoustic-based audio coding systems utilize filterbanks or transforms to convert segments of signal samples into blocks of encoded subband signal samples or transform coefficients that must be synthesis filtered or inverse transformed as complete blocks to recover a replica of the original signal segment. Editing operations are more difficult because an edit of the processed audio signal must be done between blocks; otherwise, audio information represented by a partial block on either side of a cut cannot be properly recovered.
An additional limitation is imposed on editing by coding systems that process overlapping segments of program material. Because of the overlapping nature of the information represented by the encoded blocks, an original signal segment cannot properly be recovered from even a complete block of encoded samples or coefficients.
This limitation is clearly illustrated by a commonly used overlapped-block transform, the modified discrete cosine transform (MDCT), described in Princen, Johnson, and Bradley, “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation,” ICASSP 1987 Conf. Proc., May 1987, pp. 2161-64. This particular time-domain aliasing cancellation (TDAC) transform is the time-domain equivalent of an oddly-stacked critically sampled single-sideband analysis-synthesis system and is referred to herein as Oddly-Stacked Time-Domain Aliasing Cancellation (O-TDAC).
The forward or analysis transform is applied to segments of samples that are weighted by an analysis window function and that overlap one another by one-half the segment length. The analysis transform achieves critical sampling by decimating the resulting transform coefficients by two; however, the information lost by this decimation creates time-domain aliasing in the recovered signal. The synthesis process can cancel this aliasing by applying an inverse or synthesis transform to blocks of transform coefficients to generate segments of synthesized samples, applying a suitably shaped synthesis window function to the segments of synthesized samples, and overlapping and adding the windowed segments. For example, if a TDAC analysis transform system generates a sequence of blocks B1 and B2 from which segments S1 and S2 are to be recovered, then the aliasing artifacts in the last half of segment S1 and in the first half of segment S2 will cancel each other.
If two encoded information streams from a TDAC coding system are spliced at a point between blocks, however, the segments on either side of the splice will not cancel each other's aliasing artifacts. For example, suppose one encoded information stream is cut so that it ends at a point between blocks B1 and B2, and another encoded information stream is cut so that it begins at a point between blocks B3 and B4. If these two encoded information streams are spliced so that block B1 immediately precedes block B4, then the aliasing artifacts in the last half of segment S1 recovered from block B1 and in the first half of segment S4 recovered from block B4 will generally not cancel each other.
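The failure of alias cancellation across a splice can be illustrated numerically. The following sketch (a hypothetical illustration using NumPy; the function names `mdct`, `imdct` and the toy segment length are the author's not the patent's) implements an O-TDAC analysis-synthesis pair of the kind described above. Overlap-add of adjacent blocks from one stream reconstructs the overlap region exactly, while overlap-add of blocks taken from two unrelated streams leaves uncancelled aliasing.

```python
import numpy as np

def mdct(x):
    # Forward O-TDAC transform: one length-N windowed segment -> N/2 coefficients
    # (critical sampling: half as many coefficients as samples).
    N = len(x)
    n, k = np.arange(N), np.arange(N // 2)
    C = np.cos(2 * np.pi / N * (n[None, :] + 0.5 + N / 4) * (k[:, None] + 0.5))
    return C @ x

def imdct(X):
    # Inverse transform: N/2 coefficients -> length-N segment containing the
    # original samples plus time-domain aliasing.
    N = 2 * len(X)
    n, k = np.arange(N), np.arange(N // 2)
    C = np.cos(2 * np.pi / N * (n[:, None] + 0.5 + N / 4) * (k[None, :] + 0.5))
    return (4 / N) * (C @ X)

N = 8                                            # toy segment length
win = np.sin(np.pi / N * (np.arange(N) + 0.5))   # sine window (Princen-Bradley condition)
rng = np.random.default_rng(0)
a = rng.standard_normal(3 * N // 2)              # stream A samples
b = rng.standard_normal(3 * N // 2)              # unrelated stream B samples

# Within one stream: segments overlap by N/2 and their aliasing cancels.
b1, b2 = mdct(win * a[:N]), mdct(win * a[N // 2:])
ola_same = win[N // 2:] * imdct(b1)[N // 2:] + win[:N // 2] * imdct(b2)[:N // 2]
print(np.allclose(ola_same, a[N // 2:N]))        # True: exact reconstruction

# Across a splice: a block from stream A overlap-added with a block from
# stream B; the aliasing terms come from different signals and do not cancel.
b4 = mdct(win * b[N // 2:])
ola_splice = win[N // 2:] * imdct(b1)[N // 2:] + win[:N // 2] * imdct(b4)[:N // 2]
print(np.allclose(ola_splice, a[N // 2:N]))      # False: uncancelled aliasing remains
```

The sine window is used at both analysis and synthesis because its squared values on overlapping halves sum to one, which is what allows the decimation aliasing of adjacent blocks to cancel term by term.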
B. Audio and Video Synchronization
Even greater limitations are imposed upon editing applications that process both audio and video information for at least two reasons. One reason is that the video frame length is generally not equal to the audio block length. The second reason pertains only to certain video standards like NTSC that have a video frame rate that is not an integer multiple of the audio sample rate. Examples in the following discussion assume an audio sample rate of 48 k samples per second. Most professional equipment uses this rate. Similar considerations apply to other sample rates such as 44.1 k samples per second, which is typically used in consumer equipment.
The frame and block lengths for several video and audio coding standards are shown in Table I and Table II, respectively. Entries in the tables for “MPEG II” and “MPEG III” refer to the MPEG-2 Layer II and MPEG-2 Layer III coding techniques specified in standard ISO/IEC 13818-3 by the Moving Picture Experts Group of the International Organization for Standardization. The entry for “AC-3” refers to a coding technique developed by Dolby Laboratories, Inc. and specified in standard A-52 by the Advanced Television Systems Committee. The “block length” for 48 kHz PCM is the time interval between adjacent samples.
TABLE I
Video Frames

Video Standard    Frame Length
DTV (30 Hz)       33.333 msec.
NTSC              33.367 msec.
PAL               40 msec.
Film              41.667 msec.
TABLE II
Audio Frames

Audio Standard    Block Length
PCM               20.8 μsec.
MPEG II           24 msec.
MPEG III          24 msec.
AC-3              32 msec.
In applications that bundle together video and audio information conforming to any of these standards, audio blocks and video frames are rarely synchronized. The minimum time interval between occurrences of video/audio synchronization is shown in Table III. For example, the table shows that motion picture film, at 24 frames per second, will be synchronized with an MPEG audio block boundary no more than once in each 3 second period and will be synchronized with an AC-3 audio block no more than once in each 4 second period.
TABLE III
Minimum Time Interval Between Video/Audio Synchronization

Audio Standard    DTV (30 Hz)     NTSC            PAL         Film
PCM               33.333 msec.    166.833 msec.   40 msec.    41.667 msec.
MPEG II           600 msec.       24.024 sec.     120 msec.   3 sec.
MPEG III          600 msec.       24.024 sec.     120 msec.   3 sec.
AC-3              800 msec.       32.032 sec.     160 msec.   4 sec.
The minimum interval between occurrences of synchronization, expressed in numbers of audio blocks to video frames, is shown in Table IV. For example, synchronization occurs no more than once between AC-3 blocks and PAL frames within an interval spanned by 5 audio blocks and 4 video frames.
TABLE IV
Numbers of Frames Between Video/Audio Synchronization
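The intervals in Table III, and the block/frame counts of Table IV, follow from the least common multiple of the audio block length and the video frame length treated as exact rationals. A minimal sketch (the dictionaries and the helper `sync_interval` are illustrative, not from the patent), assuming an NTSC frame period of exactly 1001/30000 seconds:

```python
from fractions import Fraction
from math import gcd, lcm  # math.lcm requires Python 3.9+

# Frame and block periods in seconds, as exact rationals.
video = {"DTV (30 Hz)": Fraction(1, 30), "NTSC": Fraction(1001, 30000),
         "PAL": Fraction(1, 25), "Film": Fraction(1, 24)}
audio = {"PCM": Fraction(1, 48000), "MPEG II": Fraction(24, 1000),
         "MPEG III": Fraction(24, 1000), "AC-3": Fraction(32, 1000)}

def sync_interval(block, frame):
    # LCM of two rationals in lowest terms: lcm of numerators over gcd of denominators.
    return Fraction(lcm(block.numerator, frame.numerator),
                    gcd(block.denominator, frame.denominator))

# AC-3 against PAL: boundaries coincide every 4/25 sec = 160 msec,
# which spans 5 audio blocks and 4 video frames (Tables III and IV).
t = sync_interval(audio["AC-3"], video["PAL"])
print(t, t / audio["AC-3"], t / video["PAL"])   # 4/25 5 4

assert sync_interval(audio["MPEG II"], video["Film"]) == 3                 # 3 sec
assert sync_interval(audio["AC-3"], video["NTSC"]) == Fraction(4004, 125)  # 32.032 sec
```

Exact rational arithmetic matters here: the NTSC rate of 30000/1001 frames per second cannot be represented exactly in floating point, and the very long NTSC synchronization intervals in Table III fall directly out of the factor 1001 in its period.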
Title: Data framing for adaptive-block-length coding system
Inventors: Louis Dunn Fielder; Michael Mead Truman
Assignee: Dolby Laboratories Licensing Corporation
Primary Examiner: William R. Korzuch
Assistant Examiner: Vijay B. Chawan
Law Firm: Gallagher & Lathrop
Attorney: David N. Lathrop