Low bit-rate speech coder using adaptive open-loop subframe...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

C704S207000, C704S219000

active

06345248

ABSTRACT:

BACKGROUND
1. Technical Field
The present invention relates generally to speech coding; and more particularly, it relates to low bit-rate speech coding using adaptive open-loop subframe pitch lag estimation and vector quantization.
2. Related Art
Speech signals can usually be classified as falling within either a voiced region or an unvoiced region. In most languages, the voiced regions are normally more important than unvoiced regions because human beings can make more sound variations in voiced speech than in unvoiced speech. Therefore, voiced speech carries more information than unvoiced speech.
Compressing, transmitting, and decompressing voiced speech with high quality is thus at the forefront of modern speech coding technology.
It is understood that neighboring speech samples are highly correlated, especially for voiced speech signals. This correlation represents the spectral envelope of the speech signal. In one speech coding approach, called linear predictive coding (LPC), the value of the digitized speech sample at any particular time index is modeled as a linear combination of previous digitized speech sample values. This relationship is called prediction, since a subsequent signal sample is thus linearly predictable from earlier signal values. The coefficients used for the prediction are simply called the LPC prediction coefficients. The difference between the real speech sample and the predicted speech sample is called the LPC prediction error, or the LPC residual signal. LPC prediction is also called short-term prediction, since the prediction process involves only a few adjacent speech samples, typically around 10.
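A minimal sketch of short-term (LPC) prediction, assuming the autocorrelation method and the Levinson-Durbin recursion (a common way to obtain the coefficients; the text does not say which method is used). It assumes the convention that the predicted sample is the sum of a[k]*x[n-k], uses order 10 to match the "around 10 speech samples" above, and forms the LPC residual as real minus predicted samples. The function names and the synthetic test signal are hypothetical.

```python
import math
import random

def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """LPC coefficients a[1..order] (a[0] unused) for x_hat[n] = sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # i-th reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc_residual(x, a):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-k], treating samples before n=0 as zero."""
    order = len(a) - 1
    return [x[n] - sum(a[k] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
            for n in range(len(x))]

# Hypothetical "speech-like" test signal: a second-order autoregressive process.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
x = x[2:]

a, _ = levinson_durbin(autocorrelation(x, 10), 10)
e = lpc_residual(x, a)
gain_db = 10 * math.log10(sum(v * v for v in x) / sum(v * v for v in e))
print([round(c, 2) for c in a[1:4]], round(gain_db, 1))
```

The recovered a[1] and a[2] land near the generating values 1.3 and -0.6, and the residual carries noticeably less energy than the signal, which is what makes the residual cheaper to code.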
The pitch also provides important information in voiced speech signals. One may already have observed that varying the pitch with a tape recorder (for example, by speeding it up) can make a male voice sound like a female voice, and vice versa, since the pitch describes the fundamental frequency of the human voice. Pitch also carries voice intonations that are useful for expressing happiness, anger, questions, doubt, etc. Therefore, precise pitch information is essential to guarantee good speech reproduction.
For speech coding purposes, the pitch is described by the pitch lag and the pitch prediction coefficient (or pitch gain). Pitch lag estimation is discussed further in the copending application entitled “Pitch Lag Estimation System Using Frequency-Domain Lowpass Filtering of the Linear Predictive Coding (LPC) Residual,” Ser. No. 08/454,477, filed May 30, 1995, invented by Huan-Yu Su, and now allowed, the disclosure of which is incorporated herein by reference. Advanced speech coding systems require efficient and precise extraction (or estimation) of the LPC prediction coefficients, the pitch information (i.e., the pitch lag and the pitch prediction coefficient), and the excitation signal from the original speech signal, according to a speech reproduction model. This information is then transmitted through the limited bandwidth of the medium, such as a transmission channel (e.g., a wireless communication channel) or a storage channel (e.g., a digital answering machine). The speech signal is then reconstructed at the receiving side using the same speech reproduction model used at the encoder side.
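To make the two pitch parameters concrete, the sketch below assumes a narrowband sampling rate of 8 kHz (not stated in the text) to convert between fundamental frequency and pitch lag, and fits a single pitch prediction coefficient by least squares for a one-tap predictor g * x[n - lag]. The names f0_to_lag, lag_to_f0, and optimal_pitch_gain are hypothetical, and the test signal is synthetic.

```python
import math

FS_HZ = 8000.0  # assumed narrowband sampling rate; not specified in the text

def f0_to_lag(f0_hz, fs_hz=FS_HZ):
    """Pitch lag (in samples) corresponding to a fundamental frequency."""
    return fs_hz / f0_hz

def lag_to_f0(lag_samples, fs_hz=FS_HZ):
    """Fundamental frequency (in Hz) corresponding to a pitch lag."""
    return fs_hz / lag_samples

def optimal_pitch_gain(x, lag):
    """Least-squares g minimizing sum((x[n] - g * x[n - lag])**2) over n >= lag."""
    num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    den = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
    return num / den if den > 0.0 else 0.0

# Periodic test signal: a 100 Hz fundamental plus its second harmonic.
x = [math.sin(2 * math.pi * 100 * n / FS_HZ) + 0.5 * math.sin(2 * math.pi * 200 * n / FS_HZ)
     for n in range(800)]
lag = round(f0_to_lag(100.0))      # 80 samples at 8 kHz
gain = optimal_pitch_gain(x, lag)  # close to 1.0 for a perfectly periodic signal
print(lag, round(gain, 6), round(lag_to_f0(18), 1), round(lag_to_f0(150), 1))
```

Under the same 8 kHz assumption, the 18 to 150 sample lag range quoted later in the text corresponds to fundamental frequencies of roughly 53 Hz to 444 Hz, and doubling the fundamental frequency halves the lag.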
Code-excited linear-prediction (CELP) coding is one of the most widely used LPC-based speech coding approaches. A speech regeneration model is illustrated in FIG. 1. The gain-scaled (via 116) innovation vector (115) output from a prestored innovation codebook (114) is added to the output of the pitch prediction (112) to form the excitation signal (120), which is then filtered through the LPC synthesis filter (110) to obtain the output speech.
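The regeneration model of FIG. 1 can be sketched for one subframe as follows. This is an illustrative sketch, not the patented decoder: it assumes the pitch prediction (112) is a single-tap adaptive codebook that repeats the past excitation at the given lag, that the gain 116 scales the innovation vector, and that the LPC synthesis filter (110) is 1/A(z) with A(z) = 1 - sum a_k z^-k. All function names and the toy parameters are hypothetical.

```python
def adaptive_codebook_vector(past_exc, lag, length):
    """Pitch contribution: past excitation delayed by `lag` samples.

    When lag < length, samples generated earlier in this vector are reused,
    so the last pitch period repeats. Requires lag <= len(past_exc).
    """
    ext = list(past_exc)
    for _ in range(length):
        ext.append(ext[-lag])
    return ext[len(past_exc):]

def lpc_synthesis(excitation, a, memory):
    """All-pole filter 1/A(z); a[0] holds a_1, a[1] holds a_2, and so on.

    `memory` holds the last len(a) output samples, oldest first.
    Returns (output speech, updated memory).
    """
    order = len(a)
    hist = list(memory)
    out = []
    for e in excitation:
        s = e + sum(a[k] * hist[-1 - k] for k in range(order))
        out.append(s)
        hist.append(s)
    return out, hist[-order:]

def decode_subframe(past_exc, lag, pitch_gain, codevector, code_gain, a, syn_mem):
    """Excitation = pitch_gain * E_LAG + code_gain * innovation, then LPC synthesis."""
    e_lag = adaptive_codebook_vector(past_exc, lag, len(codevector))
    excitation = [pitch_gain * p + code_gain * c for p, c in zip(e_lag, codevector)]
    speech, syn_mem = lpc_synthesis(excitation, a, syn_mem)
    # A real decoder would append `excitation` to the past-excitation buffer here.
    return excitation, speech, syn_mem

# Hypothetical toy parameters: one past pulse, lag 4, one innovation pulse.
past = [1.0, 0.0, 0.0, 0.0]
code = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
exc, speech, mem = decode_subframe(past, 4, 0.5, code, 0.25, [0.9], [0.0])
print(exc)
```

Because the pitch contribution is read from the excitation the decoder itself produced earlier, encoder and decoder must stay in lockstep; this shared history is what the text later calls the feedback nature of pitch prediction.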
To guarantee good quality of the reconstructed output speech, it is essential for the CELP decoder to have an appropriate combination of LPC filter parameters, pitch prediction parameters, innovation index, and gain. Thus, determining the best parameter combination that minimizes the perceptual difference between the input speech and the output speech is the objective of the CELP encoder (or any speech coding approach). In practice, however, due to complexity limitations and delay constraints, it has been found to be extremely difficult to exhaustively search for the best combination of parameters.
Most proposed speech codecs (coders/decoders) operating at a medium to low bit-rate (4-16 kbits/sec) group digitized speech samples in blocks (10-40 msec), each block being called a speech coding frame. As described in FIG. 2, after preprocessing (210), LPC analysis and quantization (212) are performed once per coding frame, while pitch analysis (214) and innovation signal (code vector) analysis (224) are performed once per subframe (216) (2-8 msec). Typically, each frame includes two to four subframes. This frame and subframe approach is based upon the observation that the LPC information changes more slowly in speech than the pitch information or the innovation information. Therefore, the minimization of the global perceptually weighted coding error is replaced by a series of lower-dimensional minimizations over disjoint temporal intervals. This procedure significantly lowers the complexity required to realize a CELP speech coding system. However, the drawback of this frame and subframe approach is that the pitch lag information is generally determined and scalar-quantized in each successive subframe, so that the bit-rate required to transmit the pitch lag information is too high for low bit-rate applications. For example, a typical rate of 1.3 kbits/sec is usually necessary to provide adequate pitch lag information to maintain good speech reproduction. Although such a bandwidth requirement is not difficult to satisfy in speech coding systems operating at a bit-rate of 8 kbits/sec or higher, using 1.3 kbits/sec to transmit pitch lag information alone is excessive for low bit-rate coding applications operating, for example, at 4 kbits/sec.
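The 1.3 kbits/sec figure can be reproduced with simple arithmetic. The sketch below assumes an 8 kHz sampling rate and the pitch lag allocation used by ITU-T G.729: a 10 ms frame split into two 5 ms subframes, with 8 lag bits in the first subframe and 5 in the second. It then expresses that cost as a share of an 8 kbits/sec and a 4 kbits/sec budget. The helper names are hypothetical.

```python
def samples_per_block(duration_ms, fs_hz=8000):
    """Samples in a frame or subframe of the given duration (8 kHz assumed)."""
    return fs_hz * duration_ms // 1000

def pitch_lag_bitrate(bits_per_subframe, frame_ms):
    """Bits per second spent on pitch lag, given the lag bits of each subframe in one frame."""
    return sum(bits_per_subframe) * 1000.0 / frame_ms

# Assumed allocation: 10 ms frame, two 5 ms subframes, 8 + 5 lag bits (as in G.729).
frame_len = samples_per_block(10)     # 80 samples per frame
subframe_len = samples_per_block(5)   # 40 samples per subframe
rate = pitch_lag_bitrate([8, 5], 10)  # 1300.0 bits/sec

for total in (8000, 4000):
    print(f"pitch lag: {rate:.0f} of {total} bits/sec ({100 * rate / total:.2f}% of the budget)")
```

At 8 kbits/sec the lag consumes about one sixth of the bit budget; at 4 kbits/sec the same allocation would consume roughly a third, which is why the text calls it excessive for low bit-rate coding.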
In the low bit-rate speech coding field, advanced high-quality parameter quantization schemes are widely used and have become essential. Vector quantization (VQ) is one of the most important contributors to achieving low bit-rate speech coding. In comparison to the simple scalar quantization (SQ) scheme, VQ results in much better quality at the same bit-rate, or the same quality at a much lower bit-rate. Unfortunately, VQ cannot be applied to pitch lag quantization under the current CELP speech coding model. To explain why, the parameter generation procedure for the pitch lag in a CELP coder is examined below.
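A toy comparison of SQ and VQ at the same bit-rate illustrates the advantage described above. The data and codebooks below are hand-picked and hypothetical; practical coders train VQ codebooks offline (for example, with the generalized Lloyd or LBG algorithm). Both schemes spend 2 bits per 2-D vector, but the VQ codevectors sit along the diagonal where the correlated data lie, while the scalar quantizer must cover each component independently.

```python
def nearest_index(vec, codebook):
    """Index of the codebook entry closest to `vec` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def squared_error(vec, approx):
    return sum((v - c) ** 2 for v, c in zip(vec, approx))

# Hypothetical correlated 2-D data (e.g., one parameter in two consecutive subframes).
data = [(-3.0, -2.8), (-1.0, -1.2), (1.0, 0.9), (3.0, 3.1)]

# SQ: 1 bit per component, i.e., 2 bits per vector.
sq_levels = [(-2.0,), (2.0,)]
# VQ: 2 bits per vector, codevectors placed along the diagonal.
vq_codebook = [(-3.0, -3.0), (-1.0, -1.0), (1.0, 1.0), (3.0, 3.0)]

sq_err = vq_err = 0.0
for vec in data:
    sq_approx = tuple(sq_levels[nearest_index((v,), sq_levels)][0] for v in vec)
    sq_err += squared_error(vec, sq_approx)
    vq_err += squared_error(vec, vq_codebook[nearest_index(vec, vq_codebook)])

print(f"SQ error {sq_err:.2f}, VQ error {vq_err:.2f} at 2 bits per vector each")
```

The VQ index only works if the whole vector is known before quantization; as the following paragraphs explain, the conventional subframe-by-subframe pitch search does not provide that.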
Referring back to FIG. 2, it can be seen from the pitch analysis (214) that the conventional pitch prediction procedure in a CELP coder is a feedback process, which takes the excitation signals from past subframes as an input to the pitch prediction module and produces a pitch contribution vector E_LAG. Since pitch prediction models the periodicity of the speech signal, it is also called long-term prediction, because its prediction terms span much longer delays than those of LPC. For a given subframe, the pitch lag (“Lag”) is searched over a range, typically between 18 and 150 speech samples, to cover the majority of human speech variations. The search is performed according to a searching step distribution, which is predetermined as a compromise between high temporal resolution and low bit-rate requirements.
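The per-subframe feedback search can be sketched as an analysis-by-synthesis style loop over integer lags in the 18 to 150 sample range. This is a simplified illustration under stated assumptions: integer lags only, and no perceptual weighting or synthesis filtering of E_LAG (practical CELP encoders filter each candidate before comparing it with the target). Because every candidate E_LAG is built from past excitation, each subframe's result depends on earlier subframes' decisions, which is the dependence the text describes. Function names and test data are hypothetical.

```python
import random

def adaptive_codebook_vector(past_exc, lag, length):
    """E_LAG: past excitation delayed by `lag`, repeating the last period if lag < length."""
    ext = list(past_exc)
    for _ in range(length):
        ext.append(ext[-lag])
    return ext[len(past_exc):]

def pitch_lag_search(target, past_exc, lag_min=18, lag_max=150):
    """Return (lag, gain) maximizing (t . y)^2 / (y . y) over integer lags.

    past_exc must hold at least lag_max samples. Ties keep the smallest lag.
    """
    best_lag, best_score, best_gain = lag_min, -1.0, 0.0
    for lag in range(lag_min, lag_max + 1):
        y = adaptive_codebook_vector(past_exc, lag, len(target))
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        score = corr * corr / energy  # error reduction achieved with the optimal gain
        if score > best_score:
            best_lag, best_score, best_gain = lag, score, corr / energy
    return best_lag, best_gain

# Hypothetical past excitation with a 37-sample period; the target continues it at gain 0.8.
rng = random.Random(1)
period = [rng.uniform(-1.0, 1.0) for _ in range(37)]
past = [period[n % 37] for n in range(160)]
target = [0.8 * v for v in adaptive_codebook_vector(past, 37, 40)]

lag, gain = pitch_lag_search(target, past)
print(lag, round(gain, 3))
```

Multiples of the true period (74, 111, 148) score equally well here; keeping the first, smallest lag is one simple way to avoid such pitch-multiple errors in this toy case.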
For example, in the North American Digital Cellular Standard IS-54, the pitch lag searching range is predetermined to be from 20 to 146 samples and the step size is one sample; e.g., the possible pitch lag choices around 30 are 28, 29, 30, 31, and 32. Once the optimal pitch lag is found, there is an index associated with its value, for example, 29. In another speech coding standard, the International Telecommunication Union (ITU) G.729 speech coding standard, the pitch lag searching range is set to be [19⅓, 143]...
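Scalar quantization of the lag, as in the IS-54 example, amounts to enumerating a grid of candidate lags and transmitting the index of the chosen one. The sketch below uses the IS-54 range and step from the text, assumes (hypothetically) that the index is simply the lag's position in the grid, and computes the fixed-length code size: 127 candidates need 7 bits per subframe.

```python
import math

def lag_grid(lag_min, lag_max, step=1):
    """Candidate integer pitch lags from lag_min to lag_max inclusive."""
    return list(range(lag_min, lag_max + 1, step))

def bits_needed(num_choices):
    """Bits needed to index num_choices candidates with a fixed-length code."""
    return math.ceil(math.log2(num_choices))

# IS-54 range from the text: 20..146 samples in one-sample steps.
IS54_LAGS = lag_grid(20, 146)

def lag_to_index(lag, grid=IS54_LAGS):
    """Hypothetical index assignment: position of the lag within the grid."""
    return grid.index(lag)

def index_to_lag(index, grid=IS54_LAGS):
    return grid[index]

print(len(IS54_LAGS), bits_needed(len(IS54_LAGS)))       # candidate count, bits per subframe
print(IS54_LAGS[lag_to_index(28):lag_to_index(32) + 1])  # choices around 30
print(lag_to_index(29), index_to_lag(lag_to_index(29)))
```

Each subframe's index is chosen independently, so a coder following this scheme pays the full code length every subframe regardless of how slowly the lag actually changes.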
