Multiple mode variable rate speech coding

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S201000, C704S220000, C704S214000, C704S223000

active

06691084

ABSTRACT:

BACKGROUND OF THE INVENTION
I. Field of the Invention
The present invention relates to the coding of speech signals. Specifically, the present invention relates to classifying speech signals and employing one of a plurality of coding modes based on the classification.
II. Description of the Related Art
Many communication systems today transmit voice as a digital signal, particularly in long distance and digital radio telephone applications. The performance of these systems depends, in part, on accurately representing the voice signal with a minimum number of bits. Transmitting speech simply by sampling and digitizing requires a data rate on the order of 64 kilobits per second (kbps) to achieve the speech quality of a conventional analog telephone. However, coding techniques are available that significantly reduce the data rate required for satisfactory speech reproduction.
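The 64 kbps figure can be reproduced with a short calculation. The sketch below assumes conventional telephone-band pulse code modulation at 8,000 samples per second and 8 bits per sample; the text above does not state these parameters, so they are an assumption made for illustration.

```python
# Assumed parameters (not stated in the text): telephone-band PCM.
SAMPLE_RATE_HZ = 8000   # samples per second
BITS_PER_SAMPLE = 8     # bits per sample


def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Data rate, in kilobits per second, of plain sampled-and-digitized speech."""
    return sample_rate_hz * bits_per_sample / 1000


print(pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # 64.0
```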
The term “vocoder” typically refers to devices that compress voiced speech by extracting parameters based on a model of human speech generation. Vocoders include an encoder and a decoder. The encoder analyzes the incoming speech and extracts the relevant parameters. The decoder synthesizes the speech using the parameters that it receives from the encoder via a transmission channel. The speech signal is often divided into frames of data and block processed by the vocoder.
Vocoders built around linear-prediction-based time domain coding schemes far exceed in number all other types of coders. These techniques extract correlated elements from the speech signal and encode only the uncorrelated elements. The basic linear predictive filter predicts the current sample as a linear combination of past samples. An example of a coding algorithm of this particular class is described in the paper “A 4.8 kbps Code Excited Linear Predictive Coder,” by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
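As an illustration of predicting the current sample from a linear combination of past samples, the following minimal sketch estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. This is one common textbook approach, not necessarily the method of the cited paper or of the invention; the function names and the test signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for coefficients a so that x[n] ~= sum_k a[k] * x[n - 1 - k].

    Returns (a, err), where err is the remaining prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for predictor order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def residual(x, a):
    """Prediction error: the part of the signal the predictor cannot explain."""
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]


# Hypothetical test signal: each sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs, _ = levinson_durbin(autocorrelation(signal, 1), 1)
print(coeffs)  # a single coefficient very close to 0.9
```

For this fully predictable signal the residual is essentially zero after the first sample, which is the sense in which the correlated elements have been extracted.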
These coding schemes compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies (i.e., correlated elements) inherent in speech. Speech typically exhibits short term redundancies resulting from the mechanical action of the lips and tongue, and long term redundancies resulting from the vibration of the vocal cords. Linear predictive schemes model these operations as filters, remove the redundancies, and then model the resulting residual signal as white Gaussian noise. Linear predictive coders therefore achieve a reduced bit rate by transmitting filter coefficients and quantized noise rather than a full bandwidth speech signal.
However, even these reduced bit rates often exceed the available bandwidth where the speech signal must either propagate a long distance (e.g., ground to satellite) or coexist with many other signals in a crowded channel. A need therefore exists for an improved coding scheme which achieves a lower bit rate than linear predictive schemes.
SUMMARY OF THE INVENTION
The present invention is a novel and improved method and apparatus for the variable rate coding of a speech signal. The present invention classifies the input speech signal and selects an appropriate coding mode based on this classification. For each classification, the present invention selects the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction. The present invention achieves low average bit rates by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. The present invention switches to lower bit rate modes during portions of speech where these modes produce acceptable output.
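The selection rule described above (for each classification, choose the lowest bit rate mode that still gives acceptable quality) can be sketched as follows. The mode names, bit rates, and the table of which modes are acceptable for which classes are all hypothetical placeholders chosen for illustration; the text above does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingMode:
    name: str
    bit_rate_kbps: float
    acceptable_for: frozenset  # classifications this mode reproduces acceptably


# Hypothetical mode table: names, rates, and coverage are assumptions.
MODES = [
    CodingMode("high_fidelity", 8.0,
               frozenset({"transient", "voiced", "unvoiced", "inactive"})),
    CodingMode("periodic", 4.0, frozenset({"voiced"})),
    CodingMode("noise_excited", 2.0, frozenset({"unvoiced", "inactive"})),
]


def select_mode(classification, modes=MODES):
    """Pick the lowest bit rate mode that is acceptable for this classification."""
    candidates = [m for m in modes if classification in m.acceptable_for]
    if not candidates:
        raise ValueError(f"no acceptable mode for {classification!r}")
    return min(candidates, key=lambda m: m.bit_rate_kbps)


def average_bit_rate(classifications, modes=MODES):
    """Average rate over a sequence of equally long frames."""
    rates = [select_mode(c, modes).bit_rate_kbps for c in classifications]
    return sum(rates) / len(rates)


frames = ["inactive", "voiced", "transient", "unvoiced"]
print([select_mode(c).name for c in frames])
print(average_bit_rate(frames))
```

With this made-up table, only the transient frame uses the high fidelity mode, so the average rate falls below the high fidelity rate.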
An advantage of the present invention is that speech is coded at a low bit rate. Low bit rates translate into higher capacity, greater range, and lower power requirements.
A feature of the present invention is that the input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. The present invention therefore can apply various coding modes to different types of active speech, depending upon the required level of fidelity.
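A two-stage classification of this kind might look like the sketch below: a frame is first marked active or inactive, and active frames are then labeled voiced, unvoiced, or transient. The features (frame energy, zero-crossing rate, energy jump relative to the previous frame) and every threshold are assumptions made for illustration only; this summary does not say how the invention classifies frames.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, prev_energy=None,
                   energy_floor=1e-4, zcr_limit=0.3, jump_ratio=4.0):
    """Return 'inactive', 'transient', 'unvoiced', or 'voiced'.

    All thresholds are hypothetical and would need tuning on real speech.
    """
    energy = frame_energy(frame)
    if energy < energy_floor:
        return "inactive"                     # stage 1: active vs. inactive
    if prev_energy is not None and energy > jump_ratio * max(prev_energy, energy_floor):
        return "transient"                    # sudden rise in energy
    if zero_crossing_rate(frame) > zcr_limit:
        return "unvoiced"                     # noise-like, many sign changes
    return "voiced"                           # periodic, few sign changes


# Hypothetical 20 ms frames at an assumed 8 kHz sampling rate.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
hiss = [0.5 if n % 2 == 0 else -0.5 for n in range(160)]
print(classify_frame([0.0] * 160))                 # inactive
print(classify_frame(tone, prev_energy=0.5))       # voiced
print(classify_frame(hiss, prev_energy=0.25))      # unvoiced
print(classify_frame(tone, prev_energy=0.001))     # transient
```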
Another feature of the present invention is that coding modes may be utilized according to the strengths and weaknesses of each particular mode. The present invention dynamically switches between these modes as properties of the speech signal vary with time.
A further feature of the present invention is that, where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. The present invention uses this coding in a dynamic fashion whenever unvoiced speech or background noise is detected.
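To show why modeling a region as pseudo-random noise lowers the bit rate, the sketch below sends a single gain value per frame and lets the decoder regenerate noise at that level. It is a simplified illustration under stated assumptions: quantization of the gain and any spectral shaping of the noise are omitted, and the function names and seed handling are hypothetical rather than taken from the invention.

```python
import math
import random


def encode_noise_frame(frame):
    """Encoder side: reduce the whole frame to one parameter, its RMS level."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def decode_noise_frame(gain, length, seed=0):
    """Decoder side: regenerate pseudo-random noise scaled to the sent level.

    The waveform differs from the original, but the energy matches.
    """
    rng = random.Random(seed)  # decoder-local generator; no samples are sent
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    noise_rms = math.sqrt(sum(v * v for v in noise) / length)
    return [gain * v / noise_rms for v in noise]


# Hypothetical unvoiced frame of 160 samples.
source = random.Random(42)
frame = [0.1 * source.gauss(0.0, 1.0) for _ in range(160)]
gain = encode_noise_frame(frame)
rebuilt = decode_noise_frame(gain, len(frame))
print(len(rebuilt), abs(encode_noise_frame(rebuilt) - gain) < 1e-9)
```

Only one number per frame crosses the channel in this sketch, compared with 160 sample values for the original frame.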
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of a reference number identifies the drawing in which the reference number first appears.

