Speech coding with improved background noise reproduction


Filed: 1998-09-16
Issued: 2001-08-14
Examiner: Tsang, Fan (Department: 2645)
Classification: Data processing: speech signal processing, linguistics, language; Speech signal processing; Recognition

US classification: C704S231000, C704S225000
Type: Reexamination Certificate
Status: active
Patent number: 06275798
ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to speech coding and, more particularly, to the reproduction of background noise in speech coding.
BACKGROUND OF THE INVENTION
In linear predictive type speech coders such as Code Excited Linear Prediction (CELP) speech coders, the incoming original speech signal is typically divided into blocks called frames. A typical frame length is 20 milliseconds or 160 samples, which frame length is commonly used in, for example, conventional telephony bandwidth cellular applications. The frames are typically divided further into subframes, which subframes often have a length of 5 milliseconds or 40 samples.
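The frame and subframe sizes above can be checked with a little arithmetic. The following is a minimal sketch, assuming the 8 kHz sampling rate implied by 160 samples per 20 milliseconds; the helper names `samples_per_block` and `split_blocks` are illustrative and do not come from the patent.

```python
# Assumption: 8 kHz sampling rate, implied by 160 samples per 20 ms frame.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SUBFRAME_MS = 5


def samples_per_block(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in a block of the given duration."""
    return sample_rate_hz * duration_ms // 1000


def split_blocks(samples, block_len):
    """Split a sequence into consecutive full blocks; a trailing partial block is dropped."""
    count = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(count)]


FRAME_LEN = samples_per_block(FRAME_MS)        # 160 samples
SUBFRAME_LEN = samples_per_block(SUBFRAME_MS)  # 40 samples

signal = list(range(400))                      # placeholder input, not real speech
frames = split_blocks(signal, FRAME_LEN)       # two full frames; 80 samples left over
subframes = split_blocks(frames[0], SUBFRAME_LEN)  # four subframes per frame
```
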
In conventional speech coders such as mentioned above, parameters describing the vocal tract, pitch, and other features are extracted from the original speech signal during the speech encoding process. Parameters that vary slowly are computed on a frame-by-frame basis. Examples of such slowly varying parameters include the so-called short term predictor (STP) parameters that describe the vocal tract. The STP parameters define the filter coefficients of the synthesis filter in linear predictive speech coders. Parameters that vary more rapidly, for example the pitch, the innovation shape, and the innovation gain parameters, are typically computed for every subframe.
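To illustrate how short term predictor coefficients can define a synthesis filter, here is a minimal sketch of an all-pole filter driven by an excitation signal. The sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the function name `synthesize` are assumptions made for illustration; the text above does not specify them.

```python
def synthesize(excitation, stp_coeffs):
    """All-pole synthesis filter 1/A(z).

    Assumed convention: A(z) = 1 + sum_k a[k] * z^-k, so that
    s[n] = e[n] - sum_k a[k] * s[n - k].
    """
    output = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(stp_coeffs, start=1):
            if n - k >= 0:
                acc -= a * output[n - k]
        output.append(acc)
    return output


# A single-tap example: an impulse produces a decaying response.
impulse_response = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```

In a real coder the coefficient set would be updated once per frame, while the excitation would be rebuilt every subframe.
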
After the parameters have been computed, they are then quantized. The STP parameters are often transformed to a representation more suitable for quantization such as a line spectrum frequency (LSF) representation. The transformation of STP parameters into LSF representation is well known in the art.
Once the parameters have been quantized, error control coding and checksum information are added prior to interleaving and modulation of the parameter information. The parameter information is then transmitted across a communication channel to a receiver wherein a speech decoder performs basically the opposite of the above-described speech encoding procedure in order to synthesize a speech signal which closely resembles the original speech signal. In the speech decoder, postfiltering is commonly applied to the synthesized speech signal to enhance the perceived quality of the signal.
Speech coders which use linear predictive models such as the CELP model are typically very carefully adapted to the coding of speech, so the synthesis or reproduction of non-speech signals such as background noise is often poor in such coders. Under poor channel conditions, for example when the quantized parameter information is distorted by channel errors, the reproduction of background noise deteriorates even more. Even under clean channel conditions, background noise is often perceived by the listener at the receiver as a fluctuating and unsteady noise. In CELP coders, the reason for this problem is mainly the mean squared error (MSE) criterion conventionally used in the analysis-by-synthesis loop in combination with poor correlation between the target and synthesized signals. Under poor channel conditions, the problem is, as mentioned, even worse, because the level of the background noise fluctuates greatly. This is perceived by the listener as very annoying because the background noise level is expected to vary quite slowly.
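The role of the MSE criterion can be sketched with a simplified codebook search. This is a generic illustration of analysis-by-synthesis selection, not the procedure of any particular coder: the candidates are assumed to be already passed through the synthesis filter, and `best_codebook_entry` is a hypothetical helper name.

```python
def best_codebook_entry(target, candidates):
    """Pick the candidate and gain that minimize the squared error to the target.

    For each candidate c, the MSE-optimal gain is <target, c> / <c, c>.
    Returns (index, gain, squared_error), or None if no usable candidate exists.
    """
    best = None
    for index, cand in enumerate(candidates):
        energy = sum(x * x for x in cand)
        if energy == 0:
            continue  # an all-zero candidate cannot be scaled to match anything
        correlation = sum(t * x for t, x in zip(target, cand))
        gain = correlation / energy
        error = sum((t - gain * x) ** 2 for t, x in zip(target, cand))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


choice = best_codebook_entry([1.0, 1.0], [[1.0, 0.0], [2.0, 2.0]])
```

Because the optimal gain is proportional to the correlation, a noise-like target that correlates poorly with every candidate yields gains that change from one subframe to the next, which is consistent with the fluctuation described above.
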
One solution for improving the perceived quality of background noise in both clean and noisy channel conditions could include the use of voice activity detectors (VADs) which make a hard (e.g., yes or no) decision regarding whether the signal that is being coded is speech or non-speech. Based on the hard decision, different processing techniques can be applied in the decoder. For example, if the decision is non-speech, then the decoder can assume that the signal is background noise, and can operate to smooth out the spectral variations in the background noise. However, this hard decision technique disadvantageously permits the listener to hear the decoder switch between speech processing actions and non-speech processing actions.
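The drawback of a hard decision can be sketched as follows. This is an illustrative model only: the per-frame parameter track, the `speech_flags` input, and the first-order averaging constant `alpha` are assumptions, and no specific VAD algorithm is implied.

```python
def hard_switched_smoothing(values, speech_flags, alpha=0.5):
    """Apply smoothing only when the hard decision says 'non-speech'.

    values: per-frame parameter track (for example, a level estimate).
    speech_flags: per-frame booleans from a hypothetical VAD (True = speech).
    """
    output = []
    state = values[0]
    for value, is_speech in zip(values, speech_flags):
        # The running average always tracks the input.
        state = alpha * state + (1 - alpha) * value
        # Hard switch: raw value for speech, smoothed value otherwise.
        output.append(value if is_speech else state)
    return output


track = hard_switched_smoothing([4.0, 4.0, 8.0, 8.0], [False, False, False, True])
```

In this example the output jumps from the smoothed value 6.0 to the raw value 8.0 at the moment the decision flips, which is the kind of audible switching the paragraph describes.
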
In addition to the aforementioned problems, the reproduction of background noise is degraded even more at lowered bit rates (for example, below 8 kb/s). Under bad channel conditions at lowered bit rates, the background noise is often heard as a fluttering effect caused by unnatural variations in the level of the decoded background noise.
It is therefore desirable to provide for reproduction of background noise in a linear predictive speech decoder such as a CELP decoder, while avoiding the aforementioned undesirable listener perceptions of the background noise.
The present invention provides improved reproduction of background noise. The decoder is capable of gradually (or softly) increasing or decreasing the application of energy contour smoothing to the signal that is being reconstructed. Thus, the problem of background noise reproduction can be addressed by smoothing the energy contour without the disadvantage of a perceptible activation/deactivation of the energy contour smoothing operations.
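One way to picture gradual application of energy contour smoothing is a mixing factor that moves in small steps between 0 (no smoothing) and 1 (full smoothing). The sketch below is an assumption-laden illustration, not the patented method: the soft indicator `noise_likelihood`, the averaging constant `alpha`, and the ramp size `step` are all hypothetical.

```python
def soft_energy_smoothing(energies, noise_likelihood, alpha=0.5, step=0.25):
    """Blend raw and smoothed energy with a gradually changing mix factor.

    energies: per-subframe energy values.
    noise_likelihood: per-subframe values in [0, 1] (hypothetical soft indicator).
    Returns (blended_energies, mix_factors).
    """
    blended, mix_factors = [], []
    smoothed = energies[0]
    k = 0.0  # mix factor: 0 = raw energy only, 1 = smoothed energy only
    for energy, target in zip(energies, noise_likelihood):
        smoothed = alpha * smoothed + (1 - alpha) * energy
        # Move k toward the target by at most `step` per subframe.
        if target > k:
            k = min(target, k + step)
        else:
            k = max(target, k - step)
        blended.append(k * smoothed + (1 - k) * energy)
        mix_factors.append(k)
    return blended, mix_factors


out, ks = soft_energy_smoothing([8.0, 8.0, 16.0, 16.0], [1.0, 1.0, 1.0, 1.0])
```

Because the mix factor changes by a bounded amount per subframe, the output never switches abruptly between the raw and smoothed contours, unlike a hard on/off decision.
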


Affiliated with

Johansson Ingemar (Inventor)
Svedberg Jonas (Inventor)
Uvliden Anders (Inventor)

Also associated with

Jenkens & Gilchrist P.C. (Law Firm)
Opsasnick Michael N. (Examiner)
Telefonaktiebolaget L M Ericsson (Corporate Assignee)
Tsang Fan (Examiner)
