Generalized analysis-by-synthesis speech coding method and...
Utility Patent
1998-01-08
2001-01-02
Zele, Krista (Department: 2748)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
active
06169970
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to speech coding systems and more specifically to a reduction of bandwidth requirements in analysis-by-synthesis speech coding systems.
BACKGROUND OF THE INVENTION
Speech coding systems function to provide codeword representations of speech signals for communication over a channel or network to one or more system receivers. Each system receiver reconstructs speech signals from received codewords. The amount of codeword information communicated by a system in a given time period defines system bandwidth and affects the quality of speech reproduced by system receivers.
Designers of speech coding systems often seek to provide high quality speech reproduction capability using as little bandwidth as possible. However, requirements for high quality speech and low bandwidth may conflict and therefore present engineering trade-offs in a design process. This notwithstanding, speech coding techniques have been developed which provide acceptable speech quality at reduced channel bandwidths. Among these are analysis-by-synthesis speech coding techniques.
With analysis-by-synthesis speech coding techniques, speech signals are coded through a waveform matching procedure. A candidate speech signal is synthesized from one or more parameters for comparison to an original speech signal to be encoded. By varying parameters, different synthesized candidate speech signals may be determined. The parameters of the closest matching candidate speech signal may then be used to represent the original speech signal.
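The waveform-matching loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patent's coder: the synthesis filter (a single-pole filter with coefficient a = 0.9), the small excitation codebook, and the gain grid are all hypothetical choices made only to show the search over parameters and the selection of the closest candidate.

```python
from itertools import product


def synthesize(excitation, a=0.9):
    """Toy all-pole synthesis filter 1 / (1 - a z^-1); a hypothetical stand-in."""
    out, state = [], 0.0
    for e in excitation:
        state = e + a * state
        out.append(state)
    return out


def squared_error(x, y):
    """Sum of squared sample differences between two equal-length signals."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def analysis_by_synthesis(original, codebook, gains):
    """Synthesize a candidate for every (codebook index, gain) pair; keep the closest.

    Returns (error, index, gain) for the best-matching candidate."""
    best = None
    for (index, vector), gain in product(enumerate(codebook), gains):
        candidate = synthesize([gain * v for v in vector])
        error = squared_error(original, candidate)
        if best is None or error < best[0]:
            best = (error, index, gain)
    return best


# Hypothetical codebook; the "original" is built from entry 1 with gain 2.0.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0], [1.0, 1.0, 1.0, 1.0]]
original = synthesize([2.0 * v for v in codebook[1]])
print(analysis_by_synthesis(original, codebook, [0.5, 1.0, 2.0]))  # (0.0, 1, 2.0)
```

The parameters of the winning candidate (index 1, gain 2.0) are what would be transmitted in place of the waveform itself.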
Many analysis-by-synthesis coders, e.g., most code-excited linear prediction (CELP) coders, employ a long-term predictor (LTP) to model long-term correlations in speech signals. (The term “speech signals” means actual speech or any of the excitation signals present in analysis-by-synthesis coders.) As a general matter, such correlations allow a past speech signal to serve as an approximation of a current speech signal. An LTP compares several past speech signals (which have already been coded) with a current (original) speech signal and, from these comparisons, determines which past signal most closely matches the original signal. A past speech signal is identified by a delay, which indicates how far in the past (from the current time) the signal is found. A coder employing an LTP subtracts a scaled version of the closest matching past speech signal (i.e., the best approximation) from the current speech signal to yield a signal with reduced long-term correlation (sometimes referred to as a residual or excitation). This signal is then coded, typically with a fixed stochastic codebook (FSCB). The FSCB index and LTP delay, among other things, are transmitted to a CELP decoder, which can recover an estimate of the original speech from these parameters.
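A minimal sketch of the LTP delay search, assuming the past excitation is a plain list of samples and that every delay is at least one subframe long (so no segment repetition is needed). The names `ltp_search` and `dot` are hypothetical. For each delay the sketch computes the least-squares gain, keeps the delay with the smallest matching error, and subtracts the scaled match to form the residual that would go to the FSCB.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def ltp_search(past, target, min_delay, max_delay):
    """Search past excitation for the delay whose segment best matches target.

    Simplifying assumption: min_delay >= len(target), so every candidate segment
    lies entirely inside `past`. Returns (delay, gain, residual)."""
    n = len(target)
    if min_delay < n or max_delay > len(past):
        raise ValueError("delays must satisfy len(target) <= delay <= len(past)")
    best = None
    for delay in range(min_delay, max_delay + 1):
        start = len(past) - delay  # the segment begins `delay` samples back
        candidate = past[start:start + n]
        energy = dot(candidate, candidate)
        if energy == 0.0:
            continue  # a silent segment cannot be scaled to fit
        gain = dot(target, candidate) / energy  # least-squares optimal scale
        error = dot(target, target) - gain * dot(target, candidate)
        if best is None or error < best[0]:
            best = (error, delay, gain, candidate)
    if best is None:
        raise ValueError("no usable candidate segment")
    _, delay, gain, candidate = best
    # Subtract the scaled best match; this residual is what the FSCB would code.
    residual = [t - gain * c for t, c in zip(target, candidate)]
    return delay, gain, residual


past = [float(k % 17) for k in range(60)]  # hypothetical signal with period 17
target = [0.5 * (k % 17) for k in range(60, 70)]
delay, gain, residual = ltp_search(past, target, 10, 30)
print(delay, gain)  # 17 0.5
```

Because the hypothetical past signal repeats every 17 samples, the search recovers a delay of 17 and the residual is zero; real speech would leave a nonzero residual for the FSCB.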
By modeling long-term correlations of speech, the quality of reconstructed speech at a decoder may be enhanced. This enhancement, however, is not achieved without a significant increase in bandwidth. For example, in order to model long-term correlations in speech, conventional CELP coders may transmit 8-bit delay information every 5 or 7.5 ms (referred to as a subframe). Such time-varying delay parameters require, e.g., between one and two additional kilobits (kb) per second of bandwidth. Because variations in LTP delay may not be predictable over time (i.e., a sequence of LTP delay values may be stochastic in nature), it may prove difficult to reduce the additional bandwidth requirement through the coding of delay parameters.
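The bandwidth figures above follow directly from the delay word size and the subframe length. With $b = 8$ bits per delay value and one value per subframe of duration $T_{\text{sub}}$ (symbols introduced here only for the arithmetic), both subframe lengths fall in the stated range of one to two kb/s:

```latex
R_{\text{delay}} = \frac{b}{T_{\text{sub}}}, \qquad
\frac{8\ \text{bits}}{7.5\ \text{ms}} \approx 1.07\ \text{kb/s}, \qquad
\frac{8\ \text{bits}}{5\ \text{ms}} = 1.6\ \text{kb/s}
```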
One approach to reducing the extra bandwidth requirements of analysis-by-synthesis coders employing an LTP might be to transmit LTP delay values less often and determine intermediate LTP delay values by interpolation. However, interpolation may lead to suboptimal delay values being used by the LTP in individual subframes of the speech signal. With a suboptimal delay, the LTP maps past speech signals into the present in a suboptimal fashion, so the remaining excitation signal is larger than it would otherwise be. The FSCB must then work to undo the effects of this suboptimal time shift rather than perform its normal function of refining waveform shape. Without such refinement, significant audible distortion may result.
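To make the trade-off concrete, here is a hypothetical linear-interpolation scheme in Python. The function name, the linear rule, and the "optimal" per-subframe delays in the usage lines are assumptions for illustration; the text above does not specify how intermediate delays are computed.

```python
def interpolate_delays(prev_delay, next_delay, num_subframes):
    """Linearly interpolate LTP delays for the subframes between two transmitted values.

    Hypothetical scheme: subframe i (1-based) gets
    prev_delay + (next_delay - prev_delay) * i / num_subframes,
    so the last subframe lands exactly on next_delay. Values may be fractional."""
    return [prev_delay + (next_delay - prev_delay) * i / num_subframes
            for i in range(1, num_subframes + 1)]


# Delays 40 and 48 are transmitted; the four subframes in between are filled in.
interpolated = interpolate_delays(40, 48, 4)
print(interpolated)  # [42.0, 44.0, 46.0, 48.0]

# Hypothetical per-subframe optimal delays, to show the mismatch interpolation causes.
optimal = [41, 47, 44, 48]
print([o - d for o, d in zip(optimal, interpolated)])  # [-1.0, 3.0, -2.0, 0.0]
```

The nonzero differences are the per-subframe delay errors that, in the terms of the paragraph above, the FSCB would otherwise have to compensate for.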
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for reducing bandwidth requirements in analysis-by-synthesis speech coding systems. The present invention provides multiple trial original signals based upon an actual original signal to be encoded. These trial original signals are constrained to be audibly similar to the actual original signal and are used in place of, or in addition to, the actual original signal in coding. The original signal, and hence the trial original signals, may take the form of actual speech signals or any of the excitation signals present in analysis-by-synthesis coders. The present invention affords generalized analysis-by-synthesis coding by allowing the original speech signals to be varied so as to reduce coding error and bit rate. The invention is applicable to, among other things, networks for communicating speech information, such as cellular and conventional telephone networks.
In an illustrative embodiment of the present invention, trial original signals are used in a coding and synthesis process to yield reconstructed original signals. Error signals are formed between the trial original signals and the reconstructed signals. The trial original signal which is determined to yield the minimum error is used as the basis for coding and communication to a receiver. By reducing error in this fashion, a coding process may be modified such that required system bandwidth may be reduced.
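The selection step of this embodiment can be sketched as follows. The coder here is a deliberately tiny stand-in (the best gain-scaled vector from a two-entry codebook), and the trial originals are supplied by hand and simply assumed to be audibly similar to the actual signal; neither the coder nor the way the trials are generated comes from the patent.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def code_and_reconstruct(signal, codebook):
    """Toy coder: represent `signal` by the best gain-scaled codebook vector.

    A hypothetical stand-in for a full coding/synthesis chain."""
    best_error, best_rec = None, None
    for vector in codebook:
        energy = dot(vector, vector)
        gain = dot(signal, vector) / energy if energy else 0.0
        rec = [gain * v for v in vector]
        error = sum((s - r) ** 2 for s, r in zip(signal, rec))
        if best_error is None or error < best_error:
            best_error, best_rec = error, rec
    return best_rec


def select_trial_original(trials, codebook):
    """Code every trial original; keep the one with the smallest
    trial-versus-reconstruction error. Returns (error, trial_index, reconstruction)."""
    best = None
    for index, trial in enumerate(trials):
        rec = code_and_reconstruct(trial, codebook)
        error = sum((t - r) ** 2 for t, r in zip(trial, rec))
        if best is None or error < best[0]:
            best = (error, index, rec)
    return best


codebook = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
actual = [1.0, 2.1, 2.9, 4.0]
trials = [actual, [1.0, 2.0, 3.0, 4.0]]  # hypothetical, assumed audibly similar
error, index, _ = select_trial_original(trials, codebook)
print(index, error)  # 1 0.0
```

The second trial differs from the actual signal by only 0.1 in two samples, yet the toy coder reproduces it exactly, so it is selected with zero error; that lower error is what the embodiment trades for reduced bandwidth.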
In a further illustrative embodiment of the present invention for a CELP coder, one or more trial original signals are provided by application of a codebook of time-warps to the actual original signal. In an LTP procedure of the CELP coder, trial original signals are compared with a candidate past speech signal provided by an adaptive codebook. The trial original signal which most closely matches the candidate is identified. As part of the LTP process, the candidate is subtracted from the identified trial original signal to form a residual. The residual is then coded by application of a fixed stochastic codebook. As a result of using multiple trial original signals in the LTP procedure, the illustrative embodiment of the present invention provides improved mapping of past signals to the present and, as a result, reduced residual error. This reduced residual error affords less frequent transmission of LTP delay information and allows for delay interpolation with little or no degradation in the quality of reconstructed speech.
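A sketch of the time-warp variant of the LTP step, under several assumptions: the warp codebook is a hypothetical list of resampling factors close to 1, warping is done by linear interpolation, and the adaptive-codebook candidate is passed in directly rather than being looked up from a delay. The text above does not specify the warp parameterization or the audibility constraint, so neither is reproduced here.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def time_warp(signal, factor, n):
    """Resample `signal` at times k * factor (k = 0..n-1) by linear interpolation.

    factor > 1 compresses the original in time; factor < 1 stretches it."""
    out = []
    for k in range(n):
        t = k * factor
        i = int(t)
        frac = t - i
        if i + 1 < len(signal):
            out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
        else:
            out.append(signal[-1])  # hold the last sample if the warp runs past the end
    return out


def ltp_with_time_warps(original, candidate, warp_codebook):
    """Pick the warped trial original that best matches the adaptive-codebook candidate.

    Returns (warp_factor, gain, residual); the residual would go to the FSCB."""
    n = len(candidate)
    energy = dot(candidate, candidate)
    best = None
    for factor in warp_codebook:
        trial = time_warp(original, factor, n)
        gain = dot(trial, candidate) / energy  # least-squares scale of the candidate
        error = dot(trial, trial) - gain * dot(trial, candidate)
        if best is None or error < best[0]:
            best = (error, factor, gain, trial)
    _, factor, gain, trial = best
    residual = [t - gain * c for t, c in zip(trial, candidate)]
    return factor, gain, residual


original = [math.sin(0.3 * k) + 0.5 * math.sin(0.7 * k) for k in range(40)]
candidate = time_warp(original, 1.02, 20)  # pretend the past excitation looks like this
factor, gain, residual = ltp_with_time_warps(original, candidate, [0.98, 1.0, 1.02])
print(factor, max(abs(r) for r in residual))  # 1.02 0.0
```

Because the candidate was built from the 1.02 warp, that factor is chosen and the residual vanishes; with real speech, the point is that some warped trial can leave a smaller residual than the unwarped original (factor 1.0).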
Another illustrative embodiment of the present invention provides multiple trial original signals through a time-shift technique.
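The time-shift technique is only named above, so the following is an assumption-labeled sketch of one plausible reading: trial originals are taken as segments of the original buffer offset by a few samples, and the offset whose segment best matches the LTP candidate is kept. The function name, the buffer layout, and the `max_shift` bound are hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def ltp_with_time_shifts(original, start, candidate, max_shift):
    """Try trial originals original[start + s : start + s + n] for |s| <= max_shift.

    Assumes the buffer holds max_shift extra samples on each side of the current
    segment. Returns (shift, gain, residual) for the best-matching shifted trial."""
    n = len(candidate)
    if start - max_shift < 0 or start + max_shift + n > len(original):
        raise ValueError("original buffer too short for the requested shifts")
    energy = dot(candidate, candidate)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        trial = original[start + shift:start + shift + n]
        gain = dot(trial, candidate) / energy  # least-squares scale of the candidate
        error = dot(trial, trial) - gain * dot(trial, candidate)
        if best is None or error < best[0]:
            best = (error, shift, gain, trial)
    _, shift, gain, trial = best
    residual = [t - gain * c for t, c in zip(trial, candidate)]
    return shift, gain, residual


original = [math.sin(0.3 * k) + 0.5 * math.sin(0.7 * k) for k in range(40)]
candidate = original[12:22]  # matches the segment two samples after start=10
shift, gain, residual = ltp_with_time_shifts(original, 10, candidate, 3)
print(shift, gain)  # 2 1.0
```

A shift changes only where the segment is read from, not its shape, which is why it is a simpler family of trial originals than the time warps sketched for the previous embodiment.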
Brown Kenneth M.
Lucent Technologies Inc.
Opsasnick Michael N.
Restaino Thomas A.
Zele Krista