Speech encoder using warping in long term preprocessing

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S265000, C704S267000, C704S211000

active

06449590

ABSTRACT:

BACKGROUND
1. Technical Field
The present invention relates generally to speech encoding and decoding in voice communication systems; and, more particularly, it relates to various techniques used with code-excited linear prediction coding to obtain high quality speech reproduction through a limited bit rate communication channel.
2. Related Art
Signal modeling and parameter estimation play significant roles in communicating voice information under limited bandwidth constraints. To model basic speech sounds, speech signals are sampled as a discrete waveform to be digitally processed. In one type of signal coding technique called LPC (linear predictive coding), the signal value at any particular time index is modeled as a linear function of previous values. A subsequent sample is thus linearly predictable from earlier values. As a result, efficient signal representations can be determined by estimating and applying certain prediction parameters to represent the signal.
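The linear-prediction idea described above can be illustrated with a minimal sketch. The function names `lpc_predict` and `lpc_residual` are hypothetical, and the coefficients are assumed to be given; how a real encoder estimates them is not covered here.

```python
def lpc_predict(signal, coeffs):
    """Predict each sample as a weighted sum of the preceding len(coeffs) samples.

    Samples before the start of the signal are treated as zero (an assumption
    made for this illustration only).
    """
    order = len(coeffs)
    predicted = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(1, order + 1):
            if n - k >= 0:
                acc += coeffs[k - 1] * signal[n - k]
        predicted.append(acc)
    return predicted


def lpc_residual(signal, coeffs):
    """Residual = original sample minus its linear prediction."""
    predicted = lpc_predict(signal, coeffs)
    return [s - p for s, p in zip(signal, predicted)]


# A signal that halves at every step is perfectly predicted by one coefficient,
# so only the first sample survives in the residual.
print(lpc_residual([1.0, 0.5, 0.25, 0.125], [0.5]))  # [1.0, 0.0, 0.0, 0.0]
```

When the prediction is good, the residual carries little energy, which is why transmitting prediction parameters plus a compact description of the residual can be cheaper than transmitting the waveform itself.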
Applying LPC techniques, a conventional source encoder operates on speech signals to extract modeling and parameter information for communication to a conventional source decoder via a communication channel. Once received, the decoder attempts to reconstruct a counterpart signal for playback that sounds to a human ear like the original speech.
A certain amount of communication channel bandwidth is required to communicate the modeling and parameter information to the decoder. In some embodiments, for example where the channel bandwidth is shared and real-time reconstruction is necessary, a reduction in the required bandwidth proves beneficial. However, with conventional modeling techniques, the quality requirements for the reproduced speech prevent the bandwidth from being reduced below certain levels.
In conventional coding systems employing long term preprocessing, a modified residual is produced as a new reference for current excitation. The goal is to produce a modified residual that better matches a coded pitch contour (or delay contour) than the original residual so that the LTP gain is higher. This is attempted in conventional systems by individually shifting the pitch pulses to match the pitch contour, requiring reliable endpoint detection of a segment to be shifted to maintain signal continuity. Using such an open loop approach with pulse shifting results in quality problems in speech reproduction.
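The text above refers to LTP gain without defining it. As a rough illustration only, the sketch below uses a common single-tap formulation (an assumption, not taken from this document): the gain is the correlation between the current segment and the signal delayed by the pitch lag, divided by the energy of the delayed signal. The function name `ltp_gain` and its parameters are hypothetical.

```python
def ltp_gain(signal, start, length, lag):
    """Single-tap long-term-prediction gain for signal[start:start+length].

    Assumes start - lag >= 0 so that the delayed segment exists.
    """
    target = signal[start:start + length]
    delayed = signal[start - lag:start - lag + length]
    energy = sum(y * y for y in delayed)
    if energy == 0.0:
        return 0.0  # nothing to predict from
    return sum(x * y for x, y in zip(target, delayed)) / energy


# A waveform that repeats every 3 samples: the matching lag gives unit gain,
# a mismatched lag gives a much poorer prediction.
periodic = [1.0, -1.0, 2.0, 1.0, -1.0, 2.0]
print(ltp_gain(periodic, start=3, length=3, lag=3))  # 1.0
print(ltp_gain(periodic, start=3, length=3, lag=2))
```

Under this formulation, a residual that lines up well with the coded pitch contour yields a higher gain at the coded lag, which is the stated goal of producing a modified residual.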
Additionally, with these and other conventional approaches, the amount of pitch lag information that must be transmitted is relatively large in view of the limitations often placed on the channel bit rate. For example, 8 bits might be required to encode pitch lag for a first subframe (of 5 ms duration), followed perhaps by 5 bits for pitch lag changes in a second subframe, resulting in a relatively large bandwidth allocation, e.g., 1.3 kbps (kilobits per second), just for the pitch lag information.
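The 1.3 kbps figure can be checked with a short calculation. The sketch assumes the second subframe also lasts 5 ms, which the text does not state explicitly but which is consistent with the quoted rate; the function name `pitch_lag_bit_rate_kbps` is hypothetical.

```python
def pitch_lag_bit_rate_kbps(bits_per_subframe, subframe_ms):
    """Bit rate spent on pitch lag, given the bits used in each subframe.

    Bits per millisecond is numerically equal to kilobits per second.
    """
    total_bits = sum(bits_per_subframe)
    total_ms = subframe_ms * len(bits_per_subframe)
    return total_bits / total_ms


# 8 bits + 5 bits over two 5 ms subframes: 13 bits every 10 ms.
print(pitch_lag_bit_rate_kbps([8, 5], subframe_ms=5))  # 1.3
```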
Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings.
SUMMARY OF THE INVENTION
Various aspects of the present invention can be found in an embodiment of a speech encoder that uses long term preprocessing of a speech signal wherein the speech signal has a previous pitch lag and a current pitch lag. Therein, the speech encoder comprises an adaptive codebook and an encoder processing circuit coupled to the adaptive codebook. Using estimates of the previous pitch lag and the current pitch lag, the encoder processing circuit generates a pitch lag contour. The encoder processing circuit continuously warps the speech signal to the pitch lag contour.
Many variations and further aspects of such a speech encoder are possible. For example, the speech signal may comprise either a weighted speech signal or a residual signal. The pitch lag contour may comprise a linear segment bounded by the estimates of the previous pitch lag and the current pitch lag, and continuous warping may involve warping the speech signal from a first time region to a second time region. Additionally, for example, the encoder processing circuit may search for a best local delay using linear time weighting, and/or perform the estimation of the current pitch lag.
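As a rough illustration of the two ideas just described, the sketch below builds a linear pitch lag contour between two lag estimates and maps a signal from one time region onto another. It is not the encoder's actual procedure: the function names are hypothetical, the contour is assumed to reach the current lag at the last sample, and plain linear interpolation stands in for whatever resampling a real implementation would use.

```python
def linear_pitch_contour(prev_lag, cur_lag, num_samples):
    """Lag at each sample, moving linearly from prev_lag toward cur_lag.

    Assumption: the contour equals cur_lag at the final sample.
    """
    step = (cur_lag - prev_lag) / num_samples
    return [prev_lag + step * (n + 1) for n in range(num_samples)]


def warp_segment(signal, src_start, src_end, out_len):
    """Map the source region [src_start, src_end] onto out_len output samples.

    Fractional source positions are filled in by linear interpolation.
    """
    out = []
    for i in range(out_len):
        if out_len > 1:
            t = src_start + (src_end - src_start) * i / (out_len - 1)
        else:
            t = float(src_start)
        lo = int(t)
        hi = min(lo + 1, len(signal) - 1)
        frac = t - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


print(linear_pitch_contour(40, 44, 4))       # [41.0, 42.0, 43.0, 44.0]

ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
print(warp_segment(ramp, 1, 3, 5))           # [1.0, 1.5, 2.0, 2.5, 3.0]
```

In the second call, three source samples are stretched smoothly over five output samples. Because the whole region is remapped at once, no individual pulse has to be located and shifted.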
Further aspects of the present invention may be found in an alternate embodiment of a speech encoder that uses long term preprocessing of a speech signal having a pitch lag. As before, the speech encoder comprises an adaptive codebook and an encoder processing circuit coupled thereto. The encoder processing circuit estimates the pitch lag, and, based on such estimate, applies continuous warping of the speech signal.
Other variations and further aspects such as those mentioned previously also apply to this embodiment. For example, the speech signal might comprise a weighted speech signal or a residual signal. The encoder processing circuit may search for a best local delay using linear time weighting, or conduct continuous warping by translating the speech signal from a first time region to a second time region.


REFERENCES:
patent: 5657420 (1997-08-01), Jacobs et al.
patent: 5664054 (1997-09-01), Su
patent: 5734789 (1998-03-01), Swaminathan et al.
patent: 5778334 (1998-07-01), Ozawa et al.
patent: 5778338 (1998-07-01), Jacobs et al.
patent: 5781880 (1998-07-01), Su
patent: 5974375 (1999-10-01), Aoyagi et al.
patent: 6006177 (1999-12-01), Funaki
patent: 6052661 (2000-04-01), Yamaura et al.
patent: 6067518 (2000-05-01), Morii
patent: 6073092 (2000-06-01), Kwon
patent: 6104992 (2000-08-01), Gao et al.
W. Bastiaan Kleijn, Ravi P. Ramachandran, and Peter Kroon, IEEE publication, Generalized Analysis-By-Synthesis Coding and Its Application To Pitch Prediction, 1992, pp. I-337-I-340.
W. Bastiaan Kleijn, Ravi P. Ramachandran and Peter Kroon, “Interpolation of the Pitch-Predictor Parameters in Analysis-by-Synthesis Speech Coders,” IEEE Transactions on Speech and Audio Processing, vol. 2, No. 1, Part I, Jan. 1994, pp. 42-54.
Jean Rouat, Yong Chun Liu and Daniel Morissette, “A Pitch Determination and Voiced/Unvoiced Decision Algorithm for Noisy Speech,” Speech Communication, 21, 1997, pp. 191-207.
W. Bastiaan Kleijn and Peter Kroon, “The RCELP Speech-Coding Algorithm,” vol. 5, No. 5, Sep.-Oct. 1994, pp. 39/573-47/581.
C. Laflamme, J-P. Adoul, H.Y. Su, and S. Morissette, “On Reducing Computational Complexity of Codebook Search in CELP Coder Through the Use of Algebraic Codes,” 1990, pp. 177-180.
Chih-Chung Kuo, Fu-Rong Jean, and Hsiao-Chuan Wang, “Speech Classification Embedded in Adaptive Codebook Search for Low Bit-Rate CELP Coding,” IEEE Transactions on Speech and Audio Processing, vol. 3, No. 1, Jan. 1995, pp. 1-5.
Erdal Paksoy, Alan McCree, and Vish Viswanathan, “A Variable-Rate Multimodal Speech Coder with Gain-Matched Analysis-By-Synthesis,” 1997, pp. 751-754.
Gerhard Schroeder, “International Telecommunication Union Telecommunications Standardization Sector,” Jun. 1995, pp. i-iv, 1-42.
“Digital Cellular Telecommunications System; Comfort Noise Aspects for Enhanced Full Rate (EFR) Speech Traffic Channels (GSM 06.62),” May 1996, pp. 1-16.
W. B. Kleijn and K.K. Paliwal (Editors), Speech Coding and Synthesis, Elsevier Science B.V.; Kroon and W.B. Kleijn (Authors), Chapter 3: “Linear-Prediction Based on Analysis-by-Synthesis Coding,” 1995, pp. 81-113.
W. B. Kleijn and K.K. Paliwal (Editors), Speech Coding and Synthesis, Elsevier Science B.V.; A. Das, E. Paksoy and A. Gersho (Authors), Chapter 7: “Multimode and Variable-Rate Coding of Speech,” 1995, pp. 257-288.
B.S. Atal, V. Cuperman, and A. Gersho (Editors), Speech and Audio Coding for Wireless and Network Applications, Kluwer Academic Publishers; T. Taniguchi and Y. Ohta (Authors), Chapter 27: “Structured Stochastic Codebook and Codebook Adaptation for CELP,” 1993, pp. 217-224.
B.S. Atal, V. Cuperman, and A. Gersho (Editors), Advances in Speech Coding, Kluwer Academic Publishers; I. A. Gerson and M.A. Jasiuk (Authors), Chapter 7: “Vector Sum Excited Linear Prediction (VSELP),” 1991, pp. 69-7
