Reexamination Certificate
2000-09-20
2004-10-05
Knepper, David D (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S207000
active
06801887
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to a method and apparatus for coding speech signals and, more specifically, to waveform interpolation coding.
BACKGROUND OF THE INVENTION
The rapid growth in digital wireless communication has led to a growing need for low bit-rate speech coders with good speech quality. Current speech coding methods capable of providing speech quality near that of a wire-line network operate at bit rates above 6 kbps. These bit rates, however, may not be desirable for many wireless applications, such as satellite telephony systems and half bit-rate transmission channels for mobile communication systems. Mobile communication systems place special requirements on a speech coder, particularly on its speech quality, bit rate, complexity and delay. In recent years, the main challenge in the development of speech coders has been to decrease the bit rate while maintaining wire-line speech quality. As the bit rate decreases, the operation of speech coding algorithms usually becomes more dependent on the characteristics of the input signal. In particular, in a system where a bit stream is transmitted over an error-prone channel, the speech quality can deteriorate significantly. Thus, it is desirable to design a speech coder that is robust to channel errors and can recover rapidly from erroneous speech frames.
During the last decades, many methods have been developed for robust speech coding. One of the most promising low bit-rate speech-coding methods is waveform interpolation (WI) coding. In general, a WI coder extracts a surface from the speech signal in order to describe the development of the pitch-cycle waveform as a function of time. From the extracted surface, the speech signal is further divided into periodic and noise components so that they can be coded separately. For example, in U.S. Pat. No. 5,517,595, Kleijn discloses a method of decomposing noise and periodic signal waveforms for waveform interpolation, wherein a plurality of sets of indexed parameters are generated based on samples of the speech signal, and each set of indexed parameters corresponds to a waveform characterizing the speech signal at a discrete point in time. Parameters are further grouped based on index value to form a set of signals representing a slowly evolving waveform (SEW) and a set of signals representing a rapidly evolving waveform (REW), to be coded separately. In the article entitled “Waveform Interpolation for Speech Coding and Synthesis” (Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds., pp. 175-208, Elsevier Science B. V., 1995), Kleijn and Haagen disclose the decomposition of the characteristic waveform and the outline of a WI coding system.
In general, speech signals contain voiced speech periods and unvoiced speech periods. Voiced speech is quasi-periodic and appears as a succession of similar, slowly evolving pitch-cycle waveforms. As such, the pitch-cycle waveform describes the essential characteristics of the speech signal. WI coding exploits this fact by extracting and coding the characteristic waveform in an encoder and then reconstructing the speech signal from the extracted and coded characteristic waveform in a decoder. If the pitch-cycle waveform and a phase function are known for each time instant, then it is possible to reconstruct the original speech signal without distortion. The speech signal can therefore be represented as a two-dimensional surface u(t,φ), where the waveform is displayed along the phase (φ) axis and the evolution of the waveform is displayed along the time (t) axis. This description of the voiced speech characteristics is also valid for unvoiced speech, which consists essentially of non-periodic signals.
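The reconstruction idea in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes a phase function that advances by 2π divided by the pitch period at every sample, and it reads the signal off the surface as s[n] = u(n, φ[n]). The names phase_track, reconstruct and toy_surface, the constant 80-sample pitch period and the shape of the toy surface are all hypothetical choices made for illustration; they are not taken from the patent.

```python
import math


def phase_track(pitch_periods):
    """Accumulate the phase sample by sample.

    pitch_periods[n] is the pitch period (in samples) at sample n, so each
    sample advances the phase by 2*pi / pitch_periods[n].
    """
    phases = []
    phi = 0.0
    for period in pitch_periods:
        phases.append(phi % (2.0 * math.pi))
        phi += 2.0 * math.pi / period
    return phases


def reconstruct(surface, pitch_periods):
    """Read the speech samples off the surface: s[n] = u(n, phi[n])."""
    return [surface(n, phi) for n, phi in enumerate(phase_track(pitch_periods))]


def toy_surface(n, phi):
    """Hypothetical surface: a fixed pitch-cycle shape whose gain grows slowly."""
    gain = 1.0 + 0.001 * n
    return gain * (math.sin(phi) + 0.5 * math.sin(2.0 * phi))


# Assumed example: 160 samples with a constant pitch period of 80 samples.
signal = reconstruct(toy_surface, [80.0] * 160)
```

With a time-varying list of pitch periods the same code traces a different path across the surface, which is how the two-dimensional description separates the waveform shape from the pitch contour.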
In a WI speech encoder, a low-pass filter is used to filter the two-dimensional surface u(t,φ) along the t axis, resulting in a slowly evolving waveform (SEW). The filtered-out portion of the speech signal is a rapidly evolving waveform (REW). The SEW signal corresponds mainly to the substantially periodic component of the speech signal, while the REW signal corresponds mainly to the noise component. To improve coding efficiency, the quantization of the SEW and the REW signals is usually carried out in the frequency domain, where the magnitudes and the phases are quantized separately. In practice, the first operation of most WI coders is to perform a linear prediction (LP) analysis of the speech signal. In the LP analysis, short-term correlations between speech samples are modeled and removed by filtering. The modeled short-term correlations are used to establish a predicted signal. The error signal between the original signal and the predicted signal is the LP residual signal. Only the residual signal is decomposed into an SEW component and an REW component. The predicted signal is represented by a set of LP coefficients.
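The SEW/REW split described above can be sketched in Python as follows. The sketch assumes that the surface is stored as a list of characteristic waveforms (one row per time instant, one column per phase sample) and uses a simple moving average along the time axis as the low-pass filter. The text only says that a low-pass filter is used, so the moving average, the window length and the name decompose_surface are assumptions for illustration.

```python
def decompose_surface(surface, window=5):
    """Split a waveform surface into SEW and REW parts.

    surface[t][p] is the characteristic waveform at time index t and phase
    index p. The SEW is a moving average along the time axis (an assumed,
    very simple low-pass filter); the REW is whatever the filter removed.
    """
    n_times = len(surface)
    n_phase = len(surface[0])
    half = window // 2
    sew = []
    for t in range(n_times):
        # The averaging window is shortened at the edges of the surface.
        lo = max(0, t - half)
        hi = min(n_times, t + half + 1)
        span = hi - lo
        sew.append([sum(surface[k][p] for k in range(lo, hi)) / span
                    for p in range(n_phase)])
    rew = [[surface[t][p] - sew[t][p] for p in range(n_phase)]
           for t in range(n_times)]
    return sew, rew


# Assumed toy input: a steady waveform plus a component that flips sign
# from one time instant to the next.
steady = [1.0, -1.0, 0.5]
toy = [[s + (0.2 if t % 2 == 0 else -0.2) for s in steady] for t in range(8)]
sew, rew = decompose_surface(toy)
```

Because the REW is defined as the remainder, SEW plus REW always gives back the input surface, and a surface that does not change over time yields an all-zero REW.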
A WI encoder can be functionally divided into an outer and an inner layer. The outer layer estimates parameters for a current speech frame, and the inner layer encodes these parameters in order to produce a bit stream for transmission through a communication channel or for storage in a storage medium for later use. As shown in FIG. 1, the outer layer determines a set of LP coefficients and extracts a waveform surface in order to describe the development of the pitch-cycle waveform as a function of time. The outer layer also determines the pitch and power of the speech signal. The inner layer decomposes the LP residual speech surface into SEW and REW components and encodes these components separately. The inner layer also quantizes the pitch, the LP coefficients and the power and formats the encoded data into a bit-stream. Likewise, a WI decoder can also be functionally divided into an outer layer and an inner layer, as shown in FIG. 2. In decoding, the inner layer dequantizes the received bit stream in order to determine the parameters for the current speech frame, and the outer layer subsequently reconstructs the speech signal from the decoded parameters. In the encoder, the SEW and REW signals are down-sampled to a desired sampling rate before quantization. In the decoder, the SEW and REW signals are up-sampled before they are reconstructed into a surface representing the LP residual signal.
In the prior art WI coder, as shown in FIGS. 1 and 2, the quantization scheme is fixed, regardless of the characteristics of the input signal. This is often true for other types of speech coders, such as Code Excited Linear Prediction (CELP) and sinusoidal coders. This means that the bit allocation in the bit stream is based only on the down-sampling of the SEW and REW signals, but not the relative signal strength between the SEW and the REW components, as a function of time. In particular, in the prior art, the voiced period in the speech signal is emphasized over the unvoiced period, and the quantization accuracy of the SEW waveform is emphasized over the update rate. Typically, the SEW waveform is down-sampled to 50 Hz and quantized using a vector quantization scheme, while the REW waveform is down-sampled to 200 Hz, and the magnitude spectrum of the REW waveform is quantized using only a few shapes. While this bit allocation scheme may be appropriate for the voiced period when the SEW component is dominant, it is not an efficient use of bits in the unvoiced period when the REW is dominant, especially at low bit rates.
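The "relative signal strength between the SEW and the REW components" mentioned above can be made concrete with a short Python sketch. It computes, for one frame, the fraction of the total power carried by the SEW. The functions mean_power and sew_power_fraction, the 0.5 fallback for a silent frame and the toy frames are hypothetical. This is only one plausible way to measure the quantity and is not presented as the measure used by the invention.

```python
def mean_power(waveforms):
    """Average squared amplitude over all samples of a list of waveforms."""
    total = sum(x * x for w in waveforms for x in w)
    count = sum(len(w) for w in waveforms)
    return total / count if count else 0.0


def sew_power_fraction(sew, rew):
    """Fraction of the combined SEW and REW power that lies in the SEW.

    Values near 1 suggest a voiced-like (SEW-dominant) frame and values
    near 0 an unvoiced-like (REW-dominant) frame. For a silent frame the
    function returns 0.5, an arbitrary choice made for this sketch.
    """
    p_sew = mean_power(sew)
    p_rew = mean_power(rew)
    total = p_sew + p_rew
    return p_sew / total if total > 0.0 else 0.5


# Assumed toy frames: four waveforms of two phase samples each.
strong = [[1.0, -1.0]] * 4
weak = [[0.1, -0.1]] * 4

voiced_like = sew_power_fraction(strong, weak)    # SEW dominates
unvoiced_like = sew_power_fraction(weak, strong)  # REW dominates
```

A fixed quantization scheme ignores this fraction, whereas a coder that tracks it frame by frame would have a basis for shifting bits toward whichever component currently dominates.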
It is advantageous and desirable to provide a method and apparatus for waveform interpolation coding with a different bit allocation scheme for more efficient use of bits in low bit-rate speech coding.
SUMMARY OF THE INVENTION
The primary objective of the present invention is to improve the efficiency of low bit-rate speech coding, especially in the unvoiced part of a speech signal, where the random or noise component, or equivalently, the rapidly evolving waveform, becomes dominant. Accordingly, the first aspect of the present invention is a method of waveform interpolation speech coding for efficiently analyzi
Heikkinen Ari
Nurminen Jani
Tammi Mikko
Knepper David D
Nokia Mobile Phones Ltd.
Ware Fressola Van Der Sluys & Adolphson LLP