Multi-stage pitch and mixed voicing estimation for harmonic...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

active

06456965

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to the field of speech coding, and more particularly to encoding methods for estimating pitch and voicing parameters.
BACKGROUND OF THE INVENTION
Various methods have been developed for digital encoding of speech signals. The encoding enables the speech signal to be stored or transmitted and subsequently decoded, thereby reproducing the original speech signal.
Model-based speech encoding permits the speech signal to be compressed, which reduces the number of bits required to represent the speech signal, thereby reducing data transmission rates. The lower data rates are possible because speech is redundant and because the human speech-generating system can be simulated mathematically. The vocal tract is simulated by a number of “pipes” of differing diameter, and the excitation is represented by a pulse stream at the vocal cord rate for voiced sound or a random noise source for the unvoiced parts of speech. The reflections at the junctions of the pipes are represented by coefficients obtained from linear prediction coding (LPC) analysis of the speech waveform.
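As an illustration of how LPC analysis can yield the reflection coefficients of such a pipe model, the following Python sketch runs the Levinson-Durbin recursion on an autocorrelation sequence. The recursion is a standard technique that the text above does not name; the function name `levinson_durbin`, the sign convention, and the sample autocorrelation values are assumptions made for this sketch.

```python
def levinson_durbin(autocorr, order):
    """Return (predictor, reflection_coeffs, residual_energy).

    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    autocorr[0] must be positive (a silent frame would divide by zero).
    """
    a = [1.0] + [0.0] * order
    err = autocorr[0]
    reflection = []
    for i in range(1, order + 1):
        # Correlation of the current prediction error with the next lag.
        acc = autocorr[i] + sum(a[j] * autocorr[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        # Update the predictor coefficients for order i.
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, reflection, err


# Hypothetical autocorrelation of a first-order process (r[n] = 0.5 ** n).
predictor, reflection, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the first reflection coefficient is -0.5 and the second is zero, which matches a model with a single junction.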
The vocal cord rate, which, as stated above, is used to formulate speech models, is related to the periodicity of voiced speech, often referred to as pitch. In an analog time domain plot of a speech signal, the time between the largest magnitude positive or negative peaks during voiced segments is the pitch period. Although speech signals are not perfectly periodic, and in fact are quasi-periodic or non-stationary signals, an estimated pitch frequency and its reciprocal, the pitch period, attempt to represent the speech signal as faithfully as possible.
For speech encoding, an estimation of pitch is made, using any one of a number of pitch estimation algorithms. However, none of the existing estimation algorithms has been entirely successful in providing robust performance over a variety of input speech conditions.
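The relationship between peak spacing, pitch period, and pitch frequency can be sketched in Python as follows. This is only a toy illustration on a synthetic signal: the function names, the 0.8 peak threshold, and the 8000 Hz sampling rate are assumptions, and real speech would need a far more robust peak picker.

```python
import math


def pitch_period_from_peaks(signal, threshold_ratio=0.8):
    """Average spacing, in samples, between the largest-magnitude peaks."""
    largest = max(abs(s) for s in signal)
    peaks = [n for n in range(1, len(signal) - 1)
             if abs(signal[n]) >= threshold_ratio * largest
             and abs(signal[n]) >= abs(signal[n - 1])
             and abs(signal[n]) > abs(signal[n + 1])]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if not gaps:
        raise ValueError("fewer than two peaks found")
    return sum(gaps) / len(gaps)


def pitch_frequency_hz(period_samples, sample_rate_hz):
    """The pitch frequency is the reciprocal of the pitch period."""
    return sample_rate_hz / period_samples


# Synthetic voiced segment: a strong pulse every 80 samples over a weak ripple.
SAMPLE_RATE_HZ = 8000
signal = [1.0 if n % 80 == 0 else 0.3 * math.sin(2 * math.pi * n / 20)
          for n in range(400)]
period = pitch_period_from_peaks(signal)              # 80 samples
frequency = pitch_frequency_hz(period, SAMPLE_RATE_HZ)  # 100 Hz
```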
Another parameter of the speech model is a voicing parameter, which indicates which portions of the speech signal are voiced and which are unvoiced. Voicing information may be used during encoding to determine other parameters. Voicing information is also used during decoding, to switch between different synthesis processes for voiced or unvoiced speech. Typically, coding systems operate on frames of the speech signal, where each frame is a segment of the signal and all frames have the same length. One approach to representing voicing information is to provide a binary voiced/unvoiced parameter for each entire frame. Another approach is to divide each frame into frequency bands and to provide a binary parameter for each band. However, neither approach provides a satisfactory model.
SUMMARY OF THE INVENTION
One aspect of the invention is a multi-stage method of estimating the pitch of a speech signal that is to be encoded. In a first stage of the method, a set of candidate pitch values is selected, such as by applying a cost function to the speech signal. In a second stage of the method, a best candidate is selected. Specifically, in the second stage, pitch values calculated for previous speech segments are used to calculate an average pitch value. Then, depending on whether the average pitch value is short or long, one of two different analysis-by-synthesis (ABS) processes is performed. The ABS process is repeated for each candidate, such that for each iteration, a synthesized speech signal is derived from that pitch candidate and compared to the input speech signal. A time domain ABS process is performed if the average pitch is short, whereas a frequency domain ABS process is performed if the average pitch is long. Both ABS processes provide an error value corresponding to each pitch candidate. The pitch candidate having the smallest error is deemed to be the best candidate.
An advantage of the pitch estimation method is that it is robust, and its ability to perform well is independent of the peculiarities of the input speech signal. In other words, the method overcomes the problem encountered by existing pitch estimation methods of dealing with a variety of input speech conditions.
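The two-stage structure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented method: the autocorrelation cost function, the three-harmonic synthesis, both error measures, the 50-sample threshold between "short" and "long" average pitch, and all function names are hypothetical choices made for illustration.

```python
import cmath
import math


def select_candidates(frame, min_lag, max_lag, count=3):
    """Stage 1: rank lags by autocorrelation (an assumed cost function)."""
    def score(lag):
        span = len(frame) - lag
        return sum(frame[t] * frame[t + lag] for t in range(span)) / span
    return sorted(range(min_lag, max_lag + 1), key=score, reverse=True)[:count]


def synthesize(frame, period, harmonics=3):
    """Fit a few sinusoids at multiples of 1/period and rebuild the frame."""
    n = len(frame)
    synth = [0.0] * n
    for k in range(1, harmonics + 1):
        w = 2.0 * math.pi * k / period
        c = sum(frame[t] * math.cos(w * t) for t in range(n)) * 2.0 / n
        s = sum(frame[t] * math.sin(w * t) for t in range(n)) * 2.0 / n
        for t in range(n):
            synth[t] += c * math.cos(w * t) + s * math.sin(w * t)
    return synth


def dft_magnitudes(x):
    """Magnitude spectrum via a direct DFT (slow, but dependency-free)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * b * t / n)
                    for t in range(n))) for b in range(n // 2 + 1)]


def time_domain_error(frame, period):
    """Squared waveform error between the input and synthesized frames."""
    synth = synthesize(frame, period)
    return sum((a - b) ** 2 for a, b in zip(frame, synth))


def frequency_domain_error(frame, period):
    """Squared error between input and synthesized magnitude spectra."""
    ref = dft_magnitudes(frame)
    syn = dft_magnitudes(synthesize(frame, period))
    return sum((a - b) ** 2 for a, b in zip(ref, syn))


def estimate_pitch(frame, previous_pitches, min_lag=20, max_lag=80,
                   short_threshold=50):
    """Stage 2: pick the candidate with the smallest synthesis error."""
    candidates = select_candidates(frame, min_lag, max_lag)
    average = sum(previous_pitches) / len(previous_pitches)
    # Short average pitch: time domain ABS; long: frequency domain ABS.
    error_fn = (time_domain_error if average < short_threshold
                else frequency_domain_error)
    return min(candidates, key=lambda period: error_fn(frame, period))


# Synthetic frame with a 40-sample pitch period and three harmonics.
frame = [math.sin(2 * math.pi * t / 40) + 0.5 * math.sin(4 * math.pi * t / 40)
         + 0.25 * math.sin(6 * math.pi * t / 40) for t in range(160)]
pitch_short_history = estimate_pitch(frame, [38, 42])  # time domain branch
pitch_long_history = estimate_pitch(frame, [70, 75])   # frequency domain branch
```

Limiting the synthesis to a fixed number of harmonics is one simple way to keep a candidate at twice the true period from fitting the frame equally well; other designs are possible.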
Another aspect of the invention is a mixed voicing estimation method for determining the voiced and unvoiced characteristics of an input speech signal that is to be encoded. The method assumes that a pitch for the input speech signal has previously been estimated. The pitch is used to determine the harmonic frequencies of the speech signal. A probability function is used to assign a probability value to each harmonic frequency, with the probability value being the probability that the speech at that frequency is voiced. For transmission efficiency, a cut-off frequency can be calculated. Below the cut-off frequency, the speech signal is assumed to be voiced so that no probability value is required. The voicing estimator provides an improved method of modeling voicing information. It permits a probability function to be efficiently used to differentiate between voiced and unvoiced portions of mixed speech signals.
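The mixed voicing idea can be sketched in Python as follows. The summary does not specify the probability function, so the energy ratio used here is purely an assumed stand-in, as are the function names, the 0.85 voiced threshold, and the sample energy values.

```python
def harmonic_frequencies(pitch_hz, sample_rate_hz):
    """Multiples of the pitch frequency up to the Nyquist frequency."""
    count = int((sample_rate_hz / 2.0) // pitch_hz)
    return [k * pitch_hz for k in range(1, count + 1)]


def voicing_probabilities(harmonic_energy, noise_energy):
    """Assumed probability function: harmonic share of the energy near each harmonic."""
    return [h / (h + n) if (h + n) > 0 else 0.0
            for h, n in zip(harmonic_energy, noise_energy)]


def cutoff_index(probabilities, voiced_threshold=0.85):
    """Number of leading harmonics that are treated as fully voiced."""
    index = 0
    while index < len(probabilities) and probabilities[index] >= voiced_threshold:
        index += 1
    return index


# Hypothetical per-harmonic measurements for a 1000 Hz pitch at 8000 Hz sampling.
freqs = harmonic_frequencies(1000.0, 8000.0)
probs = voicing_probabilities([9.5, 9.0, 3.0, 1.0], [0.5, 1.0, 3.0, 9.0])
cut = cutoff_index(probs)
cutoff_hz = freqs[cut - 1] if cut > 0 else 0.0
# Only probabilities above the cut-off frequency would need to be transmitted.
transmitted = probs[cut:]
```

In this example the first two harmonics fall below the cut-off frequency and are assumed voiced, so only the remaining two probability values would be sent.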


REFERENCES:
patent: 4561102 (1985-12-01), Prezas
patent: 4653098 (1987-03-01), Nakata et al.
patent: 5003604 (1991-03-01), Okazaki et al.
patent: 5216747 (1993-06-01), Hardwick et al.
patent: 5226108 (1993-07-01), Hardwick et al.
patent: 5574823 (1996-11-01), Hassanein et al.
patent: 5781880 (1998-07-01), Su
Deller, J.R., Proakis, J.G., Hansen, J.H.L., “Discrete-Time Processing of Speech Signals,” 1987.*
Rabiner, L.R., “A Comparative Performance Study of Several Pitch Detection Algorithms,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-24, No. 5, pp. 399-471, Oct. 1976.*
S. Yeldener, A. M. Kondoz, B. G. Evans “A High Quality Speech Coding Algorithm Suitable for Future Inmarsat Systems”, European Signal Processing Conf. (EUSIPCO-94), Edinburgh, Sep. 1994, p. 407-410.
D. W. Griffin, J. S. Lim “Multi Band Excitation Vocoder” IEE Proc., 1980, vol. 127, pp. 53-60.
R. J. McAulay and T. F. Quatieri “Pitch Estimation and Voicing Detection Based on a Sinusoidal Speech Model” In Proc. of ICASSP, pp. 249-252, 1990.
S. Yeldener, A. M. Kondoz, B. G. Evans “Multi-Band Linear Predictive Speech Coding at very Low Bit Rates”, IEE Proc. Vis. Image and Signal Process, vol. 141, No. 5, Oct. 1994, p. 289-296.
L.R. Rabiner, M.J. Cheng, A.E. Rosenberg, and C.A. McGonegal, “A Comparative Performance Study of Several Pitch Detection Algorithms,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 399-417, Oct. 1976.
