Method and system of Chinese speech pitch extraction

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

active

06721699

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to the field of speech recognition. More specifically, the present invention relates to a method and system for Chinese speech pitch extraction in speech recognition using locally optimized dynamic programming pitch path-tracking.
BACKGROUND OF THE INVENTION
Pitch extraction is an essential component in a variety of speech processing systems. Besides providing valuable insights into the nature of the excitation source for speech production, the pitch contour of an utterance is useful for recognizing a speaker, and is required in almost all speech analysis-synthesis systems. Because of the importance of pitch extraction, a wide variety of methods and systems for pitch extraction have been proposed in the speech recognition field.
Basically, a pitch extraction method or system makes a voiced/unvoiced decision and, during periods of voiced speech, provides a measurement of the pitch period. Methods and systems for pitch extraction can be roughly divided into the following three broad categories:
1. A group which utilizes principally the time-domain properties of speech signals.
2. A group which utilizes principally the frequency-domain properties of speech signals.
3. A group which utilizes both the time and frequency domain properties of speech signals.
Time-domain pitch extractors operate directly on the speech waveform to estimate the pitch period. For these pitch extractors, the measurements most often made are peak and valley measurements, zero-crossing measurements, and autocorrelation measurements. The basic assumption made in all these cases is that if a quasi-periodic signal has been suitably processed to minimize the effect of the formant structure, then simple time-domain measurements will provide good estimates of the period.
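As an illustration of the time-domain approach, the following Python sketch picks the lag with the largest normalized autocorrelation inside a plausible pitch range. The function name, the 75 to 500 Hz search range, and the 0.3 voicing threshold are assumptions chosen for the example, not values taken from this patent.

```python
import math


def autocorrelation_pitch(frame, sample_rate, f0_min=75.0, f0_max=500.0, voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame from its short-time autocorrelation.

    Returns None when no normalized autocorrelation value exceeds
    `voicing_threshold` (an illustrative voiced/unvoiced rule, not the patent's).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC so r(0) measures signal energy
    r0 = sum(s * s for s in x)
    if r0 == 0.0:
        return None                          # silent frame
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, voicing_threshold
    for lag in range(lag_min, lag_max + 1):
        # Biased autocorrelation normalized by r(0); the true period gives the largest value.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag
```

Because the biased estimate shrinks with lag, the first period tends to beat its multiples, which helps against octave errors in this simple form.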
The class of frequency-domain pitch extractors uses the property that if the signal is periodic in the time domain, then the frequency spectrum of the signal will consist of a series of impulses at the fundamental frequency and its harmonics. Thus, simple measurements can be made on the frequency spectrum of the signal to estimate the period of the signal.
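A minimal sketch of the frequency-domain approach follows. It uses the harmonic product spectrum, a standard technique that this document does not name. For each candidate fundamental frequency, it multiplies the magnitudes of the Hann-windowed spectrum at the first few harmonics and keeps the candidate with the largest product. The function names, the 1 Hz search grid, and the use of three harmonics are illustrative assumptions.

```python
import math


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_at(windowed, sample_rate, freq):
    """|X(f)| of an already-windowed frame, evaluated directly at `freq` Hz (one DFT term)."""
    re = im = 0.0
    for i, s in enumerate(windowed):
        phase = 2 * math.pi * freq * i / sample_rate
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return math.hypot(re, im)


def hps_pitch(frame, sample_rate, f0_min=75, f0_max=500, harmonics=3):
    """Harmonic product spectrum on a 1 Hz grid: return the candidate F0 whose first
    `harmonics` multiples have the largest product of spectral magnitudes.
    Assumes harmonics * f0_max stays below the Nyquist frequency."""
    windowed = [s * w for s, w in zip(frame, hann(len(frame)))]
    best_f0, best_score = None, -1.0
    for f0 in range(int(f0_min), int(f0_max) + 1):
        score = 1.0
        for h in range(1, harmonics + 1):
            score *= magnitude_at(windowed, sample_rate, h * f0)
        if score > best_score:
            best_f0, best_score = float(f0), score
    return best_f0
```

The product, rather than a sum, penalizes subharmonic candidates, because their odd multiples fall between the true harmonics where the spectrum is nearly zero.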
The class of hybrid pitch extractors incorporates features of both the time-domain and the frequency-domain approaches to pitch extraction. For example, a hybrid extractor might use frequency-domain techniques to provide a spectrally flattened time waveform, and then use autocorrelation measurements to estimate the pitch period.
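One way to sketch a hybrid extractor is the "generalized" autocorrelation. This assumes that a compressed magnitude spectrum is an acceptable stand-in for the spectral flattening described above. The spectrum is computed in the frequency domain, and its magnitude is raised to an exponent below 2, which compresses the dynamic range relative to the power spectrum and so partially flattens the envelope. The result is then transformed back to the lag domain, where the pitch period is read off as in the time-domain method. The exponent 0.67 and all names below are assumptions for illustration.

```python
import cmath
import math


def gac_pitch(frame, sample_rate, f0_min=75.0, f0_max=500.0, exponent=0.67):
    """Hybrid pitch estimate via a generalized autocorrelation (illustrative sketch).

    Frequency domain: DFT of the Hann-windowed, zero-padded frame, magnitude
    compressed to |X|**exponent. Time domain: back to lags, pick the strongest peak.
    No voiced/unvoiced decision is made here.
    """
    n = len(frame)
    m = 2 * n                                   # zero-padding avoids circular wrap-around
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    twiddle = [cmath.exp(-2j * math.pi * i / m) for i in range(m)]
    # Direct O(m*n) DFT; the padded zeros contribute nothing, so only n terms are summed.
    spectrum = [sum(windowed[t] * twiddle[(k * t) % m] for t in range(n)) for k in range(m)]
    flattened = [abs(c) ** exponent for c in spectrum]

    def lag_score(lag):
        # Inverse DFT of a real, even sequence reduces to a cosine sum.
        return sum(flattened[k] * math.cos(2 * math.pi * k * lag / m) for k in range(m))

    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag = max(range(lag_min, lag_max + 1), key=lag_score)
    return sample_rate / best_lag
```

With exponent 2 this reduces to the ordinary autocorrelation of the windowed frame. Smaller exponents reduce the influence of strong low formants on the lag-domain peaks.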
Though the above conventional methods and systems for pitch extraction are accurate and reliable, they are only suitable for feature analysis, and not for speech recognition in real time. In addition, due to the differences between most European languages and the Chinese language, there are some special aspects to be taken into account for Chinese speech pitch extraction.
In contrast to most European languages, Mandarin Chinese uses tones for lexical distinction. A tone occurs over the duration of a syllable. There are five lexical tones, which play a very important role in meaning disambiguation. The direct acoustic representation of these tones is the pitch contour variation pattern illustrated in FIG. 1. The most direct acoustic manifestation of tone is the fundamental frequency. Thus, for Chinese speech pitch extraction, the effect of the fundamental frequency must be taken into account.
Paul Boersma's article entitled “Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound,” IFA Proceedings 17, 1993, pp. 97-110, describes a detailed and advanced pitch extraction method based on processing of the fundamental frequency. The main concepts of the article are anti-bias autocorrelation and the Viterbi algorithm (dynamic programming). Together they integrate the voiced/unvoiced decision, pitch candidate estimation, and best-path finding into a single pass and can efficiently improve extraction accuracy.
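Boersma's anti-bias step divides the autocorrelation of the windowed frame by the autocorrelation of the window itself, r_x(τ) ≈ r_a(τ) / r_w(τ). This removes the taper that windowing imposes on longer lags. The sketch below assumes a Hamming window, as in the summary of this invention, and limits lags to half the frame. The function names are hypothetical. Because r_w depends only on the window, it can be computed once and reused for every frame.

```python
import math


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def normalized_autocorrelation(x, max_lag):
    """r(lag) / r(0) for lags 0..max_lag."""
    r0 = sum(v * v for v in x)
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / r0 for lag in range(max_lag + 1)]


def precompute_window_autocorrelation(n):
    """Normalized autocorrelation r_w of the analysis window, computed once for all frames.
    Lags are limited to n // 2 (an assumption of this sketch)."""
    return normalized_autocorrelation(hamming(n), n // 2)


def anti_bias_autocorrelation(frame, r_w):
    """Estimate r_x(tau) ~= r_a(tau) / r_w(tau), where a is the mean-removed, windowed frame."""
    n = len(frame)
    mean = sum(frame) / n
    a = [(s - mean) * w for s, w in zip(frame, hamming(n))]
    r_a = normalized_autocorrelation(a, len(r_w) - 1)
    return [ra / rw for ra, rw in zip(r_a, r_w)]
```

For a periodic frame, the corrected value at the true period comes out close to 1. The raw windowed autocorrelation at the same lag is pulled down by the window taper.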
However, the globally optimized dynamic programming pitch path-tracking of Paul Boersma is not suitable for practical applications because of its time delay. The time delay of pitch extraction depends on two factors: the computation power of the CPU and the structure of the algorithm. When, as in the algorithm of Paul Boersma, pitch extraction in the current window (frame) depends on later windows (frames), the system has a structural response delay regardless of CPU speed. For example, in the algorithm of Paul Boersma, if the speech length is L seconds, then the structural delay is L seconds. Such a delay is sometimes unacceptable for a real-time speech recognition application. Therefore, it is apparent to one with ordinary skill in the art that an improved method and system is needed.
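To make the structural delay concrete, the following sketch runs a global Viterbi search over per-frame pitch candidates. It follows the spirit of Boersma's method but is not copied from it. Each candidate is a (pitch in Hz, local cost) pair, with 0.0 standing for the unvoiced candidate. The transition cost, including its octave-jump and voiced/unvoiced weights, is an illustrative assumption. Backtracking can begin only after the last frame has been processed, so no frame's pitch is final until the whole utterance has been seen.

```python
import math


def transition_cost(f_prev, f_cur, octave_jump=0.35, voiced_unvoiced=0.14):
    """Illustrative cost of moving between consecutive frames; 0.0 marks 'unvoiced'."""
    if f_prev == 0.0 and f_cur == 0.0:
        return 0.0
    if f_prev == 0.0 or f_cur == 0.0:
        return voiced_unvoiced
    return octave_jump * abs(math.log2(f_cur / f_prev))


def global_viterbi(frames):
    """frames: list of per-frame candidate lists [(pitch_hz, local_cost), ...]."""
    costs = [local for _, local in frames[0]]
    backpointers = []
    for prev, cur in zip(frames, frames[1:]):
        new_costs, pointers = [], []
        for value, local in cur:
            totals = [costs[i] + transition_cost(prev[i][0], value) for i in range(len(prev))]
            best = min(range(len(prev)), key=totals.__getitem__)
            new_costs.append(totals[best] + local)
            pointers.append(best)
        costs = new_costs
        backpointers.append(pointers)
    # Structural delay: the backtrace can start only after the LAST frame is known.
    j = min(range(len(costs)), key=costs.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [frames[t][j][0] for t, j in enumerate(path)]
```

However fast each step runs, the function returns nothing until every frame is available. This is the L-second delay described above.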
SUMMARY OF THE INVENTION
The present invention discloses methods and apparatuses for Chinese speech pitch extraction using locally optimized dynamic programming pitch path-tracking to meet the low time-delay requirements of a real-time speech recognition application.
In one aspect of the invention, an exemplary method includes:
pre-computing an anti-bias autocorrelation of a Hamming window function; for at least one frame, saving a first candidate as an unvoiced candidate and detecting other, voiced candidates from the anti-bias autocorrelation function; calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voiced candidates, and saving a predetermined number of least-cost paths; and outputting at least a portion of contiguous frames with low time delay.
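The summary does not spell out how the saved least-cost paths become low-delay output. One plausible reading is a beam search that keeps a fixed number of least-cost partial paths and immediately outputs every leading frame on which all surviving paths agree. The remaining frames are flushed when the utterance ends. This sketch implements that reading purely as an assumption. Following the summary, each frame's first candidate is the unvoiced one, marked 0.0. Local costs are taken as given inputs, since the voiced/unvoiced intensity function is not specified here. The class name, beam width, and transition cost are hypothetical. Unlike a full Viterbi implementation, the sketch does not merge paths that end in the same candidate.

```python
import math


def transition_cost(f_prev, f_cur, octave_jump=0.35, voiced_unvoiced=0.14):
    """Illustrative cost between consecutive frames; 0.0 marks the unvoiced candidate."""
    if f_prev == 0.0 and f_cur == 0.0:
        return 0.0
    if f_prev == 0.0 or f_cur == 0.0:
        return voiced_unvoiced
    return octave_jump * abs(math.log2(f_cur / f_prev))


class LocalPitchTracker:
    """Hypothetical low-delay tracker: keeps the `beam` least-cost partial paths and
    emits every leading frame on which all surviving paths already agree."""

    def __init__(self, beam=3):
        self.beam = beam
        self.paths = [(0.0, [])]   # (cumulative cost, pitches not yet emitted)
        self.last = None           # last emitted pitch, shared by all surviving paths

    def push(self, candidates):
        """Add one frame of (pitch_hz, local_cost) candidates; return newly settled pitches."""
        extended = []
        for cost, pending in self.paths:
            prev = pending[-1] if pending else self.last
            for value, local in candidates:
                step = local if prev is None else local + transition_cost(prev, value)
                extended.append((cost + step, pending + [value]))
        extended.sort(key=lambda path: path[0])
        self.paths = extended[: self.beam]        # save a fixed number of least-cost paths
        return self._emit_common_prefix()

    def _emit_common_prefix(self):
        pendings = [pending for _, pending in self.paths]
        settled = []
        while all(p and p[0] == pendings[0][0] for p in pendings):
            settled.append(pendings[0][0])
            pendings = [p[1:] for p in pendings]
        if settled:
            self.last = settled[-1]
            self.paths = [(cost, p) for (cost, _), p in zip(self.paths, pendings)]
        return settled

    def finish(self):
        """At the end of the utterance, output the rest of the best path."""
        _, best = self.paths[0]
        self.paths, self.last = [(0.0, [])], None
        return best
```

In this sketch, the output delay is bounded by how quickly the surviving paths converge, not by the length of the utterance. A wider beam tends to track more accurately but settles frames later.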
In one particular embodiment, the method includes removing global and local DC components from the speech signal. In another embodiment, the method includes segmenting the speech signal into a plurality of frames and, for each frame, calculating the spectrum, power spectrum, and autocorrelation. In a further embodiment, the method includes performing Mel-frequency cepstral coefficient (MFCC) extraction.
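The preprocessing steps in these embodiments can be sketched as follows. The sketch assumes that "global" DC removal subtracts the mean of the whole signal and "local" DC removal subtracts the mean of each frame. The frame analysis computes the spectrum by a direct DFT of the zero-padded frame and forms the power spectrum. It then obtains the autocorrelation as the inverse transform of the power spectrum (the Wiener-Khinchin relation). Function names and parameters are illustrative, and the MFCC step is not shown.

```python
import cmath
import math


def remove_dc_and_frame(signal, frame_len, hop):
    """Subtract the global mean, cut into overlapping frames, then subtract each frame's mean."""
    g = sum(signal) / len(signal)
    centered = [s - g for s in signal]
    frames = []
    for start in range(0, len(centered) - frame_len + 1, hop):
        frame = centered[start:start + frame_len]
        m = sum(frame) / frame_len
        frames.append([s - m for s in frame])
    return frames


def dft(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def frame_analysis(frame):
    """Return (spectrum, power spectrum, autocorrelation) of one frame.

    The frame is zero-padded to 2N so that the inverse transform of the power
    spectrum gives the linear (not circular) autocorrelation for lags 0..N-1.
    """
    n = len(frame)
    m = 2 * n
    spectrum = dft(frame + [0.0] * n)
    power = [abs(c) ** 2 for c in spectrum]
    # Inverse DFT of a real, symmetric sequence: only the cosine part survives.
    autocorr = [sum(power[k] * math.cos(2 * math.pi * k * lag / m) for k in range(m)) / m
                for lag in range(n)]
    return spectrum, power, autocorr
```

Computing the autocorrelation from the power spectrum lets the spectrum, power spectrum, and autocorrelation share one transform per frame. With an FFT in place of the direct DFT, this would be the usual choice.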
The present invention includes apparatuses which perform these methods, and machine-readable media storing instructions which, when executed on a data processing system, cause the system to perform these methods. Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.


REFERENCES:
patent: 6073100 (2000-06-01), Goodridge, Jr.
patent: 6195632 (2001-02-01), Pearson
patent: 6226606 (2001-05-01), Acero et al.
patent: WO 01/35389 (2001-05-01), None
Boersma, Paul; Accurate Short-Term Analysis Of The Fundamental Frequency And The Harmonics-To-Noise Ratio Of A Sampled Sound; Institute Of Phonetic Sciences, University of Amsterdam; Proceedings 17 (1993), pp. 97-110.
Hermes, Dik J.; Measurement of pitch by subharmonic summation; J. Acoust. Soc. Am. 83 (1), Jan. 1988, ©1988 Acoustical Society of America, pp. 257-264.
Liu, Sharlene, Ph.D., et al.; The Effect of Fundamental Frequency on Mandarin Speech Recognition; 5th International Conference on Spoken Language Processing; 30th Nov.-4th Dec. 1998, Sydney, Australia, ICSLP '98 Proceedings Th4R9, vol. 6, pp. 2647-2650.
Rabiner, Lawrence R., et al.; A Comparative Performance Study of Several Pitch Detection Algorithms; IEEE Transactions On Acoustics, Speech, And Signal Processing, vol. ASSP-24, No. 5, Oct. 1976, pp. 399-418.
Search Report for PCT/US 02/35949, mailed Feb. 6, 2003, 2 pages.
Pearce, David; Enabling New Speech Driven Services for Mobile Devices: An overview of the ETSI standards activities for Distributed Speech Recognition Front-ends; AVIOS 2000: The Speech Applications Conference, May 22-24, 2000, San Jose, CA, USA; <http://www.etsi.org/T-news/Documents/AVIOS DSR paper.pdf>, 12 pages.
Distributed Speech Recognition - Aurora; Oct. 1, 2002, <http://www.etsi.org/technicalactiv/dsr.htm>, pp. 1-3.
Speech Processing, Transmission and Quality aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms; ETSI ES 201 108 V1.12 (Apr. 2000), ETSI Standard, ©European Telecommunications Standards Institute 2000, F-06
