Speech processing using conditional observable maximum likelihood continuity mapping

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition



Details

C704S240000, C704S245000, C704S255000

Reexamination Certificate

active

06678658

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to speech processing, and, more particularly, to speech processing using maximum likelihood continuity mapping.
BACKGROUND OF THE INVENTION
While speech recognition systems are commercially available for limited domains, state-of-the-art systems have only about a 60%-65% word recognition rate on casual speech, e.g., telephone conversations, as opposed to speech produced by users who are familiar with and trying to be understood by a speech recognition system. Since speaking rates of 200 words per minute are not uncommon in casual speech, a 65% word recognition accuracy implies approximately 70 errors per minute—an unacceptably high rate for most applications. Furthermore, recognition performance is not improving rapidly. Improvements in word recognition accuracy of a few percent are considered “big” improvements, and recognition rates of the best systems on recorded telephone conversations have been generally stagnant in recent years.
Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition (Rabiner & Juang, An Introduction to Hidden Markov Models, IEEE Acoustics, Speech, and Signal Processing Magazine (1986)). One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. HMMs can thus deal with intra- and inter-speaker variability despite a limited knowledge of how speech signals vary and despite an often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using (possibly inaccurate) knowledge of speech, recognition performance decreases.
Nonetheless, many of the assumptions underlying HMMs are known to be inaccurate, and improving on these inaccurate assumptions within the HMM framework can be computationally expensive. Thus, various researchers have argued that, by using probabilistic models that more accurately embody the process of speech production, more accurate speech recognition should be achieved.
A prior art technique called Maximum Likelihood Continuity Mapping (MALCOM) provides a means of learning a more physiologically realistic stochastic model of speech as well as providing a method for speech processing once the stochastic model has been learned. See U.S. Pat. No. 6,052,662, issued Apr. 18, 2000, and incorporated herein by reference. The mapping learned by MALCOM is embodied in a continuity map, which is a continuous, multidimensional space over which probability density functions are positioned—where the probability density functions give the probability of a position in the space conditioned on an acoustic signal. The assumptions underlying MALCOM are well-founded. In fact, the main (and surprisingly powerful) assumption used by MALCOM is that articulator motions produced by muscle contractions have little energy above some low cut-off frequency, which is easily verified simply by calculating spectra of articulator paths.
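The low-pass claim above can be checked in just the way the paragraph suggests, by computing the spectrum of an articulator trajectory. The following is a minimal illustrative sketch in Python; the synthetic path, sampling rate, and cut-off frequency are assumptions introduced here for illustration, not data or parameters from the patent.

import numpy as np

# Hypothetical stand-in for a measured articulator trajectory; real
# articulometer data (e.g., midsagittal pellet positions) would be used instead.
fs = 100.0                                  # samples per second (assumed)
t = np.arange(0.0, 2.0, 1.0 / fs)
path = np.sin(2 * np.pi * 2.0 * t) + 0.3 * np.sin(2 * np.pi * 5.0 * t)
path += 0.01 * np.random.default_rng(0).normal(size=t.size)

# Power spectrum of the articulator path.
spectrum = np.abs(np.fft.rfft(path)) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

# Fraction of energy below an illustrative low cut-off frequency.
cutoff = 15.0                               # Hz (assumed)
fraction_low = spectrum[freqs <= cutoff].sum() / spectrum.sum()
print(f"fraction of energy below {cutoff} Hz: {fraction_low:.4f}")  # close to 1

For real articulator recordings, the same computation shows the energy concentrated at low frequencies, which is the property MALCOM exploits.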
MALCOM does not work directly on speech acoustics, but instead works on sequences of categorical data values, such as sequences of letters, words, or phonemes. The fact that MALCOM works on sequences of categorical data values is not a problem for processing digitized speech (a sequence of continuous-valued amplitudes) because it is a simple matter to convert recorded speech to sequences of symbols using, e.g., Vector Quantization (VQ) (Gray, R., Vector Quantization, IEEE Acoustics, Speech, and Signal Processing Magazine, pp. 4-29 (1984)). Unfortunately, MALCOM works with only one sequence at a time. This is a disadvantage when trying to apply MALCOM to problems such as speech recognition, in which relationships between two time series (e.g., recorded speech sounds and phonetic labels) must be learned. In accordance with the present invention, MALCOM is modified to work with more than one observable sequence at a time to provide Conditional-Observable Maximum Likelihood Continuity Mapping (CO-MALCOM).
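For concreteness, the vector-quantization step referenced above (Gray, 1984) can be sketched as follows in Python. The frame features, the codebook size, and the use of SciPy's classic VQ routines are illustrative assumptions; any standard VQ implementation would serve.

import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

rng = np.random.default_rng(0)

# Hypothetical per-frame acoustic feature vectors (e.g., short-time spectral
# features); a real front end would compute these from recorded speech.
frames = rng.normal(size=(1000, 12))

# Learn a 64-entry codebook and assign each frame to its nearest code,
# turning the continuous acoustic signal into a categorical symbol sequence.
frames_w = whiten(frames)                 # normalize feature variances
codebook, _ = kmeans(frames_w, 64)        # codebook of 64 centroids (assumed size)
codes, _ = vq(frames_w, codebook)         # sequence of integer speech codes

print(codes[:20])                         # a sequence of symbols in 0..63

The resulting integer sequence is the kind of categorical input that MALCOM, and the CO-MALCOM method described below, operate on.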
Various objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
SUMMARY OF THE INVENTION
The present invention, as embodied and broadly described herein, is directed to a computer-implemented method for the recognition of speech and speech characteristics. Parameters are initialized for first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized for second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability density function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.
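The decoding stage summarized above can be illustrated with a short, hedged Python sketch. The two-dimensional continuity map, the Gaussian form of the probability density functions, the quadratic smoothness penalty (standing in for the low-pass constraint on map paths), and all parameter values are assumptions made for illustration; training of the density parameters (the third step in the summary) is omitted.

import numpy as np

rng = np.random.default_rng(0)

D = 2           # dimensionality of the continuity map (assumed)
N_CODES = 64    # vocabulary size of the speech codes (assumed)
N_SYMBOLS = 5   # vocabulary size of the transcription symbols (assumed)
SIGMA2 = 1.0    # shared isotropic variance of the Gaussians (assumed)

# First probability density functions: one Gaussian mean per speech code.
code_means = rng.normal(size=(N_CODES, D))
# Second probability density functions: one Gaussian mean per transcription symbol.
symbol_means = rng.normal(size=(N_SYMBOLS, D))

def smooth_map_path(codes, lam=2.0, iters=500, lr=0.05):
    # Find a smooth path on the continuity map with high probability for the
    # given speech-code sequence, by gradient ascent on the Gaussian
    # log-likelihood minus a quadratic penalty on successive differences.
    x = code_means[codes].copy()                  # initialize at the code means
    for _ in range(iters):
        grad = -(x - code_means[codes]) / SIGMA2  # pull toward each code's density
        diff = np.diff(x, axis=0)
        grad[:-1] += 2.0 * lam * diff             # smoothness term
        grad[1:] -= 2.0 * lam * diff
        x += lr * grad
    return x

def symbol_probabilities(path):
    # For each point on the path, output the probability of each transcription
    # symbol, taken here as normalized Gaussian densities at that position.
    d2 = ((path[:, None, :] - symbol_means[None, :, :]) ** 2).sum(-1)
    logp = -0.5 * d2 / SIGMA2
    logp -= logp.max(axis=1, keepdims=True)
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Usage with a made-up speech-code sequence (e.g., the VQ output sketched earlier).
codes = rng.integers(0, N_CODES, size=40)
path = smooth_map_path(codes)                     # smooth maximum-probability path
probs = symbol_probabilities(path)                # shape (40, N_SYMBOLS)
print(probs[:3].round(3))

This sketch covers only the path-finding and output steps; in the method as summarized, the code and symbol density parameters would first be trained to maximize the probabilities of the desired transcription sequences.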


REFERENCES:
patent: 5682501 (1997-10-01), Sharman
patent: 5719996 (1998-02-01), Chang et al.
patent: 5865626 (1999-02-01), Beattie et al.
patent: 6029124 (2000-02-01), Gillick et al.
patent: 6038388 (2000-03-01), Hogden et al.
patent: 6052662 (2000-04-01), Hogden
patent: 6076057 (2000-06-01), Narayanan et al.
patent: 6092044 (2000-07-01), Baker et al.
patent: 6151574 (2000-11-01), Lee et al.
patent: 6151575 (2000-11-01), Newman et al.
patent: 6212498 (2001-04-01), Sherwood et al.
patent: 6256607 (2001-07-01), Digalakis et al.
patent: 6260013 (2001-07-01), Sejnoha
patent: 6263309 (2001-07-01), Nguyen et al.
patent: 6424943 (2002-07-01), Sherwood et al.
patent: 6466908 (2002-10-01), Baggenstoss
Juergen Schroeter and Man Mohan Sondhi, “Techniques for Estimating Vocal-Tract Shapes from the Speech Signal,” IEEE Transactions on Speech and Audio Processing, vol. 2, No. 1, Part II, Jan. 1994, pp. 133-150.
R. C. Rose, J. Schroeter, and M. M. Sondhi, “The Potential Role of Speech Production Models in Automatic Speech Recognition,” J. Acoustical Society of America, vol. 99, No. 3, Mar. 1996, pp. 1609-1709.
Joseph S. Perkell, Marc H. Cohen, Mario A. Svirsky, Melanie L. Matthies, Inaki Garabieta, and Michael T. T. Jackson, “Electromagnetic Midsagittal Articulometer Systems for Transducing Speech Articulatory Movements,” J. Acoustical Society of America, vol. 92, No. 6, Dec. 1992, pp. 3078-3096.
Sharlene A. Liu, “Landmark Detection for Distinctive Feature-Based Speech Recognition,” J. Acoustical Society of America, vol. 100, No. 5, Nov. 1996, pp. 3417-3430.
John Hogden, Anders Lofqvist, Vince Gracco, Igor Zlokarnik, Philip Rubin, and Elliot Saltzman, “Accurate Recovery of Articulator Positions from Acoustics: New Conclusions Based on Human Data,” J. Acoustical Society of America, vol. 100, No. 3, Sep. 1996, pp. 1819-1834.
Li Deng and Don X. Sun, “A Statistical Approach to Automatic Speech Recognition using the Atomic Speech Units Constructed From Overlapping Articulatory Features,” J. Acoustical Society of America, vol. 95, No. 5, Part 1, May 1994, pp. 2702-2719.
Robert M. Gray, “Vector Quantization,” IEEE ASSP Magazine, Apr. 1984, pp. 4-29.
John Hogden, “A Maximum Likelihood Approach To Estimating Articulator Positions From Speech Acoustics,” LA-UR-96-3518, pp. 1-24.
