Process for the multilingual use of a hidden markov sound...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S243000, C704S244000, C704S245000, C704S001000, C704S002000, C704S009000, C704S010000

active

06212500

ABSTRACT:

BACKGROUND OF THE INVENTION
Field of the Invention
The invention is directed to hidden Markov models for speech recognition systems, of a type suitable for use for a number of languages in that the acoustic and phonetic similarities between the different languages are exploited.
Description of the Prior Art
A major problem in speech recognition is that new acoustic-phonetic models must be trained for every language in which the speech recognition technology is to be introduced, in order to adapt it to the respective country. Standard speech recognition systems usually employ hidden Markov models to model the language-specific sounds. Acoustic word models, which are recognized during the search process of the speech recognition procedure, are then compiled from these statistically modelled sound models. Training these sound models requires very extensive speech data banks, whose collection and editing is an extremely cost-intensive and time-consuming process. Transferring a speech recognition technology from one language into another is therefore disadvantageous, since producing a new speech data bank makes the product more expensive on the one hand and delays its market introduction on the other.
Commercially available speech recognition systems employ language-specific models exclusively. To transfer these systems into a new language, extensive speech data banks are collected and edited. The sound models for the new language are then retrained from scratch with the collected speech data.
In order to reduce the outlay and the time delay when transferring speech recognition systems into different languages, it should therefore be examined whether individual sound models are suitable for use in different languages. The article by Dalsgaard et al. entitled “Identification of Mono- and Poly-phonemes using acoustic-phonetic Features derived by a self-organising Neural Network,” in Proc. ICSLP '92, pages 547-550, discloses approaches for producing multilingual sound models and using them for speech recognition in the respective languages. The terms ‘polyphoneme’ and ‘monophoneme’ are also introduced there. Polyphonemes are sounds whose sound formation properties are similar enough across several languages to be equated.
Monophonemes are sounds that exhibit language-specific properties. So that new speech data banks do not have to be collected every time for such development work and investigations, standard ones are already available, as described in “Data-driven Identification of Poly- and Mono-phonemes for four European Languages,” Andersen et al., Proc. EUROSPEECH '93, pages 759-762 (1993); “ASCII Phonetic Symbols for the World's Languages: Worldbet.” Hieronymus, preprint, (1993); and “The OGI Multi-language Telephone Speech Corpus”, Cole et al., Proc. ICSLP '92, pages 895-898, (1992). The aforementioned article by Andersen et al. from Proc. EUROSPEECH '93 discloses the use of particular phonemes and hidden Markov sound models of these phonemes for multilingual speech recognition.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a method for multilingual employment of a hidden Markov sound model with which the outlay for transferring speech recognition systems into a different language is minimized by reducing the parameters in a multilingual speech recognition system.
The above object is achieved in accordance with the principles of the present invention in a first embodiment of a method for modelling a sound in at least two languages wherein a first feature vector for a first spoken sound in a first language is identified and a second feature vector for a second spoken sound, comparable to the first spoken sound, is identified in a second language. A first hidden Markov sound model is selected from a library of standard Markov sound models, which most closely models the first feature vector, and a second hidden Markov sound model is selected from the library which most closely models the second feature vector. A predetermined criterion is employed to select one of the first or second hidden Markov sound models as the better of these two models for modelling both of the first and second feature vectors. Both of the first and second spoken words in the respective first and second languages are then modeled using the selected one of the first or second hidden Markov sound models.
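The selection step of the first embodiment can be illustrated with a minimal sketch. It is not the patented implementation: each sound model is reduced here to a single diagonal-covariance Gaussian rather than a full hidden Markov model, the predetermined criterion is assumed to be the mean log-likelihood over the pooled feature vectors of both languages, and all names and numeric values are hypothetical.

```python
import math

def log_density(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def mean_log_likelihood(model, vectors):
    """Average log-likelihood of the feature vectors under one model."""
    mean, var = model
    return sum(log_density(x, mean, var) for x in vectors) / len(vectors)

def select_shared_model(models, vectors):
    """Return the name of the model that best describes all pooled vectors.

    Assumption: "best" means highest mean log-likelihood.
    """
    return max(models, key=lambda name: mean_log_likelihood(models[name], vectors))

# Hypothetical 2-dimensional feature vectors of a comparable sound
# observed in two languages.
vectors_lang1 = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)]
vectors_lang2 = [(1.1, 2.2), (1.3, 2.0)]

# Hypothetical candidate models, each given as (mean, variance).
models = {
    "lang1": ((1.0, 2.0), (0.1, 0.1)),
    "lang2": ((3.0, 0.0), (0.1, 0.1)),
}

# The selected model is then used for the sound in both languages.
best = select_shared_model(models, vectors_lang1 + vectors_lang2)
```

Pooling the vectors of both languages before scoring reflects the requirement that the chosen model describe both the first and the second feature vector, not only the data of its own language.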
In a second embodiment of the inventive method, a first hidden Markov sound model for a first sound in a first language and a second hidden Markov sound model for a comparable spoken second sound in a second language are identified, and a polyphoneme model is formed by combining the respective standard probability distributions employed for the modelling of the first and second hidden Markov sound models so as to form a new standard probability distribution. This new standard probability distribution is formed up to a defined distance threshold, which indicates the maximum distance (window) within which the aforementioned two standard probability distributions should be combined. Only the new standard probability distribution within this window is then used to characterize the polyphoneme model. The thus-characterized polyphoneme model is then employed for modelling both the first sound and the second sound in the respective first and second languages.
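The combination of distributions within a distance threshold can be sketched as follows. The text above does not specify the distance measure or the merging rule, so this sketch assumes that each standard probability distribution is represented only by its mean vector, that distance is Euclidean, that pairs are matched greedily, and that two distributions are merged by averaging their means. The function names and values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge_within_threshold(means_a, means_b, threshold):
    """Merge distributions of two sound models that lie close together.

    Each mean of the first model is paired with its nearest still-unused
    mean of the second model. The pair is merged (averaged) only if the
    distance does not exceed the threshold. Only merged distributions
    are returned, following the description that only the new
    distribution within the window characterizes the polyphoneme model.
    """
    unused_b = list(means_b)
    merged = []
    for a in means_a:
        if not unused_b:
            break
        nearest = min(unused_b, key=lambda b: euclidean(a, b))
        if euclidean(a, nearest) <= threshold:
            merged.append(tuple((x + y) / 2.0 for x, y in zip(a, nearest)))
            unused_b.remove(nearest)
    return merged

# Hypothetical distribution means for a comparable sound in two languages.
means_lang1 = [(0.0, 0.0), (5.0, 5.0)]
means_lang2 = [(0.2, 0.0), (9.0, 9.0)]

polyphoneme_means = merge_within_threshold(means_lang1, means_lang2, threshold=1.0)
```

In this example only the first pair lies inside the window, so the polyphoneme model is characterized by a single merged distribution. A larger threshold would merge more distributions and reduce the parameter count further, at the cost of less precise modelling.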
A particular advantage of the inventive method is that it specifies a statistical similarity criterion which allows the sound model whose characteristics best describe all available feature vectors of the respective sound to be selected from a given number of different sound models for similar sounds in different languages.
It is especially advantageous to determine the logarithmic probability distance between the respective hidden Markov models and each feature vector as the criterion for selecting the best hidden Markov model for different sound feature vectors. This provides a criterion that reflects experimental findings on the similarity of individual sound models and their recognition rates.
It is especially advantageous in the invention to form the arithmetic mean of the logarithmic probability distances between each hidden Markov model and the respective feature vectors as the criterion for describing an optimally representative hidden Markov sound model, since a symmetrical distance value is thereby obtained.
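A symmetric distance of this kind can be sketched as follows. The exact formula is not given in the text above, so one common formulation is assumed: the one-directional distance is the mean difference in log-likelihood between a sound's own model and the other model, measured on the sound's own feature vectors, and the symmetric value is the arithmetic mean of the two directions. Each model is again simplified to a single diagonal-covariance Gaussian, and all names and values are hypothetical.

```python
import math

def log_density(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_prob_distance(own_model, other_model, own_vectors):
    """Mean over own_vectors of log p(x | own_model) - log p(x | other_model)."""
    total = 0.0
    for x in own_vectors:
        total += log_density(x, *own_model) - log_density(x, *other_model)
    return total / len(own_vectors)

def symmetric_distance(model_a, vectors_a, model_b, vectors_b):
    """Arithmetic mean of the two one-directional distances."""
    d_ab = log_prob_distance(model_a, model_b, vectors_a)
    d_ba = log_prob_distance(model_b, model_a, vectors_b)
    return 0.5 * (d_ab + d_ba)

# Hypothetical 1-dimensional models (mean, variance) and feature vectors.
model_a = ((0.0,), (1.0,))
model_b = ((2.0,), (1.0,))
vectors_a = [(0.1,), (-0.2,)]
vectors_b = [(1.9,), (2.3,)]

distance = symmetric_distance(model_a, vectors_a, model_b, vectors_b)
```

Averaging both directions makes the value independent of which model is named first, which a one-directional log-likelihood difference does not guarantee.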


REFERENCES:
patent: 5758023 (1998-05-01), Bordeaux
patent: 5805771 (1998-09-01), Muthusamy et al.
patent: 5835888 (1998-11-01), Kanevsky et al.
patent: 5867811 (1999-02-01), O'Donoghue
patent: 5991721 (1999-11-01), Asano et al.
Babin et al., Incorporation of time varying AR modeling in speech recognition . . . , IEEE Proceedings, pp 289-292, Mar. 1991.*
Sze et al., Branch and bound algorithm for Bayes classifier, IEEE Pattern Recognition, pp 705-709, Mar. 1991.*
“Training Data Clustering for Improved Speech Recognition”, Sankar et al., in Proc. EUROSPEECH '95, pp. 503-506, Madrid, 1995.
“Identification of Mono- and Poly-phonemes Using Acoustic-phonetic Features Derived by a Self-organising Neural Network”, Dalsgaard et al., in Proc. ICSLP '92, pp. 547-550, Banff, 1992.
“Methods for Improved Speech Recognition Over the Telephone Lines”, Hauenstein et al., in Proc. ICASSP '95, pp. 425-428, Detroit, 1995.
“ASCII Phonetic Symbols for the World's Languages: Worldbet.”, Hieronymus, preprint, 1993.
“A Course in Phonetics”, Ladefoged, Harcourt Brace Jovanovich, San Diego 1993.
“Data-driven Identification of Poly- and Mono-phonemes for Four European Languages”, Andersen et al., Proc. EUROSPEECH '93, pp. 759-762, Berlin, 1993.
“The OGI Multi-language Telephone Speech Corpus”, Muthusamy et al., in Proc. ICSLP '92, pp. 895-898, Banff, 1992.
“An Evaluation of Cross-Language Adaption for Rapid HMM Developme
