Method and system for automatically determining phonetic...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S243000, C704S257000, C704S251000, C704S231000

active

06233553

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to speech recognition and speech synthesis systems. More particularly, the invention relates to pronunciation generation.
Computer-implemented and automated speech technology today involves a confluence of many areas of expertise, ranging from linguistics and psychoacoustics to digital signal processing and computer science. The problems of text-to-speech (TTS) synthesis and automatic speech recognition (ASR) present many opportunities to share technology. Traditionally, however, speech recognition and speech synthesis have been addressed as entirely separate disciplines, drawing very little on the benefits that cross-pollination could bring to both.
We have discovered techniques, described in this document, for combining speech recognition and speech synthesis technologies to the mutual advantage of both disciplines in generating pronunciation dictionaries. Having a good pronunciation dictionary is key to both text-to-speech and automatic speech recognition applications. In the case of text-to-speech, the dictionary serves as the source of pronunciation for words entered by graphemic or spelled input. In automatic speech recognition applications the dictionary serves as the lexicon of words that are known by the system. When training the speech recognition system, this lexicon identifies how each word is phonetically spelled, so that the speech models may be properly trained for each of the words.
In both speech synthesis and speech recognition applications, the quality and performance of the application may be highly dependent on the accuracy of the pronunciation dictionary. Typically it is expensive and time-consuming to develop a good pronunciation dictionary, because the only way to obtain accurate data has heretofore been through use of professional linguists, preferably a single one to guarantee consistency. The linguist painstakingly steps through each word and provides its phonetic transcription.
Phonetic pronunciation dictionaries are available for most of the major languages, although these dictionaries typically have limited word coverage and do not adequately handle proper names, unusual and compound nouns, or foreign words. Publicly available dictionaries likewise fall short when used to obtain pronunciations for a dialect different from the one for which the system was trained or intended.
Currently available dictionaries also rarely match all of the requirements of a given system. Some systems (such as text-to-speech systems) need high accuracy, whereas other systems (such as some automatic speech recognition systems) can tolerate lower accuracy but may require multiple valid pronunciations for each word. In general, the diversity in system requirements compounds the problem. Because there is no “one size fits all” pronunciation dictionary, the construction of good, application-specific dictionaries remains expensive.
The present invention provides a system and method for automatically generating phonetic transcriptions, with little or no human involvement, depending on the desired accuracy of the dictionary. The invention provides a tool by which the user can specify a confidence level and the system automatically stores in the dictionary all generated pronunciations that fulfill the desired confidence level. Unlike other phonetic transcription tools, the invention requires no specific linguistic or phonetic knowledge to produce a pronunciation dictionary. The system can generate multiple pronunciations at different confidence levels, as needed, based on the requirements of the speech system being developed.
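The confidence-level behavior described above can be illustrated with a minimal sketch. This is not the patented implementation: it assumes only that each generated pronunciation arrives with a confidence score between 0 and 1. The names `Candidate` and `build_dictionary`, the phoneme symbols, and the scores are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One generated pronunciation with its confidence score (hypothetical type)."""
    word: str
    phonemes: Tuple[str, ...]
    confidence: float  # assumed to lie in [0, 1]


def build_dictionary(candidates: Iterable[Candidate],
                     min_confidence: float) -> Dict[str, List[Candidate]]:
    """Keep every candidate whose confidence meets the user-chosen level."""
    dictionary: Dict[str, List[Candidate]] = {}
    for cand in candidates:
        if cand.confidence >= min_confidence:
            dictionary.setdefault(cand.word, []).append(cand)
    # Within each word, list the most confident pronunciation first.
    for entries in dictionary.values():
        entries.sort(key=lambda c: c.confidence, reverse=True)
    return dictionary


# Illustrative data only; the scores are made up.
CANDIDATES = [
    Candidate("read", ("r", "iy", "d"), 0.95),
    Candidate("read", ("r", "eh", "d"), 0.85),
    Candidate("read", ("r", "ey", "d"), 0.30),
    Candidate("lead", ("l", "iy", "d"), 0.70),
]

strict = build_dictionary(CANDIDATES, min_confidence=0.9)   # e.g. for TTS
lenient = build_dictionary(CANDIDATES, min_confidence=0.6)  # e.g. for ASR
```

In this sketch, a high threshold leaves a single pronunciation for "read", which suits a system that needs high accuracy, while a lower threshold keeps several variants per word, which suits a recognizer that benefits from multiple valid pronunciations.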
One powerful advantage of the system and method of the invention is that it uses multiple sources of information to synergistically achieve superior results. Integrating information from various dimensions gives a result that is greater than the sum of its parts. Moreover, different words may be handled by different methods, resulting in a superior final product. A non-exhaustive list of information sources applicable to the present invention includes: expert systems based on letter-to-sound rules, on-line dictionaries, morph dictionaries with morph combining rules, trainable learning subsystems, dialect transformation rules, and output from automatic speech recognition, from an operator's voice or from other audio sources.
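One way to picture the pooling of several information sources is the sketch below. The text does not say how the sources are combined, so the agreement-based scoring used here is an assumption made for illustration, not the method of the invention. The three source functions, their canned outputs, and the name `pool_candidates` are hypothetical stand-ins for an on-line dictionary, letter-to-sound rules, and speech recognition output.

```python
from collections import Counter


# Hypothetical stand-ins for three of the listed information sources.
# Each maps a spelled word to zero or more pronunciation strings.
def dictionary_source(word):
    return {"read": ["r iy d", "r eh d"]}.get(word, [])


def rule_source(word):
    return {"read": ["r iy d"], "reed": ["r iy d"]}.get(word, [])


def asr_source(word):
    return {"read": ["r eh d", "r iy d"]}.get(word, [])


SOURCES = [dictionary_source, rule_source, asr_source]


def pool_candidates(word, sources):
    """Score each pronunciation by the fraction of sources proposing it.

    Assumed scheme: agreement between sources stands in for confidence.
    """
    votes = Counter()
    for source in sources:
        # set() so that one source cannot vote twice for the same pronunciation
        votes.update(set(source(word)))
    total = len(sources)
    # Highest agreement first; ties broken alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(pron, count / total) for pron, count in ranked]


ranked_read = pool_candidates("read", SOURCES)
ranked_reed = pool_candidates("reed", SOURCES)
```

Here a pronunciation proposed by every source ranks above one proposed by only some, and a word known to a single source still receives a low-scoring candidate rather than none.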
In accordance with one aspect of the invention, a trainable learning subsystem is included that can adapt or improve as new pronunciation information becomes available. The trainable learning subsystem will adapt to a speaker, for example, making it easy to adapt a lexicon to a new dialect.
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.


