Reexamination Certificate
2001-03-16
2004-08-31
McFadden, Susan (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S270000
active
06785650
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to automatic speech recognition and, more particularly, relates to hierarchical transcription and display of input speech.
BACKGROUND OF THE INVENTION
Transcription of words based on Automatic Speech Recognition (ASR) is a well-known method for improving the communication ability of the hearing impaired. A problem with this approach is that if the recognition error rate is relatively high, the transcription is not effective for hearing-impaired children who are still learning a language, because these children are easily confused by wrongly decoded words. One approach that addresses this problem is to display phonetic output rather than words. This approach is not optimal, however, because reading correctly recognized words is easier and more efficient than reading phonetic output.
The use of ASR to teach hearing-impaired people to read is also a well-known method. In this approach, a reference text is displayed to the user, and the ASR system decodes the user's speech while he or she reads the text aloud and compares the decoded output with the reference text. One reference that explains this use of ASR is “Reading Tutor Using an Automatic Speech Recognition,” Technical Disclosure Bulletin, Volume 36, Number 8, 08-93, pp. 287-290, the disclosure of which is hereby incorporated by reference. A problem with this approach is that any error in speech recognition will make the user think that he or she has misspoken a word, when the error is actually the fault of the program.
Another problem with ASR arises in noisy environments, over a difficult channel such as the telephone, or when speech is riddled with disfluencies. In these situations, a substantial number of errors are likely to occur. Although the user can sometimes identify errors from the context, the resulting confusion and increased difficulty of interpretation may offset the benefits of word-based display. This is especially true when the user is a child who is in the process of learning the language. In this case, virtually no errors should be allowed.
While this problem is particularly acute for children who are learning to speak properly, high ASR error rates are also a general problem. As a person dictates into an ASR system, the system makes transcription decisions based on probabilities, and some of those decisions may rest on low probabilities. If the user does not immediately catch an incorrect transcription, the correct transcription may be hard to determine later, even when the context is known.
Thus, what is needed is a way of limiting or solving the problems caused by a high recognition error rate when ASR is used to improve the communication ability or the reading skills of hearing-impaired people, or both, or when ASR is used for other speech recognition purposes.
SUMMARY OF THE INVENTION
Generally, the present invention provides the ability to present a mixed display of a transcription to a user. The mixed display is preferably organized in a hierarchical fashion. The present invention can place words, syllables and phones on the same display, and can select the appropriate symbol transcription based on which units of speech meet minimum confidences. A word is displayed if it meets a minimum confidence; otherwise, the syllables that make up the word are displayed. Additionally, if a syllable does not meet a predetermined confidence, then the phones that make up the syllable may be displayed. A transcription, in one aspect of the present invention, may also be described as a hierarchical transcription, because a unique confidence is derived that accounts for mixed word/syllable/phone data.
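The word/syllable/phone fallback described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the threshold values (0.8 and 0.6), the confidence scores, and the display conventions (hyphens between syllables, slashes around phones) are all assumptions chosen for illustration, and the sketch does not attempt to derive the combined confidence mentioned in the summary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phone:
    symbol: str
    confidence: float


@dataclass
class Syllable:
    text: str
    confidence: float
    phones: List[Phone] = field(default_factory=list)


@dataclass
class Word:
    text: str
    confidence: float
    syllables: List[Syllable] = field(default_factory=list)


def render_syllable(syl: Syllable, syllable_min: float) -> str:
    """Show the syllable if it is confident enough, else its phones."""
    if syl.confidence >= syllable_min or not syl.phones:
        # Assumption: with no phone data there is nothing finer to show.
        return syl.text
    return "/" + " ".join(p.symbol for p in syl.phones) + "/"


def render_word(word: Word, word_min: float = 0.8,
                syllable_min: float = 0.6) -> str:
    """Show the word if it is confident enough, else its syllables."""
    if word.confidence >= word_min or not word.syllables:
        # Assumption: with no syllable data the word is shown as decoded.
        return word.text
    return "-".join(render_syllable(s, syllable_min) for s in word.syllables)


def render_transcript(words: List[Word], word_min: float = 0.8,
                      syllable_min: float = 0.6) -> str:
    """Build the mixed word/syllable/phone display line."""
    return " ".join(render_word(w, word_min, syllable_min) for w in words)


# Hypothetical decoder output with made-up confidence scores.
example = [
    Word("speech", 0.93),
    Word("recognition", 0.55, [
        Syllable("rec", 0.82),
        Syllable("og", 0.41, [Phone("AA", 0.70), Phone("G", 0.60)]),
        Syllable("ni", 0.75),
        Syllable("tion", 0.90),
    ]),
]

print(render_transcript(example))  # prints: speech rec-/AA G/-ni-tion
```

In this sketch the confident word "speech" is shown whole, while "recognition" falls below the word threshold and is shown as syllables, with the low-confidence syllable "og" further broken into phones.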
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
REFERENCES:
patent: 4882757 (1989-11-01), Fisher et al.
patent: 5737724 (1998-04-01), Atal et al.
patent: 5842163 (1998-11-01), Weintraub
patent: 6487534 (2002-11-01), Thelen et al.
patent: 6502073 (2002-12-01), Guan et al.
patent: 6526380 (2003-02-01), Thelen et al.
patent: 6567778 (2003-05-01), Chao Chang et al.
patent: 0 924 687 (1999-06-01), None
patent: 0 957 470 (1999-11-01), None
Evermann et al., “Large Vocabulary Decoding and Confidence Estimation Using Word Posterior Probabilities,” Proc. of ICASSP 2000.
Mangu et al., “Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks,” Computer Speech and Language (2000) 14, 373-400.
“Reading Tutor Using an Automatic Speech Recognition,” IBM Technical Disclosure Bulletin, IBM vol. 36, No. 8, 287-289 (Aug. 1993).
Basson Sara H.
Kanevsky Dimitri
Maison Benoit Emmanuel
August, Esq. Casey P.
International Business Machines Corporation
McFadden Susan
Ryan & Mason & Lewis, LLP