Reexamination Certificate
1999-02-16
2002-03-26
Korzuch, William (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S275000
active
06363347
ABSTRACT:
TECHNICAL FIELD
The present invention relates to computer speech recognition, and more particularly, to the editing of dictation produced by a speech recognition system.
BACKGROUND OF THE INVENTION
A computer speech dictation system that would allow a speaker to dictate efficiently and would allow the dictation to be automatically recognized has been a long-sought goal of developers of computer speech systems. The benefits that would result from such a computer speech recognition (CSR) system are substantial. For example, rather than typing a document into a computer system, a person could simply speak the words of the document, and the CSR system would recognize the words and store the letters of each word as if the words had been typed. Since people generally can speak faster than they can type, efficiency would be improved. Also, people would no longer need to learn how to type. Computers could also be used in many applications where their use is currently impracticable because a person's hands are occupied with tasks other than typing.
Typical CSR systems have a recognition component and a dictation editing component. The recognition component controls the receiving of a series of utterances from a speaker, recognizing each utterance, and sending a recognized word for each utterance to the dictation editing component. The dictation editing component displays the recognized words and allows a user to correct words that were misrecognized. For example, the dictation editing component would allow a user to replace a word that was misrecognized by either speaking the word again or typing the correct word.
The recognition component typically contains a model of an utterance for each word in its vocabulary. When the recognition component receives a spoken utterance, the recognition component compares that spoken utterance to the modeled utterance of each word in its vocabulary in an attempt to find the modeled utterance that most closely matches the spoken utterance. Typical recognition components calculate a probability that each modeled utterance matches the spoken utterance. Such recognition components send to the dictation editing component a list of the words with the highest probabilities of matching the spoken utterance, referred to as the recognized word list.
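The recognized word list described above can be illustrated with a minimal Python sketch. It assumes the recognition component has already produced a match probability for each vocabulary word; the function name `recognized_word_list`, the `scores` mapping, and the `max_words` parameter are hypothetical, since the source does not say how probabilities are computed or how long the list is.

```python
import heapq

def recognized_word_list(scores, max_words=10):
    """Return (word, probability) pairs with the highest probabilities, best first.

    `scores` is an assumed mapping from each vocabulary word to the probability
    that its modeled utterance matches the spoken utterance.
    """
    return heapq.nlargest(max_words, scores.items(), key=lambda item: item[1])

# Illustrative, made-up probabilities for one spoken utterance.
scores = {"fore": 0.20, "four": 0.45, "for": 0.30, "floor": 0.05}
top = recognized_word_list(scores, max_words=3)
```

Here `top` holds the three most probable words, ordered from most to least probable, which is the form the dictation editing component would receive under these assumptions.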
The dictation editing component generally selects the word from the recognized word list with the highest probability as the recognized word corresponding to the spoken utterance. The dictation editing component then displays that word. If, however, the displayed word is a misrecognition of the spoken utterance, then the dictation editing component allows the speaker to correct the misrecognized word. When the speaker indicates that the misrecognized word should be corrected, the dictation editing component displays a correction window that contains the words in the recognized word list. In the event that one of the words in the list is the correct word, the speaker can just click on that word to effect the correction. If, however, the correct word is not in the list, the speaker would either speak or type the correct word.
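The selection and correction flow in this paragraph might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the names `select_recognized_word` and `apply_correction` and their parameters are hypothetical, and the recognized word list is assumed to be a list of (word, probability) pairs.

```python
def select_recognized_word(word_list):
    """Pick the highest-probability word from (word, probability) pairs."""
    if not word_list:
        raise ValueError("recognized word list is empty")
    return max(word_list, key=lambda item: item[1])[0]

def apply_correction(word_list, clicked_word=None, entered_word=None):
    """Return the corrected word.

    `clicked_word` models a click on an entry in the correction window;
    `entered_word` models a word that was typed (or respoken and recognized)
    because the correct word was not in the list. Both names are assumptions.
    """
    candidates = [word for word, _ in word_list]
    if clicked_word is not None:
        if clicked_word not in candidates:
            raise ValueError("clicked word is not in the correction window")
        return clicked_word
    if entered_word:
        return entered_word
    raise ValueError("no correction supplied")

# Illustrative, made-up recognized word list.
word_list = [("four", 0.45), ("for", 0.30), ("fore", 0.20)]
displayed = select_recognized_word(word_list)
fixed_by_click = apply_correction(word_list, clicked_word="for")
fixed_by_typing = apply_correction(word_list, entered_word="floor")
```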
Some CSR systems serve as a dictation facility for word processors. Such a CSR system controls the receiving and recognizing of a spoken utterance and then sends each character corresponding to the recognized word to the word processor. Such configurations have a disadvantage in that when a speaker attempts to correct a word that was previously spoken, the word processor does not have access to the recognized word list and thus cannot display those words to facilitate correction.
SUMMARY OF THE INVENTION
The present invention provides a new and improved computer speech recognition (CSR) system with a recognition component and a dictation editing component. The dictation editing component allows for rapid correction of misrecognized words. The dictation editing component allows a speaker to select the number of alternative words to be displayed in a correction window by resizing the correction window. The dictation editing component displays the words in the correction window in alphabetical order to facilitate locating the correct word. In another aspect of the present invention, the CSR system eliminates the possibility, when a misrecognized word or phrase is respoken, that the respoken utterance will be again recognized as the same misrecognized word or phrase based on analysis of both the previously spoken utterance and the newly spoken utterance. The dictation editing component also allows a speaker to specify the amount of speech that is buffered in a dictation editing component before transferring the recognized words to a word processor. The dictation editing component also uses a word correction metaphor or a phrase correction metaphor which changes editing actions which are normally character-based to be either word-based or phrase-based.
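Two of the behaviors summarized above can be sketched in Python: sizing the alternative list from the correction window and showing it alphabetically, and avoiding a repeat of a rejected word when an utterance is respoken. The function names, the pixel-based sizing, and the default row height are hypothetical. The respeak function is a deliberate simplification that only filters out previously rejected words; the source describes an analysis of both the earlier and the new utterance, which is not reproduced here.

```python
def words_for_window(word_list, window_height_px, row_height_px=20):
    """Return the alternatives that fit in the correction window, alphabetically.

    The number of rows is derived from the window height (an assumed sizing
    rule); the most probable words are kept and then sorted by spelling.
    """
    if row_height_px <= 0:
        raise ValueError("row height must be positive")
    visible = max(0, window_height_px // row_height_px)
    best = sorted(word_list, key=lambda item: item[1], reverse=True)[:visible]
    return sorted(word for word, _ in best)

def recognize_respoken(word_list, rejected_words):
    """Pick the best word for a respoken utterance, skipping rejected words.

    Simplified stand-in: it does not combine evidence from both utterances.
    Returns None if every candidate was already rejected.
    """
    rejected = {word.lower() for word in rejected_words}
    remaining = [(w, p) for w, p in word_list if w.lower() not in rejected]
    if not remaining:
        return None
    return max(remaining, key=lambda item: item[1])[0]

# Illustrative, made-up recognized word list.
word_list = [("four", 0.45), ("for", 0.30), ("fore", 0.20), ("floor", 0.05)]
small_window = words_for_window(word_list, window_height_px=45)
large_window = words_for_window(word_list, window_height_px=60)
respoken = recognize_respoken(word_list, rejected_words=["four"])
```

Enlarging the window from 45 to 60 pixels raises the number of displayed alternatives from two to three in this sketch, and the respoken utterance resolves to the next-best word rather than the rejected one.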
Kelly Joseph R.
Korzuch William
Lerner Martin
Microsoft Corporation