Reexamination Certificate
1999-06-24
2002-04-16
Dorvil, Richemond (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S275000
active
06374214
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to the field of speech recognition computer applications and more specifically to an improved system for excluding or ruling out incorrect phrases when correcting words by re-dictation in a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which acoustic signals, received via a microphone, are “recognized” and converted into words by a computer. These recognized words may then be used in a variety of computer software applications. For example, speech recognition may be used to input data, prepare documents and control the operation of software applications. Speech recognition systems programmed or trained to the diction and inflection of a single person can successfully recognize the vast majority of words spoken by that person.
When a system is to be used by a large number of speakers, however, it is very difficult for it to accurately recognize all of the spoken words because of the wide variety of pronunciations, accents and divergent speech characteristics of each individual speaker. Due to these variations, the speech recognition system may not recognize some of the speech and some words may be converted erroneously. This may result in spoken words being converted into different words (“hold” recognized as “old”), improperly conjoined (“to the” recognized as “tooth”), or recognized as homonyms (“boar” instead of “bore”).
The erroneous words may also result from the speaker using improper dictation techniques. For example, the speaker may be speaking too rapidly or softly, slurring words or be located too far from the microphone. In this case, the recognition software will likely generate a large number of misrecognized words.
Conventional speech recognition systems often include a means for the user to retroactively rectify these errors following the dictation. To simplify the correction process, a correction interface or “window” may be used to provide a list of suggested or alternate text phrases that in some way resemble the misrecognized phrases. This is accomplished by executing a known algorithm, one much like a spell checking program in word-processing applications, to search a system database for phrases with characteristics similar to those of the incorrectly recognized text phrase. The algorithm outputs a list of one or more alternates from which the user may select the intended phrase. If the intended phrase is not in the alternate list, the user may type in the words. After the intended phrase is selected or keyed in, the algorithm substitutes the corrected phrase for the erroneous phrase.
Users may shortcut the correction process by simply selecting the misrecognized text and re-dictating the intended phrase in its place. For example, assume a user spoke the phrase “tooth”, which was subsequently misrecognized by the speech recognition system as “to the”. The user can select the text “to the” with a pointing device or keyboard and re-dictate “tooth”, which the speech system substitutes for “to the”. Commonly, however, the speech recognition system identifies the re-dictated phrase as the misrecognized text (e.g., recognizing the re-dictated “tooth” as “to the”). To the dismay of the user, the original incorrect text may be re-recognized over and over again, requiring the user to input the intended phrase another way.
In one attempt to solve this problem, the selected phrase is removed from the possible phrases recognized during re-dictation. This forces the system to identify the first re-dictated phrase differently from the misrecognized text phrase. Thus, in the above example, the re-dictated phrase will be recognized as something other than “to the”, preferably as “tooth”. However, the re-dictated phrase, although different from the misrecognized phrase, may still be incorrect, for example “tuba”, in which case the user will have to re-dictate the phrase a second time. Now, because the selected phrase is “tuba”, the speech recognition system may again identify the spoken phrase as “to the”. This phenomenon is very frustrating to the user and can significantly decrease the recognition accuracy of the speech recognition system.
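The cycling described above can be illustrated with a minimal sketch. The scores, the `recognize` helper and the function names are hypothetical and are not taken from the patent; the sketch only assumes that the recognizer returns its highest-scoring candidate that is not currently excluded.

```python
from typing import Dict, List, Set

# Hypothetical recognizer scores for one re-dictated utterance of "tooth".
SCORES: Dict[str, float] = {"to the": 0.5, "tuba": 0.3, "tooth": 0.2}


def recognize(excluded: Set[str]) -> str:
    """Return the highest-scoring candidate that is not excluded."""
    candidates = {text: score for text, score in SCORES.items() if text not in excluded}
    return max(candidates, key=candidates.get)


def redictate_excluding_selection_only(start: str, attempts: int) -> List[str]:
    """Simulate re-dictation where only the currently selected text is excluded."""
    history = [start]
    for _ in range(attempts):
        # Only the text currently on screen is ruled out; earlier errors are forgotten.
        history.append(recognize({history[-1]}))
    return history


history = redictate_excluding_selection_only("to the", 4)
print(history)  # ['to the', 'tuba', 'to the', 'tuba', 'to the']
```

Under these assumed scores the intended phrase “tooth” is never reached, because each attempt forgets every error except the most recent one.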
SUMMARY OF THE INVENTION
The present invention provides an improved method and system for correcting misrecognized text phrases in a computer speech recognition system by preventing prior misrecognized text from reoccurring during re-dictation.
Specifically, the present invention includes a method for correcting a user identified misrecognized text at a first location in a dictated electronic document in a computer system for speech recognition. The method includes the steps: receiving a user input in the form of a spoken utterance corresponding to an intended phrase; processing the user input to identify a plurality of alternate text selections, which are determined statistically as the most likely text recognitions corresponding to the spoken utterance; and excluding the misrecognized text from the plurality of alternate text selections and replacing the misrecognized text with a replacement text selected from the remaining alternate text selections and which is the most likely text recognition.
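The excluding and replacing steps can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `(text, score)` shape of an alternate, the case-insensitive comparison and the name `pick_replacement` are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical shape of one alternate: (recognized text, likelihood score).
Alternate = Tuple[str, float]


def pick_replacement(alternates: List[Alternate], excluded: Iterable[str]) -> Optional[str]:
    """Return the most likely alternate that is not excluded, or None if none remain."""
    banned = {text.lower() for text in excluded}
    remaining = [(text, score) for text, score in alternates if text.lower() not in banned]
    if not remaining:
        return None
    # The replacement is the remaining alternate with the highest score.
    return max(remaining, key=lambda pair: pair[1])[0]


alternates = [("to the", 0.52), ("tooth", 0.31), ("tuba", 0.17)]
print(pick_replacement(alternates, excluded=["to the"]))  # tooth
```

Returning `None` when every alternate is excluded is a design choice of this sketch; the text does not say what happens in that case.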
The misrecognized text may be stored in a memory location for excluded text. Then, if the text at the location of the misrecognized text is not the intended phrase, the above steps may be repeated and additional excluded text may be stored in memory. Thus, the user may consecutively substitute the text at this location with replacement text. If the system receives a second user input identifying a second misrecognized phrase at a second location in the document, it clears the memory location containing the excluded text.
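One way to picture the excluded-text memory is a small structure tied to a document location. The sketch below is hypothetical: the class name `ExclusionMemory`, the integer `location` and the `select` method are illustrative assumptions, and the text does not specify how locations are represented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExclusionMemory:
    """Excluded text accumulated for the location currently being corrected."""

    location: Optional[int] = None  # assumed: a character offset in the document
    excluded: List[str] = field(default_factory=list)

    def select(self, location: int, misrecognized: str) -> None:
        """Record that the user selected `misrecognized` at `location` for re-dictation."""
        if location != self.location:
            # A different location was selected, so earlier exclusions no longer apply.
            self.excluded.clear()
            self.location = location
        if misrecognized not in self.excluded:
            self.excluded.append(misrecognized)


memory = ExclusionMemory()
memory.select(10, "to the")   # first correction at offset 10
memory.select(10, "tuba")     # same location: exclusions accumulate
print(memory.excluded)        # ['to the', 'tuba']
memory.select(42, "old")      # new location: memory is cleared first
print(memory.excluded)        # ['old']
```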
Another aspect of the present invention includes permitting the user to re-dictate a phrase previously excluded. In one embodiment, the system identifies a set of acoustic characteristics of the spoken utterance and processes the spoken utterance to determine if it has a similar set of acoustic characteristics to those of the excluded text spoken utterance. The memory location of the excluded text is cleared if the set of acoustic characteristics of the spoken utterance is substantially dissimilar from a set of acoustic characteristics for the excluded text spoken utterance. In an alternative embodiment, the excluded text stored in the memory location can be counted so as to limit the number of excluded text entries stored in memory to a predetermined value. The system removes the earliest stored excluded text from the memory when the number of excluded text entries exceeds the predetermined value. Preferably, the predetermined value is two.
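Both mechanisms can be sketched together, although the text presents them as alternative embodiments. Everything specific here is an assumption made for illustration: acoustic characteristics are modeled as plain feature vectors, similarity is measured with cosine similarity, and the 0.8 threshold is arbitrary. Only the limit of two comes from the text.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence

MAX_EXCLUDED = 2            # the preferred "predetermined value" from the text
SIMILARITY_THRESHOLD = 0.8  # assumed cutoff for "substantially dissimilar"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BoundedExclusions:
    """Excluded text limited to MAX_EXCLUDED entries, cleared on a dissimilar utterance."""

    def __init__(self) -> None:
        # deque(maxlen=...) silently drops the earliest entry when a new one overflows it.
        self.texts: Deque[str] = deque(maxlen=MAX_EXCLUDED)
        self.reference: Optional[List[float]] = None  # features behind the latest exclusion

    def exclude(self, text: str, features: Sequence[float]) -> None:
        """Store excluded text with the features of the utterance that produced it."""
        self.texts.append(text)
        self.reference = list(features)

    def check_utterance(self, features: Sequence[float]) -> bool:
        """Clear the memory if the new utterance is dissimilar; return True when cleared."""
        if self.reference is None:
            return False
        if cosine_similarity(features, self.reference) < SIMILARITY_THRESHOLD:
            self.texts.clear()
            self.reference = None
            return True
        return False


memory = BoundedExclusions()
memory.exclude("to the", [1.0, 0.0])
memory.exclude("tuba", [0.9, 0.1])
memory.exclude("two bah", [0.95, 0.05])
print(list(memory.texts))  # ['tuba', 'two bah']
```

With the limit at two, “to the” becomes recognizable again after the third exclusion, which is how a previously excluded phrase can be re-dictated in this sketch.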
Thus, the present invention provides the object and advantage of improving the process of correcting misrecognized text through re-dictation by preventing the re-recognition of text previously excluded. Further, the method and system of the present invention improve the recognition accuracy of the speech recognition engine.
An additional object and advantage of the present invention is that it affords the user the ability to re-dictate text previously changed or corrected, if desired. Previously changed text can be re-recognized when the acoustic characteristics of an additional spoken utterance and the excluded text are significantly dissimilar and/or after a prescribed quantity of excluded text has been stored in memory, i.e., after a prescribed number of re-dictations.
These and other objects, advantages and aspects of the invention will become apparent from the following description. In the description, reference is made to the accompanying drawings which form a part hereof, and in which there is shown a preferred embodiment of the invention. Such embodiment does not necessarily represent the full scope of the invention, and reference is made, therefore, to the claims herein for interpreting the scope of the invention.
REFERENCES:
patent: 5864805 (1999-01-01), Chen et al.
patent: 5899976 (1999-05-01), Rozak
patent: 6064961 (2000-05-01), Hanson
patent: 6138099 (2000-10-01), Lew
Friedland Steven J.
Smith Maria E.
Dorvil Richemond
International Business Machines Corp.
Senterfitt Akerman