Reexamination Certificate
1997-03-28
2001-04-03
Knepper, David D. (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S235000
active
06212498
BACKGROUND
The invention relates to enrollment in speech recognition.
A speech recognition system analyzes a user's speech to determine what the user said. Most speech recognition systems are frame-based. In a frame-based system, a processor divides a signal descriptive of the speech to be recognized into a series of digital frames, each of which corresponds to a small time increment of the speech.
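The framing step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_frames` and its `frame_len`/`hop` parameters are assumptions, and practical front ends usually also window each frame and compute spectral features from it.

```python
from typing import List, Sequence


def split_into_frames(
    samples: Sequence[float], frame_len: int, hop: int
) -> List[List[float]]:
    """Divide a digitized speech signal into a series of fixed-length frames.

    A new frame starts every `hop` samples; when hop < frame_len the frames
    overlap. A trailing partial frame is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        list(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

At a 16 kHz sampling rate, for example, `frame_len=320` and `hop=160` would yield 20 ms frames every 10 ms; these values are illustrative only.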
A speech recognition system may be a “discrete” system that recognizes discrete words or phrases but requires the user to pause briefly after each word or phrase. Alternatively, a speech recognition system may be a “continuous” system that can recognize spoken words or phrases regardless of whether the user pauses between them. Continuous speech recognition systems typically have a higher incidence of recognition errors than discrete recognition systems because continuous speech is more complex to recognize. A more detailed description of continuous speech recognition is provided in U.S. Pat. No. 5,202,952, entitled “LARGE-VOCABULARY CONTINUOUS SPEECH PREFILTERING AND PROCESSING SYSTEM,” which is incorporated by reference.
In general, the processor of a continuous speech recognition system analyzes “utterances” of speech. An utterance includes a variable number of frames and corresponds, for example, to a period of speech followed by a pause of at least a predetermined duration.
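A minimal sketch of pause-based utterance segmentation, assuming each frame has already been reduced to a single energy value and that a frame counts as silence when its energy falls below a fixed `threshold`. The source only says an utterance ends with a pause of at least a predetermined duration; the energy criterion, the name `segment_utterances`, and its parameters are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def segment_utterances(
    frame_energies: Sequence[float], threshold: float, min_pause: int
) -> List[range]:
    """Group frames into utterances separated by pauses.

    A frame counts as speech if its energy is at least `threshold`. An utterance
    ends once `min_pause` consecutive non-speech frames follow it; the returned
    ranges cover the speech frames only (trailing silence is excluded).
    """
    if min_pause <= 0:
        raise ValueError("min_pause must be positive")
    utterances: List[range] = []
    start: Optional[int] = None  # first speech frame of the open utterance
    last_speech = -1             # most recent speech frame seen
    silence_run = 0              # consecutive silent frames since last_speech
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause:
                utterances.append(range(start, last_speech + 1))
                start = None
                silence_run = 0
    if start is not None:  # speech ran to the end of the signal
        utterances.append(range(start, last_speech + 1))
    return utterances
```

Because a pause shorter than `min_pause` frames does not close the current utterance, an utterance can contain a variable number of frames, including brief internal gaps.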
The processor determines what the user said by finding acoustic models that best match the digital frames of an utterance, and identifying text that corresponds to those acoustic models. An acoustic model may correspond to a word, phrase or command from a vocabulary. An acoustic model also may represent a sound, or phoneme, that corresponds to a portion of a word. Collectively, the constituent phonemes for a word represent the phonetic spelling of the word. Acoustic models also may represent silence and various types of environmental noise.
The words or phrases corresponding to the best matching acoustic models are referred to as recognition candidates. The processor may produce a single recognition candidate for an utterance, or may produce a list of recognition candidates. Speech recognition techniques are discussed in U.S. Pat. No. 4,805,218, entitled “METHOD FOR SPEECH ANALYSIS AND SPEECH RECOGNITION”, which is incorporated by reference.
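To make "best matching acoustic models" and "recognition candidates" concrete, the sketch below substitutes dynamic time warping (DTW) over one-dimensional toy features, a classic template-matching technique. It is not the method of the patent or of the references cited above, which use statistical acoustic models; `dtw_distance`, `recognition_candidates`, and the toy templates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_distance(frames: Sequence[float], template: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(frames), len(template)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning frames[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(frames[i - 1] - template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame repeated
                cost[i][j - 1],      # input frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognition_candidates(
    frames: Sequence[float], models: Dict[str, Sequence[float]], n_best: int = 3
) -> List[Tuple[str, float]]:
    """Score every word template and return the n_best closest, best first."""
    scored = [(word, dtw_distance(frames, tmpl)) for word, tmpl in models.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_best]


models = {"yes": [1.0, 2.0, 3.0], "no": [5.0, 5.0, 5.0], "stop": [1.0, 1.0, 9.0]}
print(recognition_candidates([1.0, 2.0, 2.0, 3.0], models))
# [('yes', 0.0), ('stop', 8.0), ('no', 12.0)]
```

Passing `n_best=1` corresponds to producing a single recognition candidate; a larger value yields a candidate list ordered by match quality.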
An acoustic model generally includes data describing how a corresponding speech unit (e.g., a phoneme) is spoken by a variety of speakers. To increase the accuracy with which an acoustic model represents a particular user's speech, and thereby to decrease the incidence of recognition errors, the speech recognition system may modify the acoustic models to correspond to the particular user's speech. This modification may be based on samples of the user's speech obtained during an initial enrollment session and during use of the system.
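The source does not specify how the models are modified. One widely used technique for pulling a speaker-independent model toward a particular user is maximum a posteriori (MAP) adaptation of a Gaussian mean, which computes (tau * prior_mean + sum of samples) / (tau + n). The sketch below assumes that technique; `map_adapt_mean` and the prior weight `tau` are illustrative, not the patent's method.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    samples: Sequence[Sequence[float]],
    tau: float = 10.0,
) -> List[float]:
    """MAP estimate of a Gaussian mean given a user's feature samples.

    Computes (tau * prior + sum(samples)) / (tau + n) per dimension, so with no
    samples the prior is returned unchanged and with many samples the result
    approaches the user's sample mean.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(samples)
    adapted = []
    for d, mu in enumerate(prior_mean):
        total = sum(x[d] for x in samples)
        adapted.append((tau * mu + total) / (tau + n))
    return adapted
```

A larger `tau` trusts the speaker-independent prior more, so the model shifts only gradually as enrollment and usage samples accumulate.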
Enrollment sessions for previous speech recognition systems typically required a user to read from a list of words or to read specific words in response to prompts. For example, DragonDictate® for Windows®, available from Dragon Systems, Inc. of Newton, Mass., included a quick enrollment session that prompted a new user to speak each word of a small set of words, and then adapted the acoustic models based on the user's speech.
SUMMARY
In general, in one aspect, the invention features enrolling a user in a speech recognition system by analyzing acoustic content of a user utterance and determining, based on the analysis, whether the user utterance matches a portion of an enrollment text. The acoustic content of the user utterance is used to update acoustic models corresponding to the portion of the enrollment text if the user utterance matches a portion of the enrollment text.
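A minimal sketch of this enrollment step, under assumptions not stated in the source: a hypothetical recognizer has already converted the utterance into (word, feature vector) pairs, and "updating acoustic models" is reduced to collecting per-word adaptation samples. An utterance that does not match a contiguous portion of the enrollment text leaves the adaptation data untouched. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (word, feature vector) pairs from a hypothetical recognizer/aligner
Alignment = List[Tuple[str, List[float]]]


def find_matching_portion(
    words: Sequence[str], enrollment_words: Sequence[str]
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first contiguous span of enrollment_words equal to words."""
    n = len(words)
    if n == 0:
        return None
    for start in range(len(enrollment_words) - n + 1):
        if list(enrollment_words[start:start + n]) == list(words):
            return start, start + n
    return None


def enroll_utterance(
    aligned: Alignment,
    enrollment_words: Sequence[str],
    adaptation_data: Dict[str, List[List[float]]],
) -> bool:
    """Store the utterance's per-word features only if it matches the enrollment text."""
    words = [word for word, _ in aligned]
    if find_matching_portion(words, enrollment_words) is None:
        return False  # no match: ignore the utterance, leave models untouched
    for word, features in aligned:
        adaptation_data.setdefault(word, []).append(features)
    return True
```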
Certain implementations of the invention may include one or more of the following features. An enrollment grammar corresponding to the enrollment text may be used to determine whether the user utterance matches a portion of the enrollment text. A rejection grammar may be used to determine whether the user utterance matches a portion of the enrollment text. The rejection grammar may be a phoneme grammar and may model an utterance using a set of phonemes that is smaller than a set of phonemes used by the enrollment grammar.
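One plausible reading of the rejection grammar, offered here as an assumption: a phoneme grammar can match almost any sequence of sounds, so if the grammar constrained to the enrollment text explains an utterance no better than the phoneme grammar does, the user probably said something else. The sketch compares two hypothetical log-likelihood scores; `GrammarScores`, `matches_enrollment_text`, and `margin` are illustrative names, and how the scores are produced is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class GrammarScores:
    enrollment: float  # best log-likelihood under the enrollment-text grammar
    rejection: float   # best log-likelihood under the phoneme rejection grammar


def matches_enrollment_text(scores: GrammarScores, margin: float = 0.0) -> bool:
    """Accept the utterance only if the text-constrained grammar explains it
    at least `margin` log-likelihood units better than the phoneme grammar."""
    return scores.enrollment >= scores.rejection + margin
```

Using a smaller phoneme set in the rejection grammar, as the summary describes, would make that grammar cheaper to evaluate; the sketch leaves the phoneme inventory abstract.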
An enrollment position may be determined within the enrollment text, and a user utterance may be required to match a portion of the enrollment text that begins at the enrollment position. The enrollment text and the enrollment position may be displayed. If the user utterance matches a portion of the enrollment text, the enrollment position is advanced past the matching portion in the enrollment text.
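The enrollment-position logic can be sketched as a small tracker class. This assumes the recognizer output is already available as a list of words and that comparison ignores case and surrounding punctuation; `EnrollmentTracker` and `_normalize` are hypothetical names.

```python
from typing import List, Sequence


def _normalize(word: str) -> str:
    # Compare words case-insensitively and ignore surrounding punctuation.
    return word.strip(".,;:!?\"'").lower()


class EnrollmentTracker:
    """Tracks the enrollment position within an enrollment text."""

    def __init__(self, enrollment_text: str) -> None:
        self.words: List[str] = enrollment_text.split()
        self.position = 0  # index of the next word the user should read

    def process(self, hypothesis: Sequence[str]) -> bool:
        """Advance past the matched words if hypothesis matches the text at the position."""
        end = self.position + len(hypothesis)
        if not hypothesis or end > len(self.words):
            return False
        expected = [_normalize(w) for w in self.words[self.position:end]]
        if expected != [_normalize(w) for w in hypothesis]:
            return False  # mismatch: position unchanged, utterance ignored
        self.position = end
        return True

    @property
    def finished(self) -> bool:
        return self.position >= len(self.words)
```

Requiring the match to begin at the enrollment position means that reading a later sentence out of order is rejected rather than silently accepted.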
The enrollment text may be selected from a plurality of enrollment texts. Each of the enrollment texts has a corresponding enrollment grammar. The enrollment grammar corresponding to the selected enrollment text is used to determine whether the user utterance matches a portion of the enrollment text.
An enrollment text may be received from a user. An enrollment grammar corresponding to the received enrollment text may be generated for use in determining whether the user utterance matches a portion of the enrollment text.
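Generating a grammar for a built-in or user-supplied text could be as simple as a linear finite-state grammar with one arc per word. The sketch below assumes that representation, plus a dictionary registry so that each available enrollment text has its own grammar; `LinearEnrollmentGrammar`, `GRAMMARS`, and `add_enrollment_text` are hypothetical, and a production grammar would likely also handle pronunciation variants and optional words.

```python
from typing import Dict, List, Sequence


class LinearEnrollmentGrammar:
    """Finite-state grammar for one enrollment text.

    State i has a single arc, labelled with word i, leading to state i + 1, so an
    utterance is accepted from a given start state only if it reads the next
    words of the text in order.
    """

    def __init__(self, text: str) -> None:
        tokens = (t.strip(".,;:!?\"'").lower() for t in text.split())
        self.arcs: List[str] = [t for t in tokens if t]  # drop punctuation-only tokens

    def accepts(self, words: Sequence[str], start_state: int = 0) -> bool:
        if not words or start_state < 0:
            return False
        state = start_state
        for word in words:
            if state >= len(self.arcs) or self.arcs[state] != word.lower():
                return False
            state += 1
        return True


# Registry of available enrollment texts, each with its generated grammar.
GRAMMARS: Dict[str, LinearEnrollmentGrammar] = {}


def add_enrollment_text(name: str, text: str) -> LinearEnrollmentGrammar:
    """Generate and register a grammar for a (possibly user-supplied) enrollment text."""
    grammar = LinearEnrollmentGrammar(text)
    GRAMMARS[name] = grammar
    return grammar
```

Selecting an enrollment text then amounts to looking up its grammar by name in the registry.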
The user utterance may be ignored if it does not match a portion of the enrollment text.
In general, in another aspect, the invention features enrolling a user into a speech recognition system by displaying an enrollment text and an enrollment position within the enrollment text. When a user utterance is received, a determination is made as to whether a match exists between the user utterance and a portion of the enrollment text beginning at the enrollment position. The enrollment position is updated if a match exists, and the updated enrollment position is displayed.
Certain implementations of the invention may include one or more of the following features. The enrollment position may be displayed using a marker at the enrollment position. It may also be displayed by highlighting the enrollment text at the enrollment position, or by using a cursor at the enrollment position.
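A text-only sketch of the three display options, assuming the enrollment text is held as a list of words and the enrollment position is the index of the next word to be read. The `>>` marker, bracket highlighting, and `|` cursor are stand-ins for graphical display elements; `render_enrollment_text` is a hypothetical name.

```python
from typing import Sequence


def render_enrollment_text(words: Sequence[str], position: int, style: str = "marker") -> str:
    """Render enrollment text as one line, showing the enrollment position.

    style="marker":    insert a '>>' token before the next word to be read
    style="highlight": wrap the next word to be read in square brackets
    style="cursor":    place a '|' cursor immediately before the next word
    """
    if not 0 <= position <= len(words):
        raise ValueError("position out of range")
    done, rest = list(words[:position]), list(words[position:])
    if style == "marker":
        return " ".join(done + [">>"] + rest)
    if style == "highlight":
        if rest:
            rest[0] = f"[{rest[0]}]"
        return " ".join(done + rest)
    if style == "cursor":
        if rest:
            rest[0] = "|" + rest[0]
        else:
            done.append("|")  # end of text: cursor after the last word
        return " ".join(done + rest)
    raise ValueError(f"unknown style: {style}")
```

After each matching utterance, the display would be redrawn with the updated position.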
In general, in another aspect, the invention features computer software, residing on a computer-readable storage medium, comprising instructions for causing a computer to implement the techniques described above.
In general, in another aspect, the invention features a speech recognition system for enrolling a user. The system includes a display for displaying an enrollment text to a user, an input device for receiving speech signals, and a processor. The processor determines a user utterance from a received speech signal, analyzes acoustic content of the user utterance, and determines, based on the acoustic analysis, whether the user utterance matches a portion of an enrollment text. The processor then uses the user utterance to update acoustic models corresponding to the portion of the enrollment text if the user utterance matches a portion of the enrollment text.
Among the advantages of the invention is that, by detecting and ignoring user utterances that do not match a portion of the enrollment text, the enrollment program avoids incorrectly updating the acoustic models based on those utterances.
Other features and advantages of the invention will become apparent from the following description and from the claims.
REFERENCES:
patent: 4618984 (1986-10-01), Das et al.
patent: 4759068 (1988-07-01), Bahl et al.
patent: 4776016 (1988-10-01), Hansen
patent: 4783803 (1988-11-01), Baker et al.
patent: 4805218 (1989-02-01), Bamberg et al.
patent: 4805219 (1989-02-01), Baker et al.
patent: 4817156 (1989-03-01), Bahl et al.
patent: 4817158 (1989-03-01), Picheny
patent: 4817161 (1989-03-01), Kaneko
patent: 4819271 (1989-04-01), Bahl et al.
patent: 4827521 (1989-05-01), Bahl et al.
patent: 4829576 (1989-05-01), Porter
patent: 4829577 (1989-05-01), Kuroda et al.
patent: 4829578 (1989-05-01), Roberts
patent: 4833712 (1989-05-01), Bahl et al.
patent: 4876720 (1989-10-01), Kaneko et al.
…
Albina, Toffee A.
Gould, Joel
Graham, Noah M.
Scattone, Francesco
Sherwood, Stefan
Dragon Systems, Inc.