Reexamination Certificate
1998-04-07
2001-06-05
Zele, Krista (Department: 2748)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
active
06243678
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to a method and apparatus for speech recognition and, more particularly, to a method and system for dynamic speech recognition using free-phone scoring.
2. Description of the Related Art
Remote telephone access by a customer to confidential bank or credit card account information has become common. Typically, the customer enters an account number followed by a Personal Identification Number (PIN) via the telephone keypad. The application automatically accesses the specified account information and compares the entered PIN with that stored in the account. If there is a match between PINs, then the application allows the customer to proceed to access the account information. On the other hand, if the PINs do not match, then the application usually calls for human intervention, such as forwarding the call to an account representative. Often the PINs do not match because the customer has forgotten the PIN, and, therefore, the account representative must request a “secret password,” such as the maiden name of the customer's mother, to identify the customer as having authorization to access the account. Such human intervention is costly, requiring a team of representatives waiting to intervene.
Typical speech recognition systems are unsuitable for replacing such human intervention. Speech recognition systems usually include a database storing voice templates or models, which represent complete words or phrases. The system compares these templates or models, which are constructed from collected data samples, to the received spoken words. Consequently, the database must comprise all possible responses and, therefore, requires the collection and verification of a large number of data samples. Where the recognition system is employed in an application in which customer responses are limited, such a system may be acceptable. Where the recognition system is employed in an application in which customer responses are virtually unlimited, such a system is unacceptable. Thus, a need exists for an improved voice recognition system that does not require the collection and verification of a large number of data samples.
Improvements have been made in the field of speech recognition systems. For example, U.S. Pat. No. 5,329,608 to Bocchieri et al. (Bocchieri) is directed to an improved speech recognition system. In general, Bocchieri addresses the problem of requiring data collection and verification to create word templates by allowing a customer to enter anticipated responses into a computer. Once the customer enters anticipated responses via a keyboard, the computer creates a phonetic transcription of each entered word. Creating the phonetic transcription involves accessing a dictionary database, which contains common words and associated phonetic transcriptions, and determining whether the entered word and its associated phonetic transcription already exist. If the phonetic transcription does not exist, the computer proceeds to store the entered word with its associated phonetic transcription in a vocabulary lexicon database.
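The enrollment step described for Bocchieri can be sketched roughly as follows. This is an illustration only, not Bocchieri's implementation: the dictionary contents, the function names, and in particular the `naive_letter_to_sound` fallback are hypothetical, since the text above does not say how a transcription is produced for a word that the dictionary lacks.

```python
# Hypothetical dictionary of common words and their phonetic transcriptions.
DICTIONARY = {
    "smith": ["s", "m", "ih", "th"],
    "jones": ["jh", "ow", "n", "z"],
}


def naive_letter_to_sound(word):
    """Placeholder fallback (an assumption): one symbol per letter."""
    return list(word)


def enroll_response(word, lexicon, dictionary=DICTIONARY):
    """Store a typed anticipated response in the vocabulary lexicon."""
    key = word.strip().lower()
    # First check whether the dictionary already has a transcription.
    transcription = dictionary.get(key)
    if transcription is None:
        # Not in the dictionary: fall back to the placeholder rule.
        transcription = naive_letter_to_sound(key)
    lexicon[key] = transcription
    return transcription


lexicon = {}
enroll_response("Smith", lexicon)  # found in the dictionary
enroll_response("Zele", lexicon)   # not found, uses the fallback
```

Every response has to pass through `enroll_response` before it can be recognized, which is the limitation the following paragraphs turn on.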
Upon receiving a spoken word input, the computer constructs a subword model of the word comprising one or more sequences of subwords. Each subword comprises a series of phonemes. Each phoneme, in turn, represents a discrete sound.
The computer compares the subword model to the phonetic transcriptions in the vocabulary lexicon database to determine whether the spoken input “matches” the entered anticipated response corresponding to the phonetic transcription. The system deems that a match has occurred by assigning a confidence recognition factor to the comparison of the subword model and the phonetic transcription and determining whether that confidence factor exceeds a predetermined confidence threshold value. However, the system cannot recognize spoken data if that data has not been previously entered. Thus, while Bocchieri allows easy customization of the system, the customer of the system must still have prior knowledge of all potentially received spoken data so that it can be entered via the keyboard into the system.
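The matching step can be illustrated with a small sketch. It assumes the subword model has already been reduced to a flat phoneme sequence, and it uses the similarity ratio from Python's `difflib.SequenceMatcher` as a stand-in for the confidence recognition factor; the text does not specify how that factor is computed, and the lexicon contents and the 0.8 threshold are likewise hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary lexicon: entered word -> phonetic transcription.
LEXICON = {
    "smith": ["s", "m", "ih", "th"],
    "jones": ["jh", "ow", "n", "z"],
}


def confidence(observed, transcription):
    """Stand-in confidence factor in [0, 1] (an assumption, not Bocchieri's measure)."""
    return SequenceMatcher(None, observed, transcription).ratio()


def recognize(observed, lexicon=LEXICON, threshold=0.8):
    """Return (word, confidence) for the best entry, or (None, confidence) if no match."""
    best_word, best_conf = None, 0.0
    for word, transcription in lexicon.items():
        c = confidence(observed, transcription)
        if c > best_conf:
            best_word, best_conf = word, c
    # A match is declared only when the confidence exceeds the threshold.
    if best_conf > threshold:
        return best_word, best_conf
    return None, best_conf


print(recognize(["s", "m", "ih", "th"]))   # exact pronunciation of an entered word
print(recognize(["jh", "ow", "n", "s"]))   # one phoneme off: falls below the threshold
print(recognize(["k", "ae", "t"]))         # never entered: nothing to match against
```

The last call shows the limitation noted above: an utterance whose transcription was never entered cannot be recognized, whatever the threshold.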
Additionally, speech recognition systems must be reliable. Traditionally, the accuracy of speech recognition systems has been ensured by setting a high confidence threshold when comparing the subword model with the phonetic transcription. Such a high confidence threshold guards against erroneous access; however, the high threshold often causes the system to erroneously find no match and deny access to an authorized customer. Individualistic speech patterns and pronunciations, and coarticulation error, which often results in the blending of phonemes, are some of the factors that contribute to these erroneous denials of access. These same factors also contribute to erroneous allowance of access.
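The trade-off between erroneous denial and erroneous allowance of access can be made concrete with a short sketch. The confidence values below are invented purely for illustration and are not measurements from any system; the function name and the thresholds are also assumptions.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold.

    A score is accepted only when it exceeds the threshold.
    """
    false_rejects = sum(1 for s in genuine_scores if s <= threshold)
    false_accepts = sum(1 for s in impostor_scores if s > threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Invented confidence values (illustration only).
genuine = [0.95, 0.90, 0.82, 0.74, 0.66]   # authorized customers
impostor = [0.70, 0.55, 0.40, 0.30, 0.20]  # everyone else

for t in (0.6, 0.75, 0.9):
    frr, far = error_rates(genuine, impostor, t)
    print(f"threshold={t}: false rejects={frr:.0%}, false accepts={far:.0%}")
```

With these values, raising the threshold from 0.6 to 0.75 eliminates the erroneous allowance but denies two of the five authorized customers, and raising it to 0.9 denies four of them.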
Some speech recognition systems ensure reliability by taking advantage of these individualistic speech patterns and pronunciations. More specifically, these systems utilize voice transcriptions to create a trained subword model database. This trained subword model database comprises customer-dependent phonemes. While improving reliability, the systems are expensive in set-up and operation.
Specifically, providing trained subword model databases requires pre-enrollment. Typically, training requires the customer to recite a few sentences containing words that comprise most, if not all, phonemes. The sentences are broken up into the phonemes for use in the customer-dependent subword database. Because the system involves training most or all phonemes, the system has the advantage that the secret password can be changed without retraining the system. The system, however, has the disadvantage of being costly to set up and operate. The added expense lies not only in the creation of each trained subword database, but also in the creation of the entire speech recognition system because a separate customer-dependent model must be created for each customer.
Another type of speech recognition system that utilizes a trained subword model involves training only the secret password and the particular phonemes contained therein. This type of system is less costly to implement because a trained subword database containing all phonemes is not necessary. Instead, pre-enrollment involves training only the secret password. Because only the particular secret password is trained, however, any change to the password requires re-enrollment and re-training of the new password. Again, the system is costly to set up and operate. Thus, although systems using trained subword databases are reliable, they are somewhat impractical; the need for an improved voice recognition system, particularly one that does not require prior enrollment, remains unsatisfied.
3. Summary of the Invention
These needs are satisfied by a method for recognizing a speech utterance as a predetermined unit of speech. The method comprises generating a free-phone model of the speech utterance and calculating a free-phone score representing the likelihood that the free-phone model accurately represents the speech utterance. The method also comprises determining whether the speech utterance matches the predetermined unit of speech based upon the free-phone score. In an alternative embodiment, the determination of whether the speech utterance matches the predetermined unit of speech is based upon both a word score and the free-phone score. A system for recognizing a speech utterance as a predetermined unit of speech is also provided.
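One way to picture the alternative embodiment is the sketch below. It rests on several assumptions that the summary does not state: each frame of the utterance comes with a log-likelihood for every phone, the free-phone score is the sum of the best phone in each frame (an unconstrained phone sequence with no transition penalties), the word score is the best left-to-right alignment of the expected phone sequence, and the two are combined as a per-frame difference compared against a threshold. All names, the toy log-likelihoods, and the threshold of -0.5 are hypothetical.

```python
import math


def free_phone_score(frames):
    """Sum of the best phone log-likelihood in each frame (no constraints)."""
    return sum(max(frame.values()) for frame in frames)


def word_score(frames, phones):
    """Best log-likelihood of aligning `phones` left to right over all frames.

    Each phone must cover at least one frame; `frames` must be non-empty.
    """
    neg = -math.inf
    # best[j]: best score of an alignment ending in phones[j] at the current frame
    best = [neg] * len(phones)
    best[0] = frames[0][phones[0]]
    for frame in frames[1:]:
        new = [neg] * len(phones)
        for j, phone in enumerate(phones):
            # Either stay in the same phone or advance from the previous one.
            prev = max(best[j], best[j - 1] if j > 0 else neg)
            if prev > neg:
                new[j] = prev + frame[phone]
        best = new
    return best[-1]


def matches(frames, phones, threshold=-0.5):
    """Accept when the word score is close enough to the free-phone score."""
    gap = (word_score(frames, phones) - free_phone_score(frames)) / len(frames)
    return gap >= threshold


# Toy per-frame phone log-likelihoods (invented for illustration).
frames = [
    {"s": -0.1, "m": -3.0, "ih": -3.0},
    {"s": -0.2, "m": -2.5, "ih": -3.0},
    {"s": -3.0, "m": -0.3, "ih": -2.0},
    {"s": -3.0, "m": -2.0, "ih": -0.2},
]

print(matches(frames, ["s", "m", "ih"]))   # expected phones fit the frames
print(matches(frames, ["ih", "m", "s"]))   # expected phones fit poorly
```

Under these assumptions the word score can never exceed the free-phone score, so the free-phone score acts as an utterance-specific reference: the decision depends on how far the expected word falls short of the best unconstrained explanation of the same audio, not on an absolute score.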
Erhart George W.
Hartung Ronald L.
Lucent Technologies - Inc.
Opsasnick Michael N.
Zele Krista