Task-independent utterance verification with subword-based...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

C704S249000

active

06292778

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to automatic speech recognition, and more particularly to automatic speech recognition systems and methods providing utterance verification in which a hypothetical result of a recognition operation on a speech sample is verified to determine whether such speech sample actually contains the output of the recognition step. Still more particularly, the invention relates to systems and methods for providing subword-based utterance verification and for training such systems using a minimum verification error approach.
BACKGROUND OF THE INVENTION
Telecommunications service providers and other organizations which provide telephone-based services to remote customers or users have historically relied on human operators or agents to act as an interface between the customer or user and whatever instrumentality is used by the organization to actually provide the service. For example, telephone service providers have long provided enhanced telephone services of various sorts to customers using human operators. The operator receives from the customer a request for a service (e.g. credit card billing for a telephone call) and operates a suitable interface to the telephone network to cause the requested service to be provided. In some cases, the operator may directly deliver the requested service (e.g., by announcing to the customer a requested directory listing or the like). Banks, airlines, government agencies, and other organizations provide services to customers and users in a similar manner.
It is expensive to deliver services using human operators or agents. Many service transactions do not require complex interaction between the customer and the operator or agent. Accordingly, service providers have developed automated systems for providing many of the services previously executed through human operators or agents, thereby reducing costs and reserving human operators for transactions requiring human assistance such as those involving complex customer interaction. Many automated service systems require the customer to interact by pressing keys on the telephone, which is inconvenient for many customers.
Accordingly, service providers and others have sought automatic speech recognition (ASR) systems capable of receiving interaction from customers or users via the spoken voice for use in providing telephone-based services to callers. In order for ASR systems to be broadly applicable, they must be “speaker-independent”—i.e., capable of accurately recognizing speech from a large plurality of callers without being exposed in advance to the speech patterns of each such caller. Many such systems have been developed. One approach to the construction of such a system employs two main components: a recognition component which, given a sample of speech, emits as a hypothesis the most likely corresponding translation from the recognition component's predefined vocabulary of speech units; and a verification component, which determines whether the speech sample actually contains speech corresponding to the recognition component's hypothesis. The utterance verification component is used to reliably identify and reject out-of-vocabulary speech and extraneous sounds.
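The two-component architecture described above can be sketched as follows. The function names and the toy scoring are illustrative assumptions, not the patent's implementation; a real system would use acoustic-model likelihoods rather than symbol matching:

```python
# Sketch of the two-component ASR architecture: a recognizer emits the most
# likely vocabulary hypothesis, and a separate verifier decides whether the
# speech sample actually supports that hypothesis. All names and the toy
# scoring below are hypothetical.

def toy_score(features, word):
    # Stand-in for an acoustic-model likelihood: count matching symbols.
    return sum(1 for f, w in zip(features, word) if f == w)

def recognize(features, vocabulary):
    """Emit the best-scoring vocabulary entry as the hypothesis."""
    return max(vocabulary, key=lambda word: toy_score(features, word))

def verify(features, hypothesis, threshold):
    """Accept the hypothesis only if its score clears a confidence threshold."""
    return toy_score(features, hypothesis) >= threshold

def asr_pipeline(features, vocabulary, threshold=2):
    hypothesis = recognize(features, vocabulary)
    if verify(features, hypothesis, threshold):
        return hypothesis  # in-vocabulary speech: accept the hypothesis
    return None            # out-of-vocabulary speech or noise: reject
```

The key design point is that rejection is a separate decision from recognition: the recognizer always produces *some* hypothesis, and the verifier screens out cases where that hypothesis is poorly supported.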
Several technologies have been developed to implement the recognition component in ASR systems, and several technologies, some similar and others dissimilar to those used in recognition, have been used to implement the utterance verification component. The particular recognition technology employed in an ASR system does not necessarily dictate the technology used for utterance verification. It is generally not apparent, a priori, whether a selected recognition technology may be advantageously used with a particular utterance verification technology, or how two candidate technologies may be usefully married to produce a working ASR system. Acceptable results have been obtained in ASR systems having recognition components which use acoustic speech models employing Hidden Markov Models (HMMs) as described in L. R. Rabiner and B. H. Juang, “An Introduction to Hidden Markov Models,” IEEE ASSP Magazine, January 1986, pp. 4-16.
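As a concrete illustration of the HMM likelihood computation referenced above, the standard forward recursion (as described in the Rabiner and Juang tutorial) can be sketched as follows; the tiny two-state model in the usage note is purely illustrative:

```python
# Forward algorithm: compute P(observation sequence | HMM) for a discrete-
# observation HMM. init[s] is the initial state probability, trans[p][s] the
# transition probability from state p to s, and emit[s][o] the probability
# of emitting symbol o from state s.

def forward_likelihood(observations, init, trans, emit):
    """P(observation sequence | HMM) via the forward recursion."""
    n_states = len(init)
    # alpha[s]: probability of the observed prefix so far, ending in state s
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans[prev][s] for prev in range(n_states))
            * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)
```

For a toy two-state model with init = [1.0, 0.0], trans = [[0.5, 0.5], [0.0, 1.0]], and emit = [[0.9, 0.1], [0.2, 0.8]], the observation sequence [0, 1] scores 0.405.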
Various recognition and utterance verification components have employed models based on relatively large speech units, such as words or phrases. In a given ASR system, the utterance verification component typically employs speech units equivalent in size to that employed by the recognition component because the units output from the recognition component are supplied to the utterance verification component. U.S. Pat. No. 5,717,826, and R. A. Sukkar, A. R. Setiur, M. G. Rahim, and C. H. Lee, “Utterance Verification of Keyword Strings Using Word-Based Minimum Verification Error (WB-MVE) training,” Proc. ICASSP '96, Vol. I, pp. 518-521, May 1996, disclose ASR systems providing utterance verification for keyword strings using word-based minimum verification error training.
Systems which employ large speech units generally require that the recognition component and the utterance verification component be trained for each speech unit in their vocabularies. The need for training for each speech unit has several disadvantages. In order for the ASR system to be speaker-independent, speech samples for each large unit (e.g., whole words and/or phrases) must be obtained from a plurality of speakers. Obtaining such data, and performing the training initially, is resource intensive. Moreover, if a speech unit must later be added to the vocabulary, additional samples for that unit must be obtained from a plurality of speakers.
It is believed that most human languages employ a limited number of basic speech sounds which are concatenated to form words, and that speech in most such languages may be suitably represented by a set of basic speech sounds associated with that language. The basic speech sound units are often referred to as phonemes or “subwords.” In order to avoid the disadvantages of ASR systems based on large speech units, systems have been developed which are based on subwords. In subword-based systems, the results from the recognition component may be available as a string of recognized subwords, and a concatenated group of recognized subwords between two periods of silence may represent a word, phrase, or sentence. One of the main features of subword-based speech recognition is that, if the acoustic subword models are trained in a task-independent fashion, then the ASR system can reliably be applied to many different tasks without the need for retraining. If the ASR system is to be used to recognize speech in a language for which it was not originally trained, it may be necessary to update the language model, but because the number of unique subwords is limited, the amount of training data required is substantially reduced.
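The grouping of recognized subwords between periods of silence, described above, might be sketched as follows; the "sil" marker and the phoneme labels in the usage note are hypothetical:

```python
# Illustrative sketch: group a subword recognizer's output into word-level
# candidates, splitting at silence markers. The silence label and phoneme
# symbols are assumptions for illustration.

def group_subwords(subword_string, silence="sil"):
    """Split a recognized subword sequence into word candidates at silences."""
    words, current = [], []
    for unit in subword_string:
        if unit == silence:
            if current:
                words.append(current)
            current = []
        else:
            current.append(unit)
    if current:
        words.append(current)
    return words
```

For example, the recognized sequence ["sil", "k", "ae", "t", "sil", "d", "ao", "g", "sil"] would yield two word candidates, ["k", "ae", "t"] and ["d", "ao", "g"], each of which could then be passed to the utterance verification component.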
It is generally not apparent, a priori, whether a selected recognition or utterance verification technology which works well for a given speech unit size may be advantageously applied to speech units of a different size. Moreover, the best ways of performing utterance verification on individual subwords, of applying the results therefrom in a meaningful way to words, phrases, or sentences formed by concatenating recognized subwords, and of training subword based utterance verification models, are still being explored.
Certain methods for task-independent utterance verification have been proposed. For example, in H. Bourlard, B. D'hoore, and J.-M. Boite, “Optimizing Recognition and Rejection Performance in Wordspotting Systems,” Proc. ICASSP '94, pp. 373-376, Vol. 1, April 1994, and in R. C. Rose and E. Lleida, “Speech Recognition Using Automatically Derived Acoustic Baseforms,” Proc. ICASSP '97, pp. 1271-1274, April 1997, an “on-line garbage” likelihood is computed and a likelihood ratio is then formed between the “on-line garbage” likelihood and the likelihood of the recognized word, phrase, or sentence. In R. A. Sukkar, C. H. Lee, and B. H. Juang, “A Vocabulary Independent Discriminatively Trained Method for Rejecti
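The "on-line garbage" likelihood-ratio test described in the cited work can be sketched as follows. This is a hedged illustration, not the cited papers' exact formulation: the frame normalization, the threshold value, and all scores are assumptions, and the "on-line garbage" score is taken, as in the Bourlard et al. approach, to be an average of the best local scores at each frame:

```python
# Sketch of likelihood-ratio utterance verification with an "on-line
# garbage" (filler) model. Threshold and scores are illustrative.

def online_garbage_frame_score(local_log_scores, n_best=3):
    """'On-line garbage' local score: mean of the N best frame log scores."""
    top = sorted(local_log_scores, reverse=True)[:n_best]
    return sum(top) / len(top)

def accept_utterance(hyp_loglik, garbage_loglik, n_frames, threshold=0.5):
    """Accept if the frame-normalized log likelihood ratio clears a threshold."""
    llr = (hyp_loglik - garbage_loglik) / n_frames
    return llr >= threshold
```

A hypothesis whose log likelihood is well above the garbage score (e.g., -100.0 vs. -180.0 over 100 frames) is accepted, while one barely above it (e.g., -150.0 vs. -160.0) is rejected as likely out-of-vocabulary.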
