Context sharing of similarities in context dependent word...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


C704S231000, C704S251000, C704S255000


active

06285980

ABSTRACT:

TECHNICAL FIELD
The invention relates to automatic speech recognition and more particularly to a method and apparatus for automatic speech recognition of numbers, times and dates that are spoken in natural language words.
DESCRIPTION OF THE PRIOR ART
In many telephony applications or services that use automatic speech recognition, the ability to recognize numbers is often a major component. Traditionally, recognition of speech input containing numbers has been achieved with high accuracy using isolated or connected digit recognizers. As speech recognition gains visibility and finds a wider range of applications, it is often not feasible to expect users to provide input solely in the form of isolated or connected digits. For example, the number “847” is likely to be spoken as the sequence of digits “eight four seven” in a context such as a U.S. telephone number, but in a different context such as a monetary amount, the same number is more likely to be spoken naturally as “eight hundred and forty seven.” This latter form is referred to as a natural number.
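The patent text does not say how recognized words are turned into numeric values. The following minimal Python sketch, whose word tables and function names are hypothetical, only illustrates the two readings of “847” described above: a connected-digit reading, where every word is one digit, and a natural-number reading with “hundred”, “thousand” and the optional “and”.

```python
# Hypothetical word tables for illustration; not taken from the patent.
DIGITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def digit_string_value(words):
    """Connected-digit reading: every word is one digit.

    Returns a string so that leading zeros (e.g. 'oh' in phone numbers) survive.
    """
    return "".join(str(DIGITS[w]) for w in words)


def natural_number_value(words):
    """Natural-number reading, e.g. 'eight hundred and forty seven'."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being built (below 1000)
    for w in words:
        if w == "and":
            continue
        if w in DIGITS:
            current += DIGITS[w]
        elif w in TEENS:
            current += TEENS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current *= 100
        elif w in SCALES:
            total += current * SCALES[w]
            current = 0
        else:
            raise ValueError(f"not a number word: {w!r}")
    return total + current


if __name__ == "__main__":
    print(digit_string_value("eight four seven".split()))                # 847
    print(natural_number_value("eight hundred and forty seven".split()))  # 847
```

Both word strings denote the same value, but they contain different words in different orders, which is why a recognizer built only for connected digits cannot handle the natural form.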
Previously, natural speech has been avoided or curtailed because of the processing and system requirements of natural speech recognition systems and processes. In a general context-dependent speech recognition system, if the vocabulary of a task or application contains N words, the total number of contexts C that must be modeled (including the silence model) is $C = 2N(N+1) + N + 1$. The number of contexts therefore grows roughly as twice the square of the vocabulary size. Under this equation, a vocabulary of one hundred natural words would require over twenty thousand contexts ($2 \cdot 100 \cdot 101 + 100 + 1 = 20301$).
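As a quick arithmetic check of the context equation, this Python sketch evaluates $C = 2N(N+1) + N + 1$ for the 11-word digit vocabulary (“zero”, “oh”, “one” through “nine”) and for a 100-word vocabulary, alongside the dominant $2N^2$ term. The function name is illustrative only.

```python
def num_contexts(n_words: int) -> int:
    """Number of contexts C = 2N(N+1) + N + 1 for an N-word vocabulary.

    The trailing "+ 1" accounts for the silence model.
    """
    return 2 * n_words * (n_words + 1) + n_words + 1


if __name__ == "__main__":
    for n in (11, 100):
        c = num_contexts(n)
        # Compare against the dominant 2N^2 term to show the quadratic growth.
        print(f"N={n:3d}  C={c:6d}  2N^2={2 * n * n:6d}")
```

For N = 11 this gives 276 contexts; for N = 100 it gives 20,301, against 20,000 for the $2N^2$ term alone, so the quadratic term dominates quickly.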
Previous work on natural number recognition can be found in the literature. In “Speech recognition using syllable-like units,” by Hu et al., published in Proceedings International Conference on Spoken Language Processing, pp. 1117-1120, 1996, the authors proposed syllable-like units as the basic units of recognition and tested the approach on a database consisting of the months of the year. Results on spotting a “time of day” event in telephone conversations were reported by Jeanrenaud et al. in “Spotting events in continuous speech,” published in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 381-384, 1994. The concept of head-body-tail models was used for connected digit recognition by Chou et al. in “Minimum error rate training of inter-word context dependent acoustic model units in speech recognition,” published in Proceedings International Conference on Spoken Language Processing, pp. 439-442, 1994. In “Recognition of spontaneously spoken connected numbers in Spanish over the telephone line,” published in Proceedings EUROSPEECH-95, pp. 2123-2126, 1995, Torre et al. found an improvement in Castilian Spanish connected number recognition by using techniques such as tied-state modeling, multiple candidates, spectral normalization, gender modeling, and noise spotting. Similar results on recognition of Danish telephone numbers were reported by Jacobsen and Wilpon in “Automatic recognition of Danish natural numbers for telephone applications,” published in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 459-462, 1996.
For automatic speech recognition systems beyond the telephone digit recognition task, three examples of potentially beneficial applications of natural number recognition are recognition of time of day, dates, and monetary amounts. Such natural number recognition can be used to develop products and services such as schedulers, travel planners, and banking applications such as bill payments and fund transfers. The vocabularies for these applications can be kept to a manageable size, but the number of contexts required for context-dependent speech recognition would still be very large. Additionally, to make such services usable, this type of recognizer must achieve very high string accuracy on input containing naturally spoken words.
Thus, there is a need in the art for a method and system for recognizing words spoken in natural language for commercial, monetary and scheduling applications. The method and system must not consume excessive processing time or system resources, yet the recognition accuracy must be high.
SUMMARY OF THE INVENTION
Briefly stated, according to one aspect of the invention, the aforementioned problems are solved and the shortcomings of the art are overcome by providing a method for automatic speech recognition. The method includes the steps of: receiving a spoken utterance containing at least one word of a vocabulary of digit words and non-digit words; processing the utterance into cepstral coefficients; separating the utterance into at least one word; separating each word into a head portion, a body portion and a tail portion; and recognizing at least one word from the vocabulary using said head portion, said body portion and said tail portion.
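The summary says only that the utterance is processed into cepstral coefficients; it does not specify the front end. As an assumption for illustration, the minimal Python sketch below computes the real cepstrum of one frame (the inverse DFT of the log magnitude spectrum) with a direct DFT, so it needs no third-party libraries. Practical recognizers typically window the frames and use an FFT with mel-filterbank or LPC-derived cepstra, which this sketch does not attempt. The function name and the toy frame are hypothetical.

```python
import cmath
import math


def real_cepstrum(frame, n_ceps=None, eps=1e-12):
    """Real cepstrum of one frame: inverse DFT of the log magnitude spectrum.

    A direct O(n^2) DFT keeps the sketch dependency-free; `eps` guards
    against log(0) for spectral bins with zero magnitude.
    """
    n = len(frame)
    # Forward DFT: X[k] = sum_t x[t] * exp(-2*pi*j*k*t/n)
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    log_mag = [math.log(abs(x) + eps) for x in spectrum]
    # Inverse DFT of the real, even log magnitude; the result is real.
    ceps = [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]
    return ceps if n_ceps is None else ceps[:n_ceps]


if __name__ == "__main__":
    frame = [0.9 ** t for t in range(16)]  # toy stand-in for a speech frame
    print([round(c, 4) for c in real_cepstrum(frame, n_ceps=6)])
```

One useful property shows up directly: scaling a frame by a gain g shifts only the zeroth coefficient, by log g, and leaves the others unchanged.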
In another aspect of the invention, the aforementioned problems are solved and an advance in the art is achieved by providing a method for automatic speech recognition that includes the steps of: receiving an utterance containing at least one word of a vocabulary of digit words and non-digit words; processing the utterance into cepstral coefficients; separating the utterance into at least one word; separating at least one word into a head portion, a body portion and a tail portion; and recognizing each word from the vocabulary using said head portion, said body portion and said tail portion. The vocabulary over which the head-body-tail recognition model operates includes a group of time of day words.
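The summary does not define the head, body and tail portions in detail. Assuming the usual head-body-tail convention from the connected-digit work of Chou et al. cited above (a word's head depends on the preceding word or silence, its body is context independent, and its tail depends on the following word or silence), the Python sketch below expands a word string into unit labels and enumerates the full unit inventory for a vocabulary. The label format and function names are hypothetical. Counting the inventory reproduces the context equation given earlier term by term (N(N+1) heads, N(N+1) tails, N bodies, one silence model), which is consistent with, though not stated by, the text.

```python
def hbt_units(words, sil="sil"):
    """Expand a word string into head-body-tail unit labels.

    Each head depends on the preceding word (or silence), each body is
    context independent, and each tail depends on the following word.
    """
    padded = [sil, *words, sil]
    units = []
    for left, word, right in zip(padded, padded[1:], padded[2:]):
        units += [f"{word}.head|{left}", f"{word}.body", f"{word}.tail|{right}"]
    return units


def hbt_inventory(vocab, sil="sil"):
    """All units needed for any word string over `vocab`: one silence
    model, N bodies, N*(N+1) heads and N*(N+1) tails."""
    contexts = [*vocab, sil]
    units = {sil}
    for word in vocab:
        units.add(f"{word}.body")
        for c in contexts:
            units.add(f"{word}.head|{c}")
            units.add(f"{word}.tail|{c}")
    return units


DIGITS = ["zero", "oh", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

if __name__ == "__main__":
    print(hbt_units(["eight", "four", "seven"]))
    print(len(hbt_inventory(DIGITS)))  # 2*11*12 + 11 + 1 = 276
```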
In another aspect of the invention, the aforementioned problems are solved and an advance in the art is achieved by providing a method for automatic speech recognition that includes the steps of: receiving an utterance containing at least one word of a vocabulary of digit words and non-digit words; processing the utterance into cepstral coefficients; separating the utterance into a plurality of words; separating at least one of said plurality of words into a head portion, a body portion and a tail portion; and recognizing at least one word from the vocabulary using said head portion, said body portion and said tail portion. The vocabulary of the recognition model includes the digit words ‘zero’, ‘oh’, ‘one’, ‘two’, ‘three’, ‘four’, ‘five’, ‘six’, ‘seven’, ‘eight’, and ‘nine’.
In another aspect of the invention, the aforementioned problems are solved and an advance in the art is achieved by providing a method for automatic speech recognition that includes the steps of: receiving an utterance containing at least one digit word and at least one non-digit word; processing the utterance into cepstral coefficients; separating the utterance into a plurality of words; separating at least one of said plurality of words into a head portion, a body portion and a tail portion; and recognizing at least one of said plurality of words using a vocabulary for numbers, date and time of day.
In another aspect of the invention, the aforementioned problems are solved and an advance in the art is achieved by providing a method for automatic speech recognition that includes the steps of: receiving an utterance containing at least one digit word and at least one non-digit word; processing the utterance into cepstral coefficients; separating the utterance into a plurality of words; separating at least one of said plurality of words into a head portion, a body portion and a tail portion; and recognizing at least one of said plurality of words using a vocabulary for numbers, date and time of day. The head-body-tail recognition model of this method also has a second plurality of words that have a plurality of shared contexts.
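The summary states that a second plurality of words has shared contexts but does not say how the sharing is done. The Python sketch below shows one possible scheme, which is an assumption and not the patented method: a word's head is conditioned only on a class of the preceding word's final sound and its tail only on a class of the following word's initial sound, so words in the same class reuse the same head and tail models. The phonetic classes are rough, hypothetical groupings chosen for illustration.

```python
DIGITS = ["zero", "oh", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Hypothetical, rough classes of each word's first sound (used for the
# preceding word's tail) and last sound (used for the following word's head).
ONSET_CLASS = {"zero": "z", "oh": "V", "one": "w", "two": "t", "three": "th",
               "four": "f", "five": "f", "six": "s", "seven": "s",
               "eight": "V", "nine": "n"}
CODA_CLASS = {"zero": "V", "oh": "V", "one": "n", "two": "V", "three": "V",
              "four": "r", "five": "v", "six": "s", "seven": "n",
              "eight": "t", "nine": "n"}


def shared_inventory(vocab, onset_class, coda_class, sil="sil"):
    """Head-body-tail units when contexts are shared by sound class.

    Heads depend only on the class of the preceding word's final sound and
    tails only on the class of the following word's initial sound, so words
    in the same class reuse the same head/tail models.
    """
    left_ctx = {coda_class[w] for w in vocab} | {sil}
    right_ctx = {onset_class[w] for w in vocab} | {sil}
    units = {sil}
    for word in vocab:
        units.add(f"{word}.body")
        units.update(f"{word}.head|{c}" for c in left_ctx)
        units.update(f"{word}.tail|{c}" for c in right_ctx)
    return units


if __name__ == "__main__":
    identity = {w: w for w in DIGITS}  # no sharing: each word is its own class
    print(len(shared_inventory(DIGITS, identity, identity)))       # 276
    print(len(shared_inventory(DIGITS, ONSET_CLASS, CODA_CLASS)))  # 188
```

With no sharing the count matches the full context equation; with these particular groupings the digit inventory drops from 276 to 188 units. The actual savings, and any effect on accuracy, depend entirely on how the shared classes are chosen.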


