Tone and speech recognition in communications systems

Reexamination Certificate

C704S247000, C704S251000, C704S273000, C379S067100, C379S093030, C379S088060

active

06236967

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates in general to improved speech and tone recognition systems and, more particularly, to an improved method and apparatus for recognizing speech data over a communications system using both speech and tone recognition techniques.
BACKGROUND OF THE INVENTION
Recent advances in speech recognition technology have spurred its commercialization in different market segments of various industries. One industry that has seen increased use of speech recognition systems is telecommunications, which strives to apply the technology in automated attendant systems for services such as order taking, directory assistance, and data entry, to name a few. Proponents of speech recognition technology believe that it is well suited for telecommunications applications. In most such applications, a user is prompted to speak into the mouthpiece of a telephone handset. The speech signals provided by the speaker are first converted into digital values through a sampling process, and the digital values are in turn converted into a sequence of patterns so that the words uttered by the speaker can be recognized from a list of pre-stored words. Predetermined words within the list are typically stored as templates, each made of sequences of patterns of speech sounds better known as “phonemes”. This type of recognition technique is commonly referred to as “whole word template matching”. Over the last few years, whole-word template matching has been combined with dynamic programming to cope with nonlinear time-scale variations between spoken words and pre-stored templates.
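The dynamic-programming step mentioned above is usually called dynamic time warping (DTW). The following is a minimal sketch, not the method of any particular product: it assumes each utterance and template has already been reduced to a sequence of feature vectors, and the one-dimensional toy templates are hypothetical stand-ins for real acoustic frames.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            # Extend the best of: repeat a template frame, skip ahead, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the template word whose DTW distance to the utterance is smallest."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical toy templates: 1-D "features" standing in for real acoustic frames.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(2.0,), (1.0,), (0.0,)],
}
# The same rising pattern as "yes", spoken more slowly (frames repeated).
slow_yes = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,)]
print(recognize(slow_yes, templates))  # yes
```

Because the warping path may repeat frames, the slower utterance aligns with the "yes" template at zero cost even though the two sequences have different lengths, which is exactly the nonlinear time-scale variation the text describes.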
In spite of these technological advances, however, several factors impede the commercialization of speech recognition systems. Prominent among them is the inability of speech recognition systems to easily distinguish homonyms, such as “to”, “too” and “two”. Equally problematic is the difficulty of recognizing words that are pronounced differently because of speakers' regional accents. Speech recognition systems are also well known to have difficulty separating words that rhyme or otherwise sound alike, such as “bear” and “pear” or “but” and “pot”.
In response to these problems, three solutions have been proposed. One solution, described in U.S. Pat. No. 5,212,730, uses a text-derived recognition model in concert with decision rules to differentiate various pronunciations of a word. Another solution uses context-related data and decision rules, in addition to stored templates, to make recognition of spoken words more accurate. A third solution abandons speech recognition altogether in favor of receiving information from a user as Dual Tone Multi-Frequency (DTMF) signals entered on the touch-tone keypad of a telephone set. Although DTMF entries accurately represent numeric strings provided by a user, they are ill suited for applications in which the numeric strings exceed fifteen digits: if an error occurs at any point during keying, every digit in such a long string must be re-entered, one at a time. Of particular significance, DTMF entries cannot accurately represent alphabetic or alphanumeric strings of characters, since each letter-bearing key on a telephone keypad is shared by at least three letters.
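The ambiguity of keyed letters can be made concrete with a short sketch. It assumes the common modern keypad layout (Q on key 7, Z on key 9); older North American keypads omitted those two letters, so the exact counts are layout-dependent.

```python
from itertools import product

# Letter assignments of a standard telephone keypad (assumed modern layout,
# with Q on key 7 and Z on key 9); keys 0 and 1 carry no letters.
KEY_LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_KEY = {ch: key for key, letters in KEY_LETTERS.items() for ch in letters}

def key_sequence(word):
    """Digits a caller would press to 'spell' a word on the keypad."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.upper())

def candidate_strings(keys):
    """Every letter string that the given key sequence could stand for."""
    return ["".join(letters) for letters in product(*(KEY_LETTERS[k] for k in keys))]

print(key_sequence("CAT"), key_sequence("BAT"), key_sequence("ACT"))  # 228 228 228
print(len(candidate_strings("228")))  # 27
```

Three different words produce the same DTMF digits, and the receiving system cannot tell which of 27 letter strings was intended; the count grows multiplicatively with each additional key.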
SUMMARY OF THE INVENTION
We have realized that, for certain speech recognition applications, numeric DTMF entries keyed by a caller can improve the accuracy of speech recognition systems by serving as a pointer that limits the number of stored templates to be compared with speech data subsequently provided by the caller. In one embodiment of the principles of the invention, a communications system prompts a user to provide a first set of information as touch-tone entries on a dial pad. The user is then prompted to provide a second set of information as speech signals delivered to the transmitter of a telephone handset. The communications system uses the DTMF signals, or data generated from the touch-tone entries, as a search key to retrieve only the stored templates associated with that DTMF data.
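Before the keyed digits can serve as a search key, the tone half of the system must turn DTMF audio into digits. The text does not say how; the sketch below uses the Goertzel algorithm, a widely used technique for this task, as an assumed choice. The standard DTMF row and column frequencies are real; the 8 kHz sampling rate and 205-sample block are assumptions, and a production detector would also check signal energy, twist, and tone duration, which this sketch omits.

```python
import math

ROW_FREQS = [697, 770, 852, 941]      # Hz, low-group tones
COL_FREQS = [1209, 1336, 1477, 1633]  # Hz, high-group tones
KEYPAD = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, freq, fs):
    """Squared magnitude of the signal's spectrum at a single frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def synthesize(digit, fs=8000, n=205):
    """Generate a clean test tone for one key: sum of its row and column sines."""
    for r, row in enumerate(KEYPAD):
        if digit in row:
            f_low, f_high = ROW_FREQS[r], COL_FREQS[row.index(digit)]
            return [math.sin(2 * math.pi * f_low * t / fs)
                    + math.sin(2 * math.pi * f_high * t / fs) for t in range(n)]
    raise ValueError(f"not a DTMF key: {digit!r}")

def detect_digit(samples, fs=8000):
    """Pick the strongest row tone and strongest column tone, then look up the key."""
    row_p = [goertzel_power(samples, f, fs) for f in ROW_FREQS]
    col_p = [goertzel_power(samples, f, fs) for f in COL_FREQS]
    r = max(range(4), key=row_p.__getitem__)
    c = max(range(4), key=col_p.__getitem__)
    return KEYPAD[r][c]

print(detect_digit(synthesize("5")))  # 5
```

Goertzel is attractive here because only eight frequencies matter, so evaluating them individually is cheaper than a full FFT.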
In one example implementation of the principles of the invention, a speech recognition system includes a database that stores the zip codes of a country, and each zip code is associated with stored templates of the addresses within it. In an order entry application, for example, the speech recognition system prompts a caller to enter, on a telephone dial pad, the zip code of the caller's home (or office) address. The system then queries the database to determine whether the entered zip code matches one of the stored zip codes. If so, the system may repeat the matched zip code so that the caller can confirm it is accurate. If no match is found, the system prompts the caller to enter a new zip code, and it may terminate the process if the caller enters no valid zip code after a predetermined number of attempts. Once a zip code is matched and confirmed, the system prompts the caller to provide address information as speech, uses the matched zip code to retrieve the stored address templates associated with that zip code, and looks for a match between the caller's speech data and one of the retrieved templates. If a match is found, the caller is prompted to verify that the system has accurately recognized the speech data. If no match is found, the system selects the address(es) closest to the received speech signals and presents them to the caller in order from closest to least close.
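The zip-code flow above can be sketched as follows. This is a hypothetical illustration, not the patented apparatus: the database contents, the attempt limit, and the 0.9 match threshold are invented, and because a real system compares acoustic templates, a text transcript of the utterance plus `difflib.SequenceMatcher` stands in for the acoustic scoring step.

```python
from difflib import SequenceMatcher

# Hypothetical sample database: zip code -> addresses within it. A real system
# would store acoustic templates per address; plain text stands in for them here.
ADDRESS_DB = {
    "12345": ["12 Oak Street", "45 Elm Avenue", "7 Birch Lane"],
    "54321": ["3 Pine Road", "88 Maple Court"],
}
MAX_ATTEMPTS = 3  # hypothetical "predetermined number of attempts"

def match_zip(entries, db=ADDRESS_DB, max_attempts=MAX_ATTEMPTS):
    """Return the first keyed zip code found in the database, or None
    if no valid zip code is entered within max_attempts tries."""
    for zip_code in entries[:max_attempts]:
        if zip_code in db:
            return zip_code
    return None

def similarity(heard, template):
    """Stand-in score in [0, 1]; a real recognizer would compare acoustic features."""
    return SequenceMatcher(None, heard.lower(), template.lower()).ratio()

def recognize_address(zip_code, heard, db=ADDRESS_DB, threshold=0.9):
    """Compare the utterance only against templates filed under the matched zip code."""
    ranked = sorted(db[zip_code], key=lambda addr: similarity(heard, addr), reverse=True)
    if similarity(heard, ranked[0]) >= threshold:
        return "match", ranked[:1]  # caller is then asked to verify this address
    return "closest", ranked        # offered from closest to least close

zip_code = match_zip(["99999", "12345"])  # first entry invalid, second matches
print(zip_code)                                        # 12345
print(recognize_address(zip_code, "12 Oak Street"))    # ('match', ['12 Oak Street'])
print(recognize_address(zip_code, "Birch")[1][0])      # 7 Birch Lane
```

The key design point carried over from the text is that the DTMF-derived zip code shrinks the candidate set before any speech comparison happens, so the recognizer only has to separate a handful of addresses rather than every address in the country.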
