Interactive voice response system

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S207000, C704S270000

active

06704708

ABSTRACT:

FIELD OF INVENTION
This invention relates to an interactive voice response system and in particular relates to speech recognition processing within an interactive voice response (IVR) system.
BACKGROUND OF INVENTION
In today's business environment, the telephone is used for many purposes: placing catalogue orders; checking airline schedules; querying prices; reviewing account balances; notifying customers of price or schedule changes; and recording and retrieving messages. Often, each telephone call involves a service representative talking to a caller, asking questions, entering responses into a computer, and reading information to the caller from a terminal screen. This process can be automated by substituting an interactive voice response system with speech recognition for the operator.
A business may rely on providing up-to-date inventory information to retailers across the country, and an interactive voice response system can be designed to receive orders from customers and retrieve the data they request from a local or host-based database via a business application. The IVR updates the database to reflect any inventory activity resulting from calls. The IVR enables communications between a main office business application and a marketing force. A sales representative can obtain product release schedules or order product literature anytime, anywhere, simply by using the telephone. A customer can inquire about a stock item, and the IVR can determine availability, reserve the stock, and schedule delivery.
A banking application using an IVR with speech recognition includes the following sequence of steps. A prompt is played to the caller and the caller voices a response. The voice signal is acquired and speech recognition is performed on the voice signal to create text. Only once the speech recognition is finished and the text is formed is the text response analyzed and processed for a result which may be played to the caller. For instance, the user may ask how much money is in his savings account, and the speech recognition engine processes the whole signal so that a Natural Language Understanding (NLU) module can extract the relevant meaning of ‘savings account balance’. This result is passed to a banking application to search and provide the answer.
One problem with the above voice activated database query type application is that two time intensive tasks are performed one after the other. It is known for a voice input to be processed to text and this input to be used as the basis for a query to be processed on a database. Each process can take up time of the order of seconds and the total time of the combined processes can be noticeable to the user. It would be desirable to reduce the total time for a voice recognition and database query. Moreover this has a related cost summary.
It is known to predict certain speech elements in advance of a full analysis. U.S. Pat. No. 5,745,873 discloses a method for recognising speech elements (e.g. phonemes) in utterances including the following steps. Based on acoustic frequency, at least two different acoustic representatives are isolated for each of the utterances. From each acoustic representative, tentative information on the speech element in the corresponding utterance is derived. A final decision on the speech element in the utterance is then generated, based on the tentative decision information from more than one of the acoustic representatives. Advantage is taken of redundant cues present in elements of speech at different acoustic frequencies to increase the likelihood of correct recognition. Speech elements are identified by making tentative decisions using frequency-based representations of an utterance, and then by combining the tentative decisions to reach a final decision. This publication discloses that a given sub-band section (in this case a frequency band) of speech contains information which may be used to predict the next sub-band section. One aspect is to get a more accurate recognition result by separately processing frequency bands.
However the above solution is still a sequential process in a voice application and the total time taken is still the combined time of the speech recognition and the later processing.
SUMMARY OF INVENTION
In one aspect of the invention there is provided a method for processing in an interactive voice processing system comprising: receiving a voice signal from user interaction; extracting a plurality of measurements from the voice signal; calculating an average of said measurements; locating a reference characteristic matching said average; and using text associated with the closest reference characteristic as an estimate of the text of the voice signal.
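The steps of this method can be illustrated with a minimal sketch. The reference table, the measurement values, and the function names are hypothetical, and Euclidean distance is assumed as the matching criterion because the text does not specify one.

```python
import math

# Hypothetical reference characteristics: an averaged measurement vector
# (for example, mean formant frequencies in Hz) paired with associated text.
REFERENCES = [
    ((500.0, 1500.0), "savings account balance"),
    ((650.0, 1200.0), "current account balance"),
    ((400.0, 2000.0), "transfer funds"),
]


def average(measurements):
    """Component-wise mean of a list of equal-length measurement vectors."""
    n = len(measurements)
    width = len(measurements[0])
    return tuple(sum(m[i] for m in measurements) / n for i in range(width))


def estimate_text(measurements, references=REFERENCES):
    """Return the text of the reference closest to the averaged measurements."""
    avg = average(measurements)
    # Assumption: "closest" means smallest Euclidean distance.
    _, text = min(references, key=lambda ref: math.dist(ref[0], avg))
    return text


# Hypothetical measurements extracted from successive frames of one voice signal.
frames = [(490.0, 1480.0), (510.0, 1530.0), (505.0, 1490.0)]
print(estimate_text(frames))
```

No phonetic decoding or natural language analysis appears in this sketch; the estimate comes from acoustic measurements alone.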
Thus the invention requires only acoustic analysis of a voice signal to determine a response. Since it does not require phonetic analysis of the signal to convert it into text, followed by a natural language analysis to extract the useful meaning from the text, considerable processing time is saved.
In a preferred embodiment the acoustic feature is a non-phonetic feature extracted from a frequency analysis of the voice signal. More than one non-phonemic feature of the voice signal may be acquired and used to determine the response. For instance, it is known that the acoustic effects of nasalisation, including increased bandwidths, will spread forward and backward through many segments.
Certain predictive qualities can be found in speech signals. In purely linguistic terms, articulatory settings shift during speech towards those of the prosodically marked element within a given breath group. There exist significant differences between average formant frequencies and related acoustic parameters in the “same” carrier phrase according to the segmental content of a prosodically marked (stressed) item within the breath group. When presented with very short, and increasing, portions of speech, subjects are able to predict what a complete utterance or word may be. These predictive capabilities, based either on significant differences within the signal itself, or on top-down (i.e. prior) knowledge, or both, can be used to enhance the performance of advanced ASR and NLU/Dialogue Management-enabled services: during recognition, predictions can be made as to what the most likely and significant information is in the phrase. Such predictions could be used to activate natural language understanding (NLU) and the task-specific dialogue management modules, which would therefore be able to return a possible result (or N-Best possibilities) to the application or service before the speaker has finished speaking. This would lead to faster response times, but could also be used to cater for poor transmission rates by offering the most likely responses even before the complete signal has been received, effectively subsetting the total number of possible responses. Further, a running check between predictions and the result of actual recognition would help provide a dynamic indicator of how effective (i.e. how accurate) such predictions were for a specific instance, allowing greater or lesser reliance on such predicted responses on a case-by-case basis.
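The running check between predictions and actual recognition results might be kept as a simple agreement tally, sketched below. The class name, its methods, and the 0.8 reliance threshold are hypothetical; the text does not define how the indicator is computed.

```python
class PredictionMonitor:
    """Tracks how often early predictions agree with the final recognition result."""

    def __init__(self):
        self.total = 0
        self.correct = 0

    def record(self, predicted_text, recognized_text):
        """Compare one early prediction with the completed recognition result."""
        self.total += 1
        if predicted_text == recognized_text:
            self.correct += 1

    def accuracy(self):
        """Fraction of predictions confirmed so far (0.0 before any are recorded)."""
        return self.correct / self.total if self.total else 0.0

    def trust_prediction(self, threshold=0.8):
        """Rely on predicted responses only while the running accuracy is high enough.

        The threshold value is an assumption chosen for illustration.
        """
        return self.total > 0 and self.accuracy() >= threshold


monitor = PredictionMonitor()
monitor.record("savings account balance", "savings account balance")
monitor.record("savings account balance", "current account balance")
print(monitor.accuracy(), monitor.trust_prediction())
```

An application could consult `trust_prediction()` before acting on an early result, and fall back to waiting for full recognition when it returns false.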
In the preferred embodiment only that portion of the speech signal received to date is analyzed. Significant differences in that signal as compared with other signals with the same linguistic content (the “same” sounds and words) are used to predict the later, as yet unanalyzed, portion of the signal.
The acoustic processing is performed to acquire characteristics of the voice signal which have the relevant predictive characteristics and are more accessible and quicker to calculate. For instance, to perform full speech analysis on a phrase the whole phrase must be acquired, so that even a rough speech analysis of a ten-second phrase using a 400 MHz processor can take additional time on top of the 10 seconds of speech, whereas the initial acoustic characteristics of a voice signal may be obtained in the first second or seconds, before the speech is even completed.
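Restricting analysis to the opening portion of the signal can be sketched as follows. The function names and the 8 kHz sample rate are hypothetical, and the zero-crossing count is used only as a stand-in for a cheap acoustic measurement; it is not the feature named in the text.

```python
def initial_portion(samples, sample_rate_hz, seconds=1.0):
    """Return only the samples received in the first `seconds` of the signal."""
    count = int(sample_rate_hz * seconds)
    return samples[:count]


def zero_crossings(samples):
    """Count sign changes between consecutive samples (a stand-in measurement)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)


# Hypothetical ten-second signal sampled at 8 kHz, alternating in sign.
signal = [1.0 if i % 2 == 0 else -1.0 for i in range(80_000)]

# Only the first second is examined; the remaining nine seconds are not needed.
prefix = initial_portion(signal, 8_000, seconds=1.0)
print(len(prefix), zero_crossings(prefix))
```

The cost of the measurement depends on the length of the prefix rather than on the full utterance, which is the property the paragraph above relies on.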
Due to the relatively low variation of the queries input to the voi
