User barge-in enablement in large vocabulary speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate



C704S231000, C704S251000, C704S257000, C704S252000, C704S240000


active

06246986

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates to speech processing, and more particularly to “man-machine” interactions where the machine is an arrangement that provides prompts to users and reacts to user responses in the form of natural speech as well as in the form of DTMF or other useful signals. When a machine responds to natural speech, it means that the machine understands and acts upon what people actually say, in contrast to what one would like them to say.
In communications networks there are many instances where a caller who places a call is connected to an interactive voice response unit (VRU) and is asked to interact with that unit. In the prior art, such interaction is generally carried out via a plurality of menu choices that must be selected by designated signals. Typically, the user is requested to depress an appropriate key on the user's telephone set keypad. In some cases, the VRU includes a speech recognizer that allows the user to pronounce the digits, such as “one,” “two,” etc., or a limited number of specified command words, such as “operator,” or a command phrase such as “collect call.” In many cases such menu-based interactions involve multi-tiered menus. Alas, multi-tiered menu structures are generally unpopular with users, and have proven to be remarkably inefficient at achieving the desired objective. Some studies have shown that more than 60 percent of the attempts to accomplish a particular task through access via such a multi-tiered menu structure are either terminated without the user having reached the desired objective, or are defaulted to an operator.
In order to address these limitations in the prior art, a means for understanding and acting upon spoken input was disclosed by U.S. Pat. No. 5,794,193, issued on Aug. 11, 1998 to one of the inventors herein (henceforth, the '193 patent). Because the disclosure contained in the '193 patent is relevant to the understanding of the field to which this invention belongs, the '193 patent is hereby incorporated by reference. For convenience, FIGS. 5 and 6 of the '193 patent are duplicated herein as FIGS. 1 and 2, respectively.
Element 1 of FIG. 1 is charged with developing a collection of meaningful phrases. The meaningful phrases are determined by a grammatical inference algorithm that operates on a predetermined corpus of speech utterances (previously acquired) that are applied to meaningful phrase processor 10. It obtains this collection from a corpus of data that is applied to meaningful phrase processor 10. Each meaningful phrase developed by the grammatical inference algorithm can be characterized as having both a Mutual Information value and a Salience value relative to an associated task objective. Accordingly, processor 10 associates a desired task with each developed meaningful phrase and provides a confidence level that processor 10 has about the binding of the meaningful phrase to the identified task. An illustrative collection of meaningful phrases is shown in FIG. 2 where, for example, the phrase “LONG DISTANCE” is associated with an action that is labeled “CREDIT” and has a confidence level of 0.55, and the phrase “MADE A LONG DISTANCE” is also associated with the “CREDIT” action and has a confidence level of 0.93.
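The binding of phrase, task, and confidence level described above can be pictured as a small table. The following Python sketch is illustrative only: the class name `MeaningfulPhrase`, its field names, and the helper `phrases_for_task` are assumptions made for this example and do not appear in the '193 patent. Only the two entries quoted from FIG. 2 are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeaningfulPhrase:
    """One row of the phrase collection (hypothetical layout)."""
    text: str          # the meaningful phrase, e.g. "LONG DISTANCE"
    task: str          # the task (action) bound to the phrase
    confidence: float  # confidence in the phrase-to-task binding


# The two entries quoted in the text from FIG. 2.
PHRASES = [
    MeaningfulPhrase("LONG DISTANCE", "CREDIT", 0.55),
    MeaningfulPhrase("MADE A LONG DISTANCE", "CREDIT", 0.93),
]


def phrases_for_task(phrases, task):
    """Return the phrases bound to `task`, most confident first."""
    matching = [p for p in phrases if p.task == task]
    return sorted(matching, key=lambda p: p.confidence, reverse=True)


for phrase in phrases_for_task(PHRASES, "CREDIT"):
    print(phrase.text, phrase.confidence)
```

In this sketch the longer phrase sorts first because it carries the higher confidence level, consistent with the values quoted from FIG. 2.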
Element 2 of FIG. 1 presents the arrangement for using the information developed by element 1. An input speech signal (analog) is applied to input speech recognizer 15, as well as the collection of meaningful phrases and, with the help of conventional word spotting algorithms, recognizer 15 develops an output when one of the meaningful phrases of the provided collection is found in the input speech signal. The recognized output is applied to classification processor 20 which, based on the confidence level of the applied collection, decides whether to identify the input speech signal with a particular task. This, of course, is based on a threshold that is set in classification processor 20. For example, based on the totality of the data presented in FIG. 2, it would be advisable to set a threshold for assigning an input speech signal to the “CREDIT” task below 0.55. This conclusion is reached because the phrase “LONG DISTANCE” is associated with the “CREDIT” task and there are no meaningful phrases that contain the words “LONG DISTANCE” that have been assigned a different task. The caption “MUT INF” in the first column of FIG. 2 stands for mutual information, which measures the likelihood of co-occurrence of the specified two or more words.
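The threshold decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the '193 patent: it operates on a text transcript rather than an analog speech signal, it substitutes exact word-sequence matching for the conventional word spotting algorithms, and the function names `spot_phrases` and `classify` are hypothetical.

```python
# (phrase, task, confidence) entries quoted in the text from FIG. 2.
PHRASES = [
    ("LONG DISTANCE", "CREDIT", 0.55),
    ("MADE A LONG DISTANCE", "CREDIT", 0.93),
]


def spot_phrases(transcript, phrases):
    """Return the entries whose words occur contiguously in the transcript.

    Assumption: a stand-in for word spotting, working on text only.
    """
    words = transcript.upper().split()
    found = []
    for text, task, confidence in phrases:
        target = text.split()
        n = len(target)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append((text, task, confidence))
    return found


def classify(transcript, phrases, threshold):
    """Return the task of the most confident spotted phrase, or None
    when nothing is spotted or the confidence does not exceed the
    threshold."""
    found = spot_phrases(transcript, phrases)
    if not found:
        return None
    _, task, confidence = max(found, key=lambda entry: entry[2])
    return task if confidence > threshold else None


# Threshold set below 0.55, as the text advises for the "CREDIT" task.
print(classify("long distance please", PHRASES, threshold=0.5))
# A threshold above 0.55 would reject the same input.
print(classify("long distance please", PHRASES, threshold=0.6))
```

With the threshold below 0.55, an input containing only “LONG DISTANCE” is still assigned to the “CREDIT” task, which is the outcome the text argues for.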
Although the '193 disclosure represents a major advance in the art, additional improvements in man-machine interaction can be realized by overcoming a number of remaining problems. For example, existing systems, while outputting a prompt, are unable to listen to an unconstrained user input and make a determination that the user is trying to communicate something meaningful so that the system could stop speaking and begin taking the user-specified action. The ability to do that can be thought of as the ability to “barge-in.”
Also, existing systems, even when finished prompting and in a listening state, do not recognize well when a user is finished speaking, so as to neither wait in silence for too long nor cut the user off too soon.
SUMMARY
The above-mentioned prior art deficiencies are overcome, and other improvements are achieved, with a VRU arrangement that listens while prompting and is able to accept a natural, unconstrained, speech input as well as DTMF input or other signals that represent purposeful communication from a user. In the course of listening while prompting, the arrangement processes the received signal and ascertains whether it is receiving a signal, such as an utterance that is intended to interrupt the prompt, or merely noise or an utterance that is not meant to be used by the arrangement. Additionally, the disclosed arrangement is sensitive to the speed and context of the speech provided by the user and is thus able to distinguish between a situation where a speaker is merely pausing to think, and a situation where a speaker is done speaking.
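One way to picture the distinction between a speaker who is pausing and a speaker who is done is to scale the tolerated silence by the speaker's own pace. The sketch below is purely illustrative and is not taken from the disclosure: the function name `speaker_is_done`, the parameters `pause_factor` and `min_pause`, and the rule itself (silence compared against a multiple of the average gap between words) are all assumptions chosen for this example.

```python
def speaker_is_done(word_end_times, now, pause_factor=3.0, min_pause=0.5):
    """Guess whether the speaker has finished (hypothetical rule).

    word_end_times: times, in seconds, at which each spotted word ended.
    now: current time in seconds.
    The tolerated silence is pause_factor times the speaker's average
    gap between words, but never less than min_pause seconds.
    """
    if not word_end_times:
        return False  # nothing has been said yet, so keep listening
    gaps = [later - earlier
            for earlier, later in zip(word_end_times, word_end_times[1:])]
    typical_gap = sum(gaps) / len(gaps) if gaps else min_pause
    allowed_silence = max(min_pause, pause_factor * typical_gap)
    return (now - word_end_times[-1]) > allowed_silence


fast_speaker = [0.0, 0.3, 0.6, 0.9]  # a word roughly every 0.3 s
slow_speaker = [0.0, 1.0, 2.0]       # a word every 1.0 s

# The same 1.1 s of silence is treated differently for each speaker.
print(speaker_is_done(fast_speaker, now=2.0))  # silence is long for this pace
print(speaker_is_done(slow_speaker, now=3.1))  # likely just a pause
```

Under these assumed parameters, a fast speaker who falls silent for 1.1 seconds is treated as finished, while a slow speaker with the same silence is given more time.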
These improvements are realized in an arrangement that includes a prompter, a recognizer of speech signals, a meaningful phrase detector and classifier, and a turn-taking module, all under control of a dialog manager.
The recognizer of speech listens to all incoming signals and determines whether the incoming signal corresponds to a useful signal, such as speech or DTMF signals, or to a not-useful signal. In elementary embodiments, the not-useful signal may correspond to merely a broadband signal (background noise) or even a sudden increase in the broadband signal's volume (e.g., noise from passing vehicular traffic). In more sophisticated embodiments, anything other than the speaker's voice may be classified as noise.
Signals that pass through the recognizer are applied to the meaningful phrase detector which spots words and, eventually, ascertains whether the speech contains a meaningful phrase. While it spots the words and obtains what might be considered partial utterances, it interacts with the turn-taking module to analyze the collected words and the rate at which the collected words are accumulated. Based on this analysis, the arrangement determines whether to expect additional input from the user, or whether to conclude that no additional input is to be expected and action should be taken based on the collected input speech. The action may be to immediately turn off the playing prompt and to proceed with the task requested by the user, to hold on to an inconclusive determination of the user's request and continue playing the prompt, to discard the inconclusive determination, or any other action that is deemed appropriate. Illustratively, another action that may be deemed appropriate to some artisan who implements a system in accordance with the principles disclosed herein, is to turn off the playing prompt when a meaningful partial utterance is found but a determination of user's...
