Conversation management in speech recognition interfaces

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate

Details

C704S231000

active

06246990

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of speech recognition interfaces of computer apparatus and the like, and in particular, to conversation management in such speech recognition interfaces.
2. Description of Related Art
One goal of a computerized interview (clinical assessments, structured interviews, and other individualized indicators) is to maintain the quality of the replaced human-to-human contact. During the interview, the interviewer plays different roles, e.g. test administrator, tester and observer, and the client must understand when the roles change. In a human face-to-face interview, the verbal, situational, and paralinguistic cues generally suffice for a smooth transition among the different roles for the interviewer and client. While the rules for conversation are known (although they are difficult to express) to the conversants in a face-to-face dialogue, they are not known for face-to-interface dialogues. The “rules” or “etiquette” for a computerized interview have not been established. There are two problems in particular which usually occur in a computerized conversation, namely: when to talk, referred to as the turn taking problem; and, how to talk, referred to as the vocabulary problem.
Persons do not know when to talk in a computerized conversation. A computerized conversation is not like a face-to-face conversation in which the conversants use paralinguistic cues, for example pitch changes and tone, and nonverbal cues, for example, facial expressions, to indicate when it is appropriate for the other person to talk. Moreover, many computer systems do not understand interruptions. In a face-to-face interview, the client can interrupt the interviewer at any time to ask for clarification or to maintain the conversation. This will be a problem until natural language programs can be used effectively in a conversation.
Persons do not know how to speak in a computerized conversation. Speaking to a voice recognition system is not like a face-to-face conversation in which the language has few constraints. In a face-to-interface interview, by contrast, the speaker generally has to be trained how to speak. Sometimes the speaker must speak discretely, that is, with pauses between words, but even with continuous speech the vocabulary is limited.
Systems that administer tests are not new; however, the additional component of a conversational interview is new. Some kiosks have interactive sessions, but they do not generally use voice recognition and do not attempt to initiate a conversation. When a video environment is used in a kiosk interaction, the end user makes choices from a touch screen or other type of selection button. Additionally, kiosk interaction is typically kept as short as possible. Part of the reason for that brevity may be that people tire relatively easily of that style of interaction.
The IBM® Human Center enables conversational computing. An actor's output and recognition can be programmed through the Personality Services and Actor Services components. Even so, the IBM® Human Center does not address what should be in the dialogue or how to manage the conversation.
Finally, there is a large body of research into non-verbal communication and discourse analysis which is pertinent to this field. Reference may be made to: [1] Druckman, D., Rozelle, R. M., & Baxter, J. C. (1982). Nonverbal Communication: Survey, Theory and Research, Sage Library of Social Research (139), Beverly Hills: Sage Publications, Inc.; and, [2] Reichman, R. (1985). Getting Computers to Talk Like You and Me, Cambridge, Mass.: The MIT Press.
SUMMARY OF THE INVENTION
In accordance with an inventive arrangement, the solution to these problems is a computer programmed with a routine set of instructions stored in a fixed medium which for the first time allocates functions in the user interface to support the goals of the computerized interview. Such a user interface is described herein in the context of conversation management, specifically applied to an interview/assessment dialogue. The inventive arrangement allocates video and speech for different purposes to cue the client or end user when to speak, which alleviates the turn taking problem. The inventive arrangement also allocates video and speech to cue the client or end user how to speak, alleviating the vocabulary problem. Basically, the inventive arrangement employs different technologies to establish conditions that clearly inform the client or end user when and how to speak during a fairly complex situation, the interview.
The context of the interview affects the outcome as much as the content of the assessment tool. The complexity of the context was captured by Reichman, above, who noted that for conversants to follow a conversation, they must share not only common situational knowledge and common semantic reference, they must also share considerable knowledge about the structure of the conversation itself. Video (for example, .AVI files) and recorded speech (for example, .WAV files) are allocated for setting the context of the conversation, or in other words, setting the situational knowledge and references.
The inventive arrangement employs both recorded speech, delivered by a video actor, and synthesized speech, delivered by a synthesized actor, to structure the conversation. In this regard, it is expected that the video actor will use more natural, colloquial speech, and accordingly, speech recognition would not be appropriate for the client's or end user's responses because such responses can also be expected to use more natural, colloquial speech. If the video actor elicits a response from the client or end user, the response would more appropriately be recorded, but not necessarily interpreted by a voice recognition program. Preferably, the video actor would pass control of the interface, and the conversation, to the synthesized actor.
The synthesized actor would ask an appropriate question in more carefully controlled, non-colloquial speech. The client or end user can then be expected to respond with a more carefully selected and limited vocabulary, for which speech recognition would be most appropriate.
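The division of labor between the two actors can be illustrated with a minimal Python sketch. It is not taken from the patent: the names Actor, Turn and handle_response are hypothetical, the utterance is assumed to arrive already as text, and the reprompt on an out-of-vocabulary reply is an assumption added for illustration. The sketch only shows the rule stated above: replies to the video actor are recorded without interpretation, while replies to the synthesized actor are matched against a limited vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class Actor(Enum):
    VIDEO = "video"              # recorded, colloquial speech
    SYNTHESIZED = "synthesized"  # synthesized, controlled speech


@dataclass
class Turn:
    actor: Actor
    prompt: str
    # Allowed replies for a synthesized-actor turn; left empty for video turns.
    vocabulary: frozenset = frozenset()


def handle_response(turn, utterance):
    """Return an (action, value) pair for a client reply to the given turn."""
    if turn.actor is Actor.VIDEO:
        # A colloquial reply is expected, so it is recorded, not recognized.
        return ("recorded", utterance)
    word = utterance.strip().lower()
    if word in turn.vocabulary:
        return ("recognized", word)
    # Assumption: an out-of-vocabulary reply triggers a repeat of the prompt.
    return ("reprompt", turn.prompt)


video_turn = Turn(Actor.VIDEO, "Tell me a little about your week.")
synth_turn = Turn(Actor.SYNTHESIZED, "Please answer yes or no.",
                  frozenset({"yes", "no"}))
```

In this sketch, passing control from the video actor to the synthesized actor amounts to moving from video_turn to synth_turn; only the second kind of turn restricts what the client is expected to say.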
Finally, the inventive arrangement employs a unique layout of the screen to support both the situational context and the conversation.
In accordance with the inventive arrangements, a computer is programmed with a routine set of instructions for managing conversation in a speech recognition interface, said instructions being stored in a fixed medium. The programmed computer comprises: means for generating a first graphical user interface for a video environment display; means for generating a second graphical user interface for a synthesized environment display; means for generating an audio output interface for audibly transmitting audio information associated with said first and second graphical user interfaces; and, means for generating an audio input interface for receiving audible information as an input for said speech recognition interface.
The video and synthesized environment displays can be arranged for substantially non-overlapping presentation, or for at least partly overlapping presentation.
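As a rough illustration of the two layout options, the following Python sketch models each display as a rectangular screen region and tests whether two regions overlap. The Region type, its pixel coordinates, and the example sizes are hypothetical; the patent text does not specify any geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    left: int
    top: int
    width: int
    height: int


def overlaps(a, b):
    """True if the two regions share any screen area (touching edges do not count)."""
    return (a.left < b.left + b.width and b.left < a.left + a.width
            and a.top < b.top + b.height and b.top < a.top + a.height)


video_display = Region(0, 0, 320, 240)
side_by_side = Region(320, 0, 320, 240)    # synthesized display next to the video
inset = Region(200, 150, 320, 240)         # synthesized display partly over the video
```

Here overlaps(video_display, side_by_side) corresponds to the non-overlapping presentation, and overlaps(video_display, inset) to the partly overlapping one.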
The programmed computer can further comprise: means for originating an information content in at least one of captured video and live video transfer; and, means for originating an information content for said synthesized environment in an acted performance and text-to-speech conversion of speech from said performance.
The programmed computer can further comprise: means for establishing a context for said speech recognition interface with said video environment; and, means for providing examples of how to speak and examples of a proper vocabulary with said synthesized environment.
The programmed computer can further comprise: means for providing predetermined instructions for using said speech recognition interface with said video environment; and, means for answering questions and supplying information in response to said received audible information with said synt
