Reexamination Certificate
1999-12-14
2004-06-08
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
active
06748361
ABSTRACT:
FIELD OF THE INVENTION
The present invention generally relates to computing apparatus and, more particularly, to methods and apparatus for providing a spoken language interface in association with such computing apparatus.
BACKGROUND OF THE INVENTION
Speech technology has progressed to the point, in recent years, that command and control functions and transcription functions may be performed reliably using speech decoders such as, for example, the IBM ViaVoice product line (ViaVoice is a trademark of IBM Corporation of Armonk, N.Y.). Technology for the encoding of text into audible speech is also widely available. Thus, it is reasonable to expect that products using these and other spoken language technologies will have been developed and brought to market. These products fall into two critical areas: enablers and tools. The personal speech assistant of the present invention is an enabler in the sense that it works in conjunction with a tool to enable access to the tool's capabilities through a spoken language interface.
Typical tools employing voice are best exemplified by portable voice recorders. These include devices such as the “Voice It” mobile digital recorder, and the Dragon Systems, Inc. “NaturallySpeaking Mobile Organizer.” The first is merely a digital recorder which can be used to take notes which can be transcribed by a speech recognition program. The transcribed notes are not returned to the device. The device, as a simple recorder, does not accept voice commands. The user is required to push buttons to control the recording functions. In the second case, the Dragon Systems Mobile Organizer does allow the user to speak commands, but these commands are acted upon as part of the transcription process, at whatever future time the user chooses to download the recordings. There is no general capability to offer voice control to any device other than the transcription software in a personal computer. The hardware and software capabilities, for example, for text-to-speech encoding are not provided because the only data type such a device need manage is digitized audio, not encoded text. Thus, so-called “mobile digital recorders” and voice input “mobile organizers” do not have the immediate connection or the ability to speak to the user, which are needed to assist a user by supporting conversational control and information supplying dialogs.
Another example of speech-enabled tools may be found in “palm top” computers. Devices such as the Casio E100 using the WinCE (a trademark of Microsoft Corporation of Redmond, Wash.) operating system allow individual applications to operate through spoken language related services. Access to these services is provided through an Application Programmer's Interface such as SMAPI, an “enabler” of speech interfaces. Here, each individual application on the computer must contain all of its dialog management data and software. The role of SMAPI is only to provide a common interface to the services of spoken language engines, not to provide the more abstract means to provide dialog or the hardware architecture to support dialog. Each application is thus a unique speech “device.” Dedicating a palm top computer to the task of an interface tool would not be cost effective.
Another example of an enabler is the Philips “Speech Mike.” This device is a dedicated interface device which provides a microphone and a speaker, but can only operate in the context of a personal computer since it carries only enough on-board intelligence to service the coding and communications needs of the built-in track ball.
SUMMARY OF THE INVENTION
In accordance with the present invention, a Personal Speech Assistant (PSA) is a computing apparatus which provides a spoken language interface to another apparatus to which it is attached. It is to be understood that the attachment may be made through physical means such as wires, radio waves or light, or by mixtures of logical and physical means such as computer networks or telephone networks. In order to provide a spoken language interface, a Personal Speech Assistant is designed to support execution of a conversational dialog manager and its supporting service engines. An example of such a dialog manager is described in detail in the concurrently filed U.S. patent application Ser. No. 09/460,961, in the name of L. Comerford et al., and entitled: “A Scalable Low Resource Dialog Manager,” the disclosure of which is incorporated herein by reference. A preferred implementation is described in the detailed description below.
In operation, a PSA is connected to a device which provides some service to a user. Any “appliance” is a candidate for enhancement with the PSA. Devices such as, for example, video cassette recorders (VCRs) or Personal Digital Assistants (PDAs), which offer rich, but frequently difficult interfaces, may be made more useful by the integration of a PSA according to the invention. A PSA need not be permanently attached to the device for which it provides an interface. In a car, for example, a PSA may take on some of the responsibilities of the car key, in the sense that the PSA may be taken away by the owner when the car is parked. In this case, the owner may command the door to open, and the PSA, through a wireless connection and protocol, may translate that instruction into one accepted by the car. Once in the car, the owner may place the PSA in a “cradle” which offers a wired connection to the car electronics so that the radio, navigation, or environmental systems may be instructed concerning the owner's wishes.
It is a preferred feature of a dialog manager used by the PSA that the user interface properties, in terms of the vocabulary the device understands, the informative prompts it provides, and other aspects of its conversational behavior, are all easily modified to correspond to the preferences or limitations of the user. If one word does not get recognized, a synonym can be made to replace it. If a prompt is not to the user's liking, it is easily changed.
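The idea that vocabulary and prompts are easily modified can be illustrated by keeping them as plain data, separate from the dialog logic. The following is only a minimal sketch under that assumption; the class name `UserInterfaceData`, its methods, and the command and prompt names are hypothetical and do not come from the patent.

```python
class UserInterfaceData:
    """Hypothetical user interface data set: vocabulary and prompts held as plain data."""

    def __init__(self):
        self.vocabulary = {}  # spoken word -> command name understood by the device
        self.prompts = {}     # prompt name -> text to be spoken to the user

    def add_word(self, word, command):
        self.vocabulary[word.lower()] = command

    def replace_word(self, old_word, new_word):
        """Swap a poorly recognized word for a synonym bound to the same command."""
        command = self.vocabulary.pop(old_word.lower())
        self.vocabulary[new_word.lower()] = command

    def set_prompt(self, name, text):
        """Add a prompt or overwrite one the user does not like."""
        self.prompts[name] = text

    def lookup(self, word):
        """Return the command for a word, or None if the word is not in the vocabulary."""
        return self.vocabulary.get(word.lower())


# Illustrative data only (assumed, not from the source).
ui = UserInterfaceData()
ui.add_word("unlock", "OPEN_DOOR")
ui.set_prompt("door_opened", "The door is open.")

# The word "unlock" is recognized poorly, so a synonym replaces it.
ui.replace_word("unlock", "open")
# The prompt is not to the user's liking, so it is changed.
ui.set_prompt("door_opened", "Door unlocked.")
```

Because the words and prompts live in data rather than in code, changing them in this sketch requires no change to whatever logic consumes them.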
In an illustrative embodiment of the present invention, apparatus for providing a portable spoken language interface for a user to a device in communication with the apparatus, wherein the device has at least one application associated therewith, comprises:
an audio input system for receiving speech data provided by the user;
an audio output system for outputting speech data to the user;
a speech decoding engine for generating a decoded output in response to spoken utterances;
a speech synthesizing engine for generating a synthesized speech output in response to text data;
a dialog manager operatively coupled to the device, the audio input system, the audio output system, the speech decoding engine and the speech synthesizing engine; and
at least one user interface data set operatively coupled to the dialog manager, the user interface data set representing spoken language interface elements and data recognizable by the application of the device;
wherein:
(i) the dialog manager enables connection between the input audio system and the speech decoding engine such that the spoken utterance provided by the user is provided from the input audio system to the speech decoding engine;
(ii) the speech decoding engine decodes the spoken utterance to generate a decoded output which is returned to the dialog manager;
(iii) the dialog manager uses the decoded output to search the user interface data set for a corresponding spoken language interface element and data which is returned to the dialog manager when found;
(iv) the dialog manager provides the spoken language interface element associated data to the application of the device for processing in accordance therewith;
(v) the application of the device, on processing that element, provides a reference to an interface element to be spoken;
(vi) the dialog manager enables connection between the audio output system and the speech synthesizing engine such that the speech synthesizing engine which, accepting data from that element, generates a synthesized output that expresses that element; and
(vii) the audio outp
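Steps (i) through (vi) of the embodiment above can be sketched as a single pass through a dialog manager. This is a rough illustration, not the patented implementation: the decoder and synthesizer are text-only stand-ins, and every name here (`DialogManager`, `fake_decoder`, `fake_synthesizer`, `car_application`, the command and prompt names) is hypothetical. The "not understood" fallback is also an assumption added for the sketch; the source does not describe it.

```python
def fake_decoder(audio):
    """Stand-in speech decoding engine: the 'audio' here is already text."""
    return audio.strip().lower()


def fake_synthesizer(text):
    """Stand-in speech synthesizing engine: tags the text instead of producing audio."""
    return "[spoken] " + text


def car_application(command):
    """Stand-in device application: processes a command and returns a
    reference to the interface element that should be spoken."""
    return {"OPEN_DOOR": "door_opened"}.get(command, "not_understood")


class DialogManager:
    """Hypothetical dialog manager coupling the engines, the data set, and the application."""

    def __init__(self, decoder, synthesizer, commands, prompts, application):
        self.decoder = decoder
        self.synthesizer = synthesizer
        self.commands = commands      # decoded utterance -> data for the application
        self.prompts = prompts        # prompt reference -> text to be spoken
        self.application = application

    def handle(self, audio):
        decoded = self.decoder(audio)                      # steps (i) and (ii)
        command = self.commands.get(decoded)               # step (iii)
        if command is None:
            prompt_ref = "not_understood"                  # assumed fallback
        else:
            prompt_ref = self.application(command)         # steps (iv) and (v)
        return self.synthesizer(self.prompts[prompt_ref])  # step (vi)


manager = DialogManager(
    fake_decoder,
    fake_synthesizer,
    commands={"open the door": "OPEN_DOOR"},
    prompts={
        "door_opened": "The door is open.",
        "not_understood": "Please repeat that.",
    },
    application=car_application,
)

reply = manager.handle("Open the door ")
print(reply)
```

In this sketch the application never sees audio or text to be spoken; it receives only the data associated with the matched element and returns a reference, which mirrors the separation of roles the embodiment describes.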
Comerford Liam David
Frank David Carl
Nahamoo David
Dang Thu Ann
Dorvil Richemond
International Business Machines Corporation
Ryan, Mason & Lewis, LLP
Storm Donald L.