Reexamination Certificate
2000-08-28
2003-10-07
McFadden, Susan (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S275000, C704S260000, C704S270100
active
06631350
ABSTRACT:
CROSS REFERENCE TO RELATED APPLICATIONS
(Not Applicable)
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
(Not Applicable)
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to the field of speech enabled computing and more particularly to a device-independent system, method and apparatus for linking a speech driven application to specific audio input and output devices.
2. Description of the Related Art
Speech driven applications differ from traditional GUI based applications in that speech driven applications handle audio data for both input and output. Typically, GUI based applications rely on an input device, such as a mouse or keyboard, for input and on a visual display, such as a monitor, for output. In contrast, speech driven applications rely on an audio input device, such as a microphone, for input and on an audio output device, such as speakers, for output. Typically, audio input data received from the audio input device can be provided via audio circuitry to a speech recognition engine for conversion to computer recognizable text. Similarly, computer recognizable text originating in the speech driven application can be provided to a text-to-speech engine for conversion to audio output data to be provided via audio circuitry to the audio output device.
Presently, speech driven applications require audio input data received from an audio input device to be in a media format suitable for use with a corresponding speech recognition engine. Likewise, speech driven applications require audio output data generated by a text-to-speech engine and provided to an audio output device to be in a media format specific to the audio output device. Yet, audio input and output devices can vary from transducer-type devices such as microphones and speakers to specialized audio circuitry and systems to distributed audio input and output devices remotely positioned across a network. Hence, speech driven application developers have been compelled to handle the receipt and transmission of audio data from and to varying audio input and output sources and corresponding media transport protocols on a case-by-case basis. As a result, substantial complexity necessarily is added to the speech driven application.
There have been several attempts to transport audio data to and from speech driven applications in a manner which frees the speech application developer from varying audio data transmission and receipt methods according to specific audio data input and output sources. Some examples include the multimedia API layer of the Microsoft Windows® operating system and the multimedia presentation manager of the IBM OS/2® operating system. However, both examples require highly complex interactions on behalf of the speech application developer and neither permits a simple audio data stream-in/stream-out approach to the transmission and receipt of audio data from varying data sources. In addition, both examples are compiled solutions which are platform specific to a particular hardware configuration and a specific operating system.
The Java™ Media Framework (JMF™) represents one attempt to transport audio data to and from a speech driven application in a hardware and operating system neutral manner. JMF is fully documented in the Java Media Framework API Guide (JMF API Guide) published by Sun Microsystems, Inc. of Mountain View, Calif. on Nov. 19, 1999 (incorporated herein by reference) and the Java Media Framework Specification (JMF Specification) also published by Sun Microsystems, Inc. on Nov. 23, 1999 (incorporated herein by reference). As will be apparent from both the JMF API Guide and the JMF Specification, JMF, unlike previous operating system dependent solutions, is a Java-based platform independent solution. Nevertheless, the use of JMF to provide audio data to and from a speech driven application remains a daunting task. In particular, JMF requires the speech driven application developer to specify several device-dependent parameters, for example media transport protocol, and media transport specific parameters, for example frame size and packet delay. Hence, a speech application developer using JMF must maintain an awareness of the device characteristics for the audio input and output sources.
For example, audio data transmitted in a European telephony network typically is A-law encoded. In contrast, audio data transmitted over a U.S. telephony network typically is μ-law encoded. As a result, in order for a JMF-based speech driven application to handle audio data transmitted over a European telephony network, proper settings consonant with the A-law encoding of audio data must be known by the speech driven application developer and specifically applied to the speech driven application in addition to other settings such as transport protocol and packet delay. Thus, what is needed is a device-independent system, method and apparatus for linking a speech driven application to specific audio input and output devices.
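The A-law versus μ-law distinction described above can be illustrated with a short sketch. It does not use JMF. It swaps in the standard Java Sound class javax.sound.sampled.AudioFormat purely to show the kind of device-dependent setting a developer would otherwise have to know. The TelephonyFormats class, the "EU" and "US" region labels, and the 8 kHz, 8-bit, mono values are assumptions made for illustration and are not taken from the patent.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper (not part of JMF or the patent): maps a telephony
// region label to the companded audio format typically used there.
final class TelephonyFormats {
    private TelephonyFormats() {}

    static AudioFormat forRegion(String region) {
        AudioFormat.Encoding encoding = "EU".equals(region)
                ? AudioFormat.Encoding.ALAW   // European networks: A-law
                : AudioFormat.Encoding.ULAW;  // U.S. networks: mu-law
        // Assumed narrowband telephony values: 8 kHz, 8-bit samples, mono,
        // one byte per frame, 8000 frames per second.
        return new AudioFormat(encoding, 8000f, 8, 1, 1, 8000f, false);
    }

    public static void main(String[] args) {
        System.out.println(forRegion("EU").getEncoding());
        System.out.println(forRegion("US").getEncoding());
    }
}
```

Even in this reduced form, the caller must know which region the audio comes from before any audio can be handled, which is the burden the background section describes.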
SUMMARY OF THE INVENTION
The present invention is an audio abstractor that provides a device independent approach to enable a speech driven application to receive and transmit digitized speech audio to and from audio input and output devices. In particular, the audio abstractor can provide a device-independent interface to speech driven applications through which the speech driven applications can access digitized speech audio from specific audio input and output devices without having to specify device-specific parameters necessary to interact with those specific audio input and output devices. Rather, the audio abstractor can be configured to interact with specific audio input and output devices, for example through a media framework, thereby off-loading from the speech driven application the complexity of audio device configuration.
A device-independent speech audio system for transparently linking a speech driven application to specific audio input and output devices can include a media framework for transporting digitized speech audio between speech driven applications and a plurality of audio input and output devices. The media framework can include selectable device-dependent parameters which can enable the transportation of the digitized speech to and from the plurality of audio input and output devices. The device-independent speech audio system also can include an audio abstractor configurable to provide specific ones of the selectable device-dependent parameters according to the specific audio input and output devices. Hence, the audio abstractor can provide a device-independent interface to the speech driven application for linking the speech driven application to the specific audio input and output devices.
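The arrangement summarized above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patented implementation and not the JMF API: the names DeviceProfile, MediaFrameworkStub, and AudioAbstractorSketch are hypothetical, and the media framework is replaced by a stand-in that returns an in-memory stream. The point shown is that the device-dependent parameters are supplied once, when the abstractor is configured, and the speech driven application sees only a plain stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical bundle of the device-dependent parameters the abstractor hides.
final class DeviceProfile {
    final String transportProtocol;
    final String encoding;
    final int frameSizeBytes;
    final int packetDelayMs;

    DeviceProfile(String transportProtocol, String encoding,
                  int frameSizeBytes, int packetDelayMs) {
        this.transportProtocol = transportProtocol;
        this.encoding = encoding;
        this.frameSizeBytes = frameSizeBytes;
        this.packetDelayMs = packetDelayMs;
    }
}

// Hypothetical stand-in for a media framework; a real framework would open
// the device or network source described by the profile.
interface MediaFrameworkStub {
    InputStream openInput(DeviceProfile profile);
}

// Sketch of an audio abstractor: configured once with a profile, it exposes
// a device-independent stream to the speech driven application.
final class AudioAbstractorSketch {
    private final MediaFrameworkStub framework;
    private final DeviceProfile profile;

    AudioAbstractorSketch(MediaFrameworkStub framework, DeviceProfile profile) {
        this.framework = framework;
        this.profile = profile;
    }

    // The caller passes no device-specific parameters here.
    InputStream speechInput() {
        return framework.openInput(profile);
    }

    public static void main(String[] args) throws Exception {
        // Illustrative values only.
        DeviceProfile profile = new DeviceProfile("RTP", "ALAW", 160, 20);
        // Stub framework: returns one silent frame of the configured size.
        MediaFrameworkStub stub = p -> new ByteArrayInputStream(new byte[p.frameSizeBytes]);
        AudioAbstractorSketch abstractor = new AudioAbstractorSketch(stub, profile);
        System.out.println(abstractor.speechInput().available());
    }
}
```

Swapping the profile, for example from an A-law network source to a local microphone, would leave the application code that calls speechInput() unchanged in this sketch.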
In a representative embodiment of the present invention, the device-independent speech audio system can be used in conjunction with a speech recognition system. Accordingly, the device-independent speech audio system can further include a speech recognition engine communicatively linked to the device-independent interface of the audio abstractor. In consequence, the speech recognition engine can receive the digitized speech audio from a specific audio input device via the audio abstractor without specifying the specific ones of the device-dependent parameters. Also, the speech recognition engine can convert the received digitized speech audio to computer readable text. Finally, the speech recognition engine can provide the converted computer readable text to the speech driven application.
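The data flow in this embodiment (digitized speech audio in, computer readable text out to the application) can be sketched as below. All names are hypothetical: RecognitionEngine, ByteCountingEngine, and RecognitionPipeline are invented for this illustration, and the placeholder engine only counts bytes rather than recognizing speech. The input stream stands for whatever the device-independent interface of the audio abstractor would hand out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

// Hypothetical engine interface: consumes digitized speech, yields text.
interface RecognitionEngine {
    String recognize(InputStream digitizedSpeech) throws IOException;
}

// Placeholder engine that only counts bytes; a real engine would decode speech.
final class ByteCountingEngine implements RecognitionEngine {
    @Override
    public String recognize(InputStream digitizedSpeech) throws IOException {
        int total = 0;
        byte[] buffer = new byte[256];
        int read;
        while ((read = digitizedSpeech.read(buffer)) != -1) {
            total += read;
        }
        return "[" + total + " bytes of speech audio]";
    }
}

final class RecognitionPipeline {
    private RecognitionPipeline() {}

    // deviceIndependentInput stands for the stream obtained from the audio
    // abstractor; note that no device-dependent parameter appears here.
    static void run(InputStream deviceIndependentInput,
                    RecognitionEngine engine,
                    Consumer<String> application) throws IOException {
        application.accept(engine.recognize(deviceIndependentInput));
    }

    public static void main(String[] args) throws IOException {
        run(new ByteArrayInputStream(new byte[320]),
            new ByteCountingEngine(),
            System.out::println);
    }
}
```

In this sketch the application is just a Consumer<String>, so the recognized text could equally be routed to a dialog manager or a command handler.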
In another representative embodiment of the present invention, the device-independent speech audio system can be used in conjunction with a text-to-speech (TTS) engine. Accordingly, the device-independent speech audio system can further include a text-to-speech (TTS) engine communicatively linked to the device-independent interface of the audio abstractor. The TTS engine can convert computer readable text received from the speech driven application into the digitized speech audio. In consequence, the TTS engine can transmit the digitized spee
Celi, Jr. Joseph
Gavagni Brett
Leontiades Leo
Lucas Bruce D.