Reexamination Certificate
1999-05-21
2003-06-24
Banks-Harold, Marsha D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S275000
active
06584439
ABSTRACT:
MICROFICHE APPENDIX
This application contains a microfiche appendix consisting of 1 sheet and 72 frames, entitled “ISD-SR 300, Embedded Speech Recognition Processor” by Information Storage Devices, Inc. The appendix is not printed herewith. It is hereby incorporated by reference, verbatim and with the same effect as though it were fully and completely set forth herein.
FIELD OF THE INVENTION
This invention relates generally to machine interfaces. More particularly, the invention relates to voice user interfaces for devices.
BACKGROUND OF THE INVENTION
Graphical user interfaces (GUIs) for computers are well known. GUIs provide an intuitive and consistent manner for human interaction with computers. Generally, once a person learns how to use a particular GUI, they can operate any computer or device that uses the same or a similar GUI. Examples of popular GUIs are MAC OS by Apple and MS Windows by Microsoft. GUIs are now being ported to other devices. For example, the MS Windows GUI has been ported from computers to palmtops, personal organizers, and other devices, so that a number of differing devices share a common GUI. However, as the name implies, GUIs require at least some sort of visual or graphical display. They also require an input device such as a keyboard, mouse, touch pad or touch screen. The displays and input devices take up space in a device, require additional components and increase the cost of a device. Thus, it is desirable to eliminate the display and input devices in order to save costs.
Recently, voice user interfaces (VUIs) have been introduced that use speech recognition methods to control a device. However, these prior art VUIs have a number of shortcomings that prevent them from being universally used in all devices:

- Prior art VUIs are usually difficult to use.
- They usually require a display device such as an LCD, a manual input device such as keypads or buttons, or both.
- They are usually proprietary and restricted to a single make or model of hardware device, or to a single type of software application. Unlike computer operating systems, they usually are not widely available. Accordingly, software programmers cannot write applications that operate with the VUI across a variety of device types.
- Their commands are usually customized for that single type of device or software application.
- They usually have additional limitations in supporting multiple users, such as how to handle personalization and security.
- They require that a user know of the existence of the device in advance, and they have not provided ways of determining the presence of devices.
- They usually require a user to read instruction manuals or on-screen commands to become trained in their use, and usually do not include audible methods for a user to learn commands.

Furthermore, due to a lack of standardization, a user may be required to learn multiple prior art VUIs when using multiple voice controlled devices.
Generally, devices controlled by VUIs continue to require some sort of manual control of functions. With some manual control required, a manual input device such as a button, keypad or a set of buttons or keypads is provided. To assure proper manual entry, a display device such as an LCD, LED, or other graphics display device may be provided. For example, many voice activated telephones require that telephone numbers be stored manually. In this case a numeric keypad is usually provided for manual entry. An LCD is usually included to assure proper manual entry and to display the status of the device. A speech synthesis or voice feedback system may be absent from these devices. The addition of buttons and display devices increases the manufacturing cost of devices. It is desirable to be able to eliminate all manual input and display from devices in order to decrease costs. Furthermore, it is more convenient to remotely control devices without requiring specific buttons or displays.
Previously, voice controlled devices were used by few people. Additionally, they used near field microphones to listen locally for voices. Many prior devices were fixed in some manner, were not readily portable, or were server-based systems. It is desirable to:

- provide voice control capability for portable devices;
- provide either near field or far field microphone technology in voice controlled devices; and
- provide low cost voice control capability so that it is included in more devices.

However, these goals raise a problem when multiple users of multiple voice controlled devices are in the same area. With multiple users and multiple voice controlled devices within audible range of each other, it is difficult for a voice controlled device to discern which user to accept commands from and respond to. For example, consider voice controlled cell phones where one user in an environment of multiple users wants to call home. The user issues a voice-activated “call home” command. If more than one voice controlled cell phone hears the “call home” command, multiple phones may respond and start dialing a home telephone number. Previously, this was not as significant a problem because there were few voice controlled devices.
Some voice controlled devices are speaker dependent. Speaker dependency means that a voice controlled device requires training by a specific user before it can be used with that user. A speaker dependent voice controlled device listens for tonal qualities in how phrases are spoken. Speaker dependent voice controlled devices do not lend themselves to applications where multiple users or speakers must use the device, because they fail to efficiently recognize speech from users who have not trained them. It is desirable to provide speaker independent voice controlled devices with a VUI that requires little or no training in order to recognize speech from any user.
To achieve high-accuracy speech recognition, a voice controlled device must avoid responding to speech that is not directed to it. That is, it should not respond to background conversation, to noises, or to commands meant for other voice controlled devices. However, filtering out background sounds must not be so aggressive that it also prevents recognition of speech directed to the device. Finding the right balance between rejecting background sounds and recognizing speech directed to a voice controlled device is particularly challenging in speaker-independent systems. A speaker-independent device must be able to respond to a wide range of voices. Therefore, it cannot use a highly restrictive filter for background sounds. In contrast, a speaker-dependent system need only listen for a particular person's voice, and thus can employ a more stringent filter. Despite this advantage of speaker-dependent systems, filtering out background sounds is still a significant challenge.
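The trade-off between rejecting background sounds and recognizing directed speech can be made concrete with two error rates:

- A false acceptance is background speech or noise treated as a command.
- A false rejection is speech directed at the device that is ignored.

Both rates can be measured at a given rejection threshold. The Python sketch below is illustrative only. The `Utterance` type, the 0-to-1 confidence scale, and the single-threshold rejection rule are assumptions. They are not the method described in this patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Utterance:
    score: float    # hypothetical recognizer confidence, assumed to lie in [0, 1]
    directed: bool  # True if the speech was actually addressed to this device


def accept(u: Utterance, threshold: float) -> bool:
    """Assumed rejection rule: treat an utterance as a command only if its confidence meets the threshold."""
    return u.score >= threshold


def error_rates(utts: Sequence[Utterance], threshold: float) -> tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at the given threshold.

    false accept: background speech or noise accepted as a command
    false reject: speech directed at the device that is ignored
    """
    background = [u for u in utts if not u.directed]
    directed = [u for u in utts if u.directed]
    far = sum(accept(u, threshold) for u in background) / len(background) if background else 0.0
    frr = sum(not accept(u, threshold) for u in directed) / len(directed) if directed else 0.0
    return far, frr
```

Raising the threshold lowers the false acceptance rate but raises the false rejection rate, and lowering it does the reverse. A speaker-independent system must accept a wider spread of voices, so in this model it is pushed toward a lower threshold and therefore admits more background speech.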
In some prior art systems, background conversation has been filtered out by having a user physically press a button in order to activate speech recognition. The disadvantage of this approach is that it requires the user to interact with the voice controlled device physically, rather than strictly by voice or speech. One of the potential advantages of voice controlled devices is that they offer the promise of true hands-free operation. Elimination of the need to press a button to activate speech recognition would go a long way to making this hands-free objective achievable.
Additionally, in locations with a number of people talking, a voice controlled device should disregard all speech unless it is directed to it. For example, if a person says to another person “I'll call John”, the cellphone in his pocket should not
Barel Avraham
Bootsma Karin Lissette
Brown Amos
Gaddy Lawrence Kent
Geilhufe Michael
Blakely, Sokoloff, Taylor & Zafman LLP
Lerner Martin
Winbond Electronics Corporation
Method and apparatus for controlling voice controlled devices