Video control of speech recognition

Reexamination Certificate

Patent number: 06243683

Classifications: C704S275000, C704S274000, C704S246000, C704S270000, C434S004000

Status: active

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
The present invention relates to the field of computer technology. More particularly, the present invention relates to the use of computer technology for speech recognition.
BACKGROUND OF THE INVENTION
Speech recognition has the potential to provide a significant leap in the application of computing technology. One of the barriers to the adoption of speech recognition is its inability to distinguish spoken commands intended for the computer from the otherwise irrelevant speech common throughout the day, such as passing conversations, muttering, and background conversation. As a result, most speech recognition systems require the user to repeatedly indicate to the computer when to start or stop listening, so that the system does not interpret speech intended for other listeners.
Humans, however, are quite adept at determining what speech is directed at them, and use a number of techniques to guide them in this, such as:
1. Specific keywords (such as our names);
2. Body contact (such as a tap on the shoulder);
3. Proximity of the noise (relative volume); and
4. Visual clues (such as establishing eye contact, or pointing while one is moving their mouth).
In order to provide speech recognition systems with a human-like level of functionality, speech user interfaces have thus far focused on the first two techniques mentioned above. For instance, analogous to item 1 above, many speech recognition engines or units provide the ability to specify an “attention phrase” to wake up the computer and a “sleep” phrase to force an end to speech recognition. Most interface paradigms also provide a “toggle to talk” button, similar to a tap on the shoulder. These approaches alone, however, have limitations. Attention words are often missed, so turning speech recognition on or off can take considerable time. Toggle to talk buttons require the user to be nearby, which undermines speech's inherent advantage of operating without physical contact with the speech recognition system.
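The attention/sleep phrase and toggle-to-talk paradigms described above amount to a small gate in front of the command handler. The sketch below is a hypothetical illustration, not an engine's actual API: the class name `AttentionGate` and the phrases "computer wake up" and "computer sleep" are assumptions. It also shows the limitation noted above: a command spoken before the attention phrase is recognized is silently dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionGate:
    """Decides which recognized utterances are treated as commands.

    Hypothetical sketch: the phrases below are illustrative placeholders,
    not values taken from the source.
    """
    attention_phrase: str = "computer wake up"
    sleep_phrase: str = "computer sleep"
    listening: bool = False

    def toggle(self) -> None:
        """Model a press of a "toggle to talk" button."""
        self.listening = not self.listening

    def process(self, utterance: str) -> Optional[str]:
        """Return the normalized utterance as a command if the gate is open, else None."""
        text = utterance.strip().lower()
        if text == self.attention_phrase:
            self.listening = True   # wake up; the phrase itself is not a command
            return None
        if text == self.sleep_phrase:
            self.listening = False  # force an end to recognition
            return None
        return text if self.listening else None


gate = AttentionGate()
heard = ["open mail", "computer wake up", "open mail", "computer sleep", "close mail"]
commands = [gate.process(u) for u in heard]
print(commands)  # [None, None, 'open mail', None, None]
```

The first "open mail" is lost because the gate was still closed. If the recognizer misses the attention phrase, every later command is lost the same way until the user repeats it or presses the toggle.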
Another problem with speech recognition systems is their inability to home in on a specific audio source location. Recent microphone array research has, however, yielded the ability to home in on a specific audio source location, thus providing the ability to filter extraneous, irrelevant sounds from the input audio stream. For example, using two microphones, one on each side of a speech recognition system (such as on the left and right sides of the monitor of a PC-based system), background noise can be suppressed by using the microphone array to focus acoustically on the words emanating from the user's mouth. The speech recognition algorithm can thus obtain a much cleaner audio source, increasing both its accuracy and its robustness in harsh (i.e., real-world) audio environments. A problem with microphone arrays, however, is that the user rarely sits still, making it difficult to determine the source point to home in on. This is especially so when speech recognition is performed in non-traditional PC uses (such as in a living room to control a television). Worse yet, if the speech recognition is performed via a handheld pad, the microphone itself is also moving.
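The two-microphone arrangement above can be illustrated with a delay-and-sum beamformer, one standard way to steer an array. The source does not say which array technique is used, so the following is a sketch under stated assumptions: a far-field source, microphones on a horizontal line, the speed of sound taken as 343 m/s, and all function names hypothetical. Each channel is time-aligned for the chosen direction and the channels are averaged, so sound from that direction adds coherently while sound from elsewhere partly cancels.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature (assumption)


def fractional_delay(x: np.ndarray, delay_s: float, fs: float) -> np.ndarray:
    """Delay x by delay_s seconds (may be fractional or negative) via an FFT phase shift.

    The shift is circular, which is acceptable for a short, periodic test frame.
    """
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spectrum, n=n)


def steering_delays(mic_x: np.ndarray, theta: float) -> np.ndarray:
    """Per-microphone delays (s) that align a far-field wave arriving from angle theta.

    mic_x holds microphone positions (m) along the array axis; theta is measured from
    broadside (rad), positive toward +x. A microphone at +x hears the wavefront
    x*sin(theta)/c earlier than the array centre, so it is delayed by that amount.
    """
    return mic_x * np.sin(theta) / SPEED_OF_SOUND


def delay_and_sum(signals: np.ndarray, mic_x: np.ndarray, theta: float, fs: float) -> np.ndarray:
    """Steer the array toward theta: align every channel, then average them."""
    delays = steering_delays(mic_x, theta)
    aligned = [fractional_delay(sig, d, fs) for sig, d in zip(signals, delays)]
    return np.mean(aligned, axis=0)


def simulate_plane_wave(s: np.ndarray, mic_x: np.ndarray, phi: float, fs: float) -> np.ndarray:
    """Simulate what each microphone records from a far-field source at angle phi."""
    # Each microphone's arrival offset is the negative of its steering delay.
    return np.stack([fractional_delay(s, -d, fs) for d in steering_delays(mic_x, phi)])


# Illustrative scene: mics 20 cm apart, user at +30 deg, a noise source at -60 deg.
fs = 16000.0
t = np.arange(1600) / fs                      # 0.1 s frame
mic_x = np.array([-0.1, 0.1])
user_angle = np.deg2rad(30.0)
speech = np.sin(2 * np.pi * 500 * t)          # stand-in for the user's voice
noise = np.sin(2 * np.pi * 630 * t)           # stand-in for background noise
clean_at_mics = simulate_plane_wave(speech, mic_x, user_angle, fs)
recorded = clean_at_mics + simulate_plane_wave(noise, mic_x, np.deg2rad(-60.0), fs)

output = delay_and_sum(recorded, mic_x, user_angle, fs)
noise_at_one_mic = recorded[0] - clean_at_mics[0]
print("residual noise RMS, one mic:   ", np.sqrt(np.mean(noise_at_one_mic ** 2)))
print("residual noise RMS, beamformed:", np.sqrt(np.mean((output - speech) ** 2)))
```

The noise frequency and angle were chosen so that this geometry nearly cancels the noise. With only two microphones, attenuation depends strongly on frequency and direction, and other noise components would be reduced less. The example also shows why the text stresses knowing the source point: the steering angle `user_angle` must match where the user actually is.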
As described below, the present invention provides a variety of embodiments that address the limitations of speech recognition systems noted above.
SUMMARY OF THE INVENTION
In one embodiment, the present invention provides a method and apparatus for controlling the operation of a speech recognition unit using a video image to detect gestures made by a user. In another embodiment, the invention provides a method and apparatus for filtering an audio input signal in a speech recognition system using a microphone array to isolate the source of the user's voice, where the location of the user is determined using a video image. In another embodiment, the above described embodiments are combined.
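The two embodiments above can be sketched together as a per-frame controller driven by video. The source does not specify the gestures, the tracker, or how image position maps to a location, so everything here is an assumption: the tracker output `FrameObservation`, the gesture names, a pinhole camera mounted at the centre of the microphone array with its optical axis along the array's broadside, and image x increasing toward the array's +x end (flip the sign otherwise). The computed angle is the direction a steerable array, such as a delay-and-sum beamformer, would be pointed at.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameObservation:
    """Hypothetical per-frame output of a face and gesture tracker (not defined in the source)."""
    face_x: Optional[float]  # horizontal pixel position of the user's face, None if not found
    gesture: Optional[str]   # e.g. "facing_camera", or None if no gesture was recognized


def pixel_to_azimuth(u: float, image_width: int, horizontal_fov_deg: float) -> float:
    """Convert a horizontal pixel coordinate to an azimuth (rad) using a pinhole camera model."""
    cx = image_width / 2.0
    focal_px = cx / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.atan((u - cx) / focal_px)


class VideoControlledRecognizer:
    """Opens recognition only while a start gesture is visible and tracks where to steer."""

    START_GESTURES = {"facing_camera", "raised_hand"}  # illustrative choices

    def __init__(self, image_width: int, horizontal_fov_deg: float) -> None:
        self.image_width = image_width
        self.horizontal_fov_deg = horizontal_fov_deg
        self.listening = False
        self.steer_angle: Optional[float] = None  # last known user direction (rad)

    def on_frame(self, obs: FrameObservation) -> None:
        # First embodiment: a gesture decides whether speech is meant for the computer.
        self.listening = obs.gesture in self.START_GESTURES
        # Second embodiment: the face position decides where the array should listen.
        if obs.face_x is not None:
            self.steer_angle = pixel_to_azimuth(obs.face_x, self.image_width,
                                                self.horizontal_fov_deg)

    def should_recognize(self) -> bool:
        """Pass audio to the recognizer only when gated open and a direction is known."""
        return self.listening and self.steer_angle is not None


rec = VideoControlledRecognizer(image_width=640, horizontal_fov_deg=60.0)
rec.on_frame(FrameObservation(face_x=480.0, gesture="facing_camera"))
print(rec.should_recognize(), math.degrees(rec.steer_angle))
rec.on_frame(FrameObservation(face_x=None, gesture=None))
print(rec.should_recognize())
```

Keeping the last known `steer_angle` when the face is briefly lost is a design choice in this sketch. A real system might instead decay it or fall back to an unsteered array.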


REFERENCES:
patent: 4961177 (1990-10-01), Uehara
patent: 5729694 (1998-03-01), Holzrichter et al.
patent: 5890116 (1999-03-01), Itoh et al.
patent: 6023675 (2000-02-01), Bennett et al.
D. Sturim, M. Brandstein, H. Silverman, “Tracking Multiple Talkers Using Microphone-Array Measurements,” IEEE International Conference on Acoustics, Speech & Signal Processing, Apr. 1997.*
Wang et al., “A Hybrid Real-Time Face Tracking System,” IEEE International Conference on Acoustics, Speech & Signal Processing, May 1998.*
Wang & Chu, “Voice Source Localization for Automatic Camera Pointing System in Videoconferencing,” IEEE International Conference on Acoustics, Speech & Signal Processing, May 1997.
