Cancellation of loudspeaker words in speech recognition

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S236000

active

06725193

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates to a method and apparatus for voice telecommunication, and more particularly to a method and apparatus in which incoming voice signals output by a speaker may be canceled from an outgoing voice signal to be used for speech recognition.
In a conventional communication system such as a land-based telephone system, speech spoken into a remote telephone is picked up by a microphone in the telephone and converted into an incoming audio analog signal (relative to the receiving telephone). The incoming audio signal is sent down an incoming line and eventually to an amplifier connected to a speaker in the receiving telephone. The amplifier amplifies the signal and the speaker converts the amplified signal into sound waves that are heard by a person at the receiving telephone. The person can respond by speaking into a microphone in the receiving telephone. The microphone is operably connected to an outgoing line and converts the words of the telephone user into an outgoing audio signal sent down an outgoing line and ultimately onward, generally to a speaker in the remote telephone.
A land-based communication system with speech recognition typically has a far end and a near end, with a remote microphone/speaker unit located at the far end and a local microphone/speaker unit located at the near end. A landline connects the remote and local microphone/speaker units. The landline has an incoming line (relative to the local microphone/speaker unit) that connects the remote microphone with the local speaker, and an outgoing line (relative to the local microphone/speaker unit) that connects the local microphone with the remote speaker. A speech recognition unit is usually operably attached to the outgoing line carrying the outgoing audio signal from the local microphone at the near end to the remote speaker at the far end. Words spoken by a person at the near end, in response to the output from the local speaker, are received by the local microphone and converted into an outgoing audio analog signal that travels along the outgoing line from the local microphone to the remote speaker at the far end. The speech recognition unit converts the outgoing audio analog signal into words. Problems can arise when the local microphone picks up words other than words spoken by the near end person. For example, speech from the local speaker might be picked up by the local microphone along with speech from the near end person, and produce a mixed outgoing audio analog signal containing speech from the near end speaker and the near end person. A speech recognition unit “listening” to the outgoing microphone signal may not differentiate between the two. For example, where a remote system generates an audio command such as “Please type seven to delete message” and that command is output on the near end speaker, the words “seven” and “delete” may well be picked up by the microphone and carried in the outgoing signal which, when received and processed by the speech recognition unit, could cause a message to be deleted even though the near end user did nothing.
An echo suppressor has been operably attached to the incoming and outgoing lines of the communication system to improve the operation of the speech recognition unit. The echo canceller is used to suppress words picked up by the microphone from the loudspeaker. Voice recognition units should only receive words spoken by the near end user and picked up by the microphone, but suppression of the loudspeaker words by means of the echo canceller can leave a residual echo in the outgoing line along with the genuine outgoing signal (i.e. words spoken by the user) and result in a mixed outgoing signal. The speech recognition unit might fail to differentiate between the genuine words spoken by the user and unwanted output from the speaker. In this type of scenario, the speech recognition unit would incorrectly attribute words from the speaker to the near end user.
Alternatively, communication systems have been configured to disable the microphone when the loudspeaker is producing output. However, this solution does not allow for a user interrupting or “cutting through” a voice prompt outputted from the speaker. For example, the microphone would not clearly pick up a user's response when the user interrupts a voice prompt such as, “Speak your login ID.” The user would have to always remember to wait for each verbal prompt to complete before responding.
SUMMARY OF THE INVENTION
In one aspect of the present invention, a voice recognition system is provided for use with a communication system having an incoming line and an outgoing line, the incoming line carrying an incoming signal from a first end to a second end operably attached to an audio output responsive to the incoming signal and the outgoing line carrying an outgoing signal from a second end to a first end, the outgoing line second end being attached to a microphone near the audio output. The voice recognition system includes a first speech recognition unit for detecting an incoming word in the incoming signal, a second speech recognition unit for detecting an outgoing word in the outgoing signal, and a comparator/signal generator operably connected to the first and the second speech recognition units. The comparator/signal generator compares the outgoing word with the incoming word and outputs the outgoing word when the outgoing word does not match the incoming word.
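The comparison step described above can be illustrated with a minimal sketch. It assumes each speech recognition unit has already reduced its signal to a single detected word (or nothing); the function name `compare_and_output` and the case-insensitive match are illustrative assumptions, not part of the patent text.

```python
from typing import Optional


def compare_and_output(outgoing_word: Optional[str],
                       incoming_word: Optional[str]) -> Optional[str]:
    """Hypothetical comparator/signal generator.

    Returns the outgoing word only when it does not match the incoming
    word; a match is treated as loudspeaker output and suppressed.
    """
    if outgoing_word is None:
        # Nothing was detected on the outgoing line.
        return None
    if incoming_word is not None and outgoing_word.lower() == incoming_word.lower():
        # Same word on both lines: assume the microphone picked up the speaker.
        return None
    return outgoing_word
```

With the prompt example from the background, a "delete" heard on both lines would be dropped, while a "delete" heard only on the outgoing line would be passed on as a user command.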
In other aspects of the invention, the first speech recognition unit may be delayed relative to the second speech recognition unit so as to search for a word in the incoming signal corresponding to the outgoing word detected by the second speech recognition unit during the delay. Further, the speech recognition units may search only for selected words, or may ignore words which are first detected by the other speech recognition unit. The speech recognition units may use templates to search only for selected words, and those templates may be trained by the voice prompt system and/or by the user, either as speaker independent or speaker dependent.
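A rough sketch of the delayed search and the selected-word restriction follows. It assumes each recognition unit emits timestamped detections and that an outgoing word counts as loudspeaker output when the same word appears in the incoming signal within a delay window. The `Detection` type, the `delay_s` value, and the symmetric window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Detection:
    word: str
    time_s: float  # time at which the word was detected, in seconds


def filter_outgoing(outgoing: List[Detection],
                    incoming: List[Detection],
                    delay_s: float = 0.5,
                    selected_words: Optional[Set[str]] = None) -> List[Detection]:
    """Keep outgoing detections that have no matching incoming word
    within the delay window (window size is an assumed parameter)."""
    kept = []
    for out in outgoing:
        # Optionally search only for selected (template) words.
        if selected_words is not None and out.word not in selected_words:
            continue
        echoed = any(
            inc.word == out.word and abs(out.time_s - inc.time_s) <= delay_s
            for inc in incoming
        )
        if not echoed:
            kept.append(out)
    return kept
```

In this sketch a user who says "seven" well after the prompt has finished is still recognized, because the earlier incoming "seven" falls outside the window.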
In still another aspect of the invention, a signaler may provide a signal indicating inclusion of one of the command words in the known incoming signal, with a speech recognition unit responsive to that signal to ignore the included command word in the template for a selected period of time, and a signal generator operably connected to the speech recognition unit generates commands responsive to detection of one of the selected command words by the speech recognition unit.
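The signaler arrangement can be sketched as a small gate that masks a command word for a fixed period after the prompt system announces it. The class name `CommandGate`, its methods, and the two-second default are assumptions made for illustration; the patent text does not specify them.

```python
from typing import Dict, Iterable, Optional


class CommandGate:
    """Hypothetical recognition-side gate driven by a signaler."""

    def __init__(self, command_words: Iterable[str], ignore_s: float = 2.0):
        self.command_words = set(command_words)
        self.ignore_s = ignore_s  # assumed length of the ignore period
        self._ignore_until: Dict[str, float] = {}

    def signal_prompt_word(self, word: str, now_s: float) -> None:
        # The signaler reports that the known incoming signal contains `word`.
        if word in self.command_words:
            self._ignore_until[word] = now_s + self.ignore_s

    def accept(self, word: str, now_s: float) -> Optional[str]:
        # Return the command to generate, or None if the word is not a
        # command word or is currently being ignored.
        if word not in self.command_words:
            return None
        if now_s < self._ignore_until.get(word, float("-inf")):
            return None
        return word
```

Only the signaled word is masked, so the user can still interrupt a prompt with any other command word during the ignore period.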


REFERENCES:
patent: 5475791 (1995-12-01), Schalk et al.
patent: 5548681 (1996-08-01), Gleaves et al.
patent: 5758318 (1998-05-01), Kojima et al.
patent: 5864804 (1999-01-01), Kalveram
patent: 5937379 (1999-08-01), Takagi
patent: 5978763 (1999-11-01), Bridges
patent: 6275797 (2001-08-01), Randic
patent: 6606595 (2003-08-01), Chengalvarayan et al.
patent: 6651043 (2003-11-01), Ammicht et al.
patent: 6665645 (2003-12-01), Ibaraki et al.
patent: WO 95 05655 (1995-02-01), None
