Detection of the speech activity of a source

Telephonic communications – Substation or terminal circuitry – For loudspeaking terminal

Reexamination Certificate

Details

C379S388040, C342S423000

Reexamination Certificate

active

06707910

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a method and a device for detecting the source of a voice comprising microphone means for receiving a voice signal and detection means for the detection of the voice in the received voice signal.
BACKGROUND OF THE INVENTION
A telephone conversation is often disturbed by echo. This concerns in particular full-duplex telephones, which have four different speech states: idle, near-end speech, far-end speech and double-talk. The echo usually occurs when speech is coming from the far end: the received far-end signal is reproduced by a loudspeaker and returned to the far end through a microphone. The echo problem occurs in particular in hands-free solutions, in which a loudspeaker reproduces the voice at high volume into the surroundings, so that the voice from the loudspeaker is easily picked up by the microphone.
Adaptive signal processing is used to remove the echo. In a hands-free application of a mobile telephone, the very disturbing acoustic feedback from the loudspeaker to the microphone (the acoustic echo) can be eliminated effectively by using prior known echo cancellers and echo suppressors. An echo canceller can be realized using an adaptive digital filter, which usually suppresses the echo signal, i.e. the signal which has come from the far end, from the outgoing signal when a far-end signal is present at the reception. The aim is to prevent a far-end signal from returning to the far end. The parameters of the adaptive filter are usually updated whenever far-end speech occurs, in order to take the prevailing conditions into account as accurately as possible. An echo suppressor, for its part, is used to attenuate the near-end signal to be transmitted.
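The adaptive-filter echo canceller described above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update, which is one common choice and is not named in this text; the function name, the number of taps and the step size `mu` are likewise illustrative assumptions.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic.

    Assumption: NLMS adaptation; taps, mu and eps are illustrative values.
    Returns (echo-cancelled samples, final filter coefficients).
    """
    w = [0.0] * taps        # filter coefficients: model of the echo path
    x_buf = [0.0] * taps    # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_estimate                       # signal to be transmitted
        norm = sum(xi * xi for xi in x_buf) + eps   # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out, w


if __name__ == "__main__":
    random.seed(0)
    far = [random.uniform(-1, 1) for _ in range(2000)]
    path = [0.5, -0.3, 0.1]  # hypothetical echo path, far-end speech only
    mic = [sum(path[k] * far[n - k] for k in range(len(path)) if n - k >= 0)
           for n in range(len(far))]
    residual, coeffs = nlms_echo_cancel(far, mic)
    print([round(c, 3) for c in coeffs[:3]])
```

In this sketch the filter is updated on every sample, which corresponds to the situation where only far-end speech is present.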
A situation in which near-end and far-end speech occur simultaneously is called a double-talk situation. During double-talk an echo canceller is not capable of effectively removing an echo signal. This is because the echo signal is summed into the near-end signal to be transmitted, in which case the echo canceller cannot form an accurate model of the echo signal to be removed. In such a case the adaptive filter of the echo canceller cannot adapt correctly to the acoustic response of the space between the loudspeaker and the microphone, and accordingly cannot remove the acoustic echo from the signal to be transmitted while the near-end speech signal is present. For this reason a double-talk detector is often used to eliminate the disturbing effect of double-talk on the echo canceller. A double-talk situation is usually detected by detecting whether there is near-end speech simultaneously with far-end speech. During double-talk the parameters of the adaptive filter of the echo canceller are not updated; the updating has to be interrupted while the near-end person speaks. An echo suppressor also requires information about the speech activity of the near-end speaker, so that it does not attenuate the signal to be transmitted too much while the near-end person is speaking.
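The relation between the four speech states and the updating of the adaptive filter can be written down as a small sketch. The names `SpeechState`, `classify` and `may_adapt` are hypothetical and introduced only for illustration; the rule itself (update during far-end speech only, never during double-talk) follows the description above.

```python
from enum import Enum


class SpeechState(Enum):
    IDLE = "idle"
    NEAR_END = "near-end speech"
    FAR_END = "far-end speech"
    DOUBLE_TALK = "double-talk"


def classify(far_active: bool, near_active: bool) -> SpeechState:
    """Map the two speech activity decisions to one of the four states."""
    if far_active and near_active:
        return SpeechState.DOUBLE_TALK
    if far_active:
        return SpeechState.FAR_END
    if near_active:
        return SpeechState.NEAR_END
    return SpeechState.IDLE


def may_adapt(state: SpeechState) -> bool:
    """Filter parameters are updated only while far-end speech occurs alone."""
    return state is SpeechState.FAR_END


if __name__ == "__main__":
    for far in (False, True):
        for near in (False, True):
            state = classify(far, near)
            print(far, near, state.value, may_adapt(state))
```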
In addition to echo cancelling and echo suppression, information about near-end speech activity is needed for the interruptible transmission used in GSM mobile telephones. The idea of interruptible transmission is to transmit a speech signal only during speech activity, i.e. when the near-end speaker is quiet the near-end signal is not transmitted, in order to save power. In order to avoid excessive variations of the background noise level due to the interruptible transmission, it is possible to transmit some comfort noise in the idle state and still save bits needed in the transmission. So that the interruptible transmission of GSM does not reduce the quality of the transmitted speech, the near-end speech activity must be detected accurately, quickly and reliably.
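The principle of interruptible transmission can be sketched as a frame-by-frame decision. This is not the GSM procedure: the energy-threshold speech activity decision, the smoothing constant and the names `frame_power` and `interruptible_transmit` are assumptions made only for illustration.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def interruptible_transmit(frames, vad_threshold=1e-3):
    """Decide per frame whether speech or only a comfort-noise level is sent.

    Assumption: a plain power threshold stands in for the speech activity
    detector; a real detector must be far more robust.
    """
    decisions = []
    noise_power = 0.0
    for frame in frames:
        p = frame_power(frame)
        if p > vad_threshold:
            decisions.append(("speech", frame))
        else:
            # Track the background noise level slowly so that the comfort
            # noise at the receiver does not jump between frames.
            noise_power = 0.9 * noise_power + 0.1 * p
            decisions.append(("comfort_noise", noise_power))
    return decisions


if __name__ == "__main__":
    frames = [[0.0] * 4, [0.5, -0.5, 0.5, -0.5], [0.01] * 4]
    for kind, payload in interruptible_transmit(frames):
        print(kind, payload)
```

A missed speech frame in such a scheme is simply not transmitted, which is why the accuracy of the speech activity decision directly affects speech quality.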
FIG. 1 presents a prior known arrangement 1 for echo cancelling and double-talk detection. Near-end signal 3 comes from microphone 2 and is detected using near-end speech activity detector 4, a VAD (Voice Activity Detector). Far-end signal 5 comes from input connection I (which can be the input connector of hands-free equipment, the wire connector of a fixed telephone or, in mobile telephones, the path from the antenna to the reception branch of the telephone), is detected in far-end speech activity detector 6, also a VAD, and is finally reproduced by loudspeaker 7. Both near-end signal 3 and far-end signal 5 are fed to double-talk detector 8 for the detection of double-talk and to adaptive filter 9 for adapting to the acoustic response of echo path 13. Adaptive filter 9 also receives the output of double-talk detector 8 as an input, so that the filter is not adapted (its parameters are not updated) during double-talk. Model 10 formed by the adaptive filter is subtracted from near-end signal 3 in summing/subtracting unit 11 in order to perform the echo cancelling. Echo canceller output signal 12, from which some of the echo has been cancelled, is brought to output connection O (which can be the output connector of hands-free equipment, the wire connector of a fixed telephone or, in mobile telephones, the path through the transmission branch to the antenna). The echo canceller presented in FIG. 1 can be realized integrated in a telephone (comprising, for example, a loudspeaker and a microphone for hands-free loudspeaker calls) or in separate hands-free equipment.
Several methods for the detection of double-talk have been presented. Many of these, however, are very simple and partly unreliable. Most double-talk detectors are based upon the power ratios between the loudspeaker signal, the microphone signal and/or the signal after an echo canceller. The advantages of these detectors are simplicity and speed; their disadvantage is unreliability.
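A power-ratio detector of the kind mentioned above can be sketched as follows. The specific rule (microphone power compared against a fixed fraction of loudspeaker power) and the value of `max_echo_gain` are assumptions for illustration, not a rule taken from this text.

```python
def short_term_power(samples):
    """Mean squared amplitude over one frame."""
    return sum(s * s for s in samples) / len(samples)


def power_ratio_double_talk(far_frame, mic_frame, max_echo_gain=0.5, eps=1e-12):
    """Flag double-talk when the microphone is louder than echo alone allows.

    Assumption: the echo path never returns more than max_echo_gain of the
    loudspeaker power, so any excess is attributed to near-end speech.
    """
    ratio = short_term_power(mic_frame) / (short_term_power(far_frame) + eps)
    return ratio > max_echo_gain


if __name__ == "__main__":
    far = [1.0 if i % 2 == 0 else -1.0 for i in range(160)]
    near = [0.9 if i % 4 < 2 else -0.9 for i in range(160)]
    echo_only = [0.3 * x for x in far]
    with_near = [e + s for e, s in zip(echo_only, near)]
    print(power_ratio_double_talk(far, echo_only))
    print(power_ratio_double_talk(far, with_near))
```

The unreliability noted above is visible in the assumption itself: if the echo path gain changes, for example because the loudspeaker volume is raised, a fixed threshold gives wrong decisions.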
Detectors based upon the correlation between the loudspeaker signal, the microphone signal and/or the signal after an echo canceller are also prior known. These detectors are based upon the idea that a loudspeaker signal and a mere echo signal in a microphone (the signal after an echo canceller) are strongly correlated, but that the correlation is reduced when a near-end signal is summed into the microphone signal. The disadvantages of these detectors are slowness, the (partly incorrect) assumption that near-end and far-end signals are uncorrelated, and the changes to the loudspeaker signal caused by the echo path, which reduce the correlation even when no near-end signal is present.
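The correlation idea can be sketched with a normalized cross-correlation computed over one frame. The sketch assumes zero lag between loudspeaker and microphone signals and an illustrative threshold of 0.8; a real echo path introduces delay and filtering, which is exactly the weakness described above.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation at zero lag, in the range [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def correlation_double_talk(far_frame, mic_frame, threshold=0.8):
    """Flag double-talk when the correlation drops below the threshold.

    Assumption: far-end speech is present in the frame; with a silent
    loudspeaker signal the correlation is meaningless.
    """
    return abs(normalized_correlation(far_frame, mic_frame)) < threshold


if __name__ == "__main__":
    far = [1.0 if i % 2 == 0 else -1.0 for i in range(160)]
    near = [0.9 if i % 4 < 2 else -0.9 for i in range(160)]
    echo_only = [0.3 * x for x in far]
    with_near = [e + s for e, s in zip(echo_only, near)]
    print(correlation_double_talk(far, echo_only))
    print(correlation_double_talk(far, with_near))
```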
Also prior known is a double-talk detector based upon the comparison of the autocorrelation of the same signals, in which the detector recognizes the voice in a near-end signal and can thus detect the presence of the near-end signal. Such a detector requires less calculation power, but it suffers from the same problems as the detectors based upon correlation.
In the publication Kuo S. M., Pan Z., “Acoustic Echo cancellation Microphone System for Large-Scale Video Conferencing”, Proceedings of ICSPAT'94, pp. 7-12, 1994, two microphones directed in opposite directions are used for removing noise and acoustic echo and for recognizing the different speech situations mentioned in the beginning. The method in question does not, however, bring any particular improvement in the recognition of double-talk, which is performed merely according to the output power of the echo canceller.
In the publication Affes S., Grenier Y., “A Source subspace Tracking array of Microphones for Double-talk Situations”, Proceedings of ICSPAT'96, Vol. 2, pp. 909-912, 1996, an echo and background noise canceller with a microphone vector structure is presented. The presented echo canceller filters out signals coming from a spatially chosen direction while maintaining the signals coming from a desired direction. The echo canceller in question is capable of operating also during double-talk situations. However, the
