Reexamination Certificate
2002-03-28
2002-12-03
Banks-Harold, Marsha D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S228000
active
06490554
ABSTRACT:
BACKGROUND OF INVENTION
1. Field of the Invention
The present invention relates to a voice activity detecting device for discriminating between an active voice segment and a non-active voice segment of an aural signal, and also to a voice activity detecting method which is applied to the voice activity detecting device.
2. Description of the Related Art
In recent years, digital signal processing technologies have advanced considerably, and in mobile communication systems and other communication systems these technologies are applied to perform various kinds of real-time signal processing on an aural signal, which is the transmission information.
Furthermore, at a transmitting end of such a communication system, a voice activity detecting device is mounted which detects an active voice segment and a non-active voice segment of the aforesaid aural signal and allows transmission to a transmission channel only in the active voice segment, for the purpose of compressing the transmission band, utilizing the radio frequency effectively, and saving power consumption.
FIG. 12 is a block diagram showing a configuration example of a radio terminal equipment in which the voice activity detecting device is mounted.
In FIG. 12, a microphone 41 is connected to an input of a voice activity detecting device 42 and a modulation input of a receiving/transmitting part 43, and a feeding point of an antenna 44 is connected to an antenna terminal of this receiving/transmitting part 43. An output of the voice activity detecting device 42 is connected to a transmission control input of the receiving/transmitting part 43, and to a control input/output of this receiving/transmitting part 43, a corresponding input/output port of a controlling part 45 is connected. A specific output port of the controlling part 45 is connected to a control input of the voice activity detecting device 42 and a demodulation output of the receiving/transmitting part 43 is connected to an input of a receiver 46.
In the radio terminal equipment as configured above, the receiving/transmitting part 43 radio-interfaces aural signals, which are transmission information to be transmitted/received via the microphone 41 and the receiver 46, with a radio transmission channel (not shown) which is accessible via the antenna 44.
The controlling part 45 plays a leading role in channel control which is required for forming this radio transmission channel by operating in association with the receiving/transmitting part 43.
The voice activity detecting device 42 samples the aforesaid aural signals at a predetermined cycle to generate a sequence of active voice frames. Moreover, the voice activity detecting device 42 discriminates, based on the characteristics of the aural signal, whether each of the active voice frames corresponds to an active voice segment or a non-active voice segment, and outputs a binary signal indicating the result of the discrimination.
Note that the aforesaid characteristics include, for example, the following items. The aural signal has a dynamic range of approximately 55 decibels. Its amplitude distribution can be approximated by a standard probability density function. The values of energy density and zero crossing frequency in the active voice segment differ from those in the non-active voice segment.
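The framing and per-frame discrimination described above can be illustrated with a minimal sketch. It is not the method of the conventional device or of the invention: the frame length of 160 samples, the 8 kHz sampling rate, the energy threshold of 0.01, and all function names are assumptions chosen only for illustration. The sketch computes two of the features mentioned above (frame energy and zero crossing count) and derives a binary decision from the energy alone.

```python
import math

def split_into_frames(samples, frame_len):
    """Cut a sample sequence into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_count(frame):
    """Number of sign changes between adjacent samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def classify_frames(samples, frame_len=160, energy_threshold=0.01):
    """Return one boolean per frame: True = active voice, False = non-active.

    The threshold and frame length are illustrative assumptions.
    """
    return [frame_energy(f) > energy_threshold
            for f in split_into_frames(samples, frame_len)]

# Hypothetical input: one frame of near-silence, then one frame of a 200 Hz tone.
silence = [0.001] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
flags = classify_frames(silence + tone)
```

Here `flags` holds one binary value per frame, corresponding to the binary signal that the voice activity detecting device outputs.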
The receiving/transmitting part 43 refrains from transmitting during a period when a logical value of the binary signal indicates the aforesaid non-active voice segment.
Therefore, unwanted transmission by the receiving/transmitting part 43 is restricted during a period when no available information is included as transmission information in the aural signal. Consequently, suppression of interference with other radio channels and effective utilization of a radio frequency as well as reduction in power consumption can be realized.
In the conventional example as described above, however, the difference in a feature value (for example, the aforesaid zero crossing frequency) between the active voice segment and the non-active voice segment becomes small during a period when noise of a high level is superimposed on the aural signal which is given via the microphone 41.
Furthermore, even within the active voice segment, the amplitude of the aural signal in a consonant segment is generally distributed at smaller values than in a vowel segment.
Therefore, it is highly possible that a consonant segment is mistakenly discriminated as the non-active voice segment, so that the corresponding active voice frames are not transmitted, which is very likely to cause unwanted deterioration in speech quality.
Furthermore, when the level of the aforesaid noise is excessively high, there is a possibility that transmission is restricted for all the active voice frames which correspond to most of the aural signal on which the noise is superimposed.
Incidentally, these problems can be solved, for example, when a threshold value for the feature value or the like which serves as the basis of the discrimination is set at such a value to cause the active voice frame to be easily discriminated as the active voice segment.
When such a threshold value is applied, however, the probability increases that an active voice frame is discriminated as the active voice segment even though it corresponds to the non-active voice segment, and the proportion of time occupied by the active voice segment may become excessively high, so that the reduction in power consumption, suppression of interference, and effective utilization of a radio frequency stated above may not be fully realized.
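The trade-off between the two threshold settings can be made concrete with a small sketch. The per-frame energies and their labels below are invented for illustration and are not measurements from the source; the energy feature and the two threshold values are likewise assumptions.

```python
# Hypothetical (energy, truly_active) pairs for six frames.
labelled = [
    (0.200, True), (0.150, True), (0.012, True),     # the last is a weak consonant
    (0.008, False), (0.015, False), (0.002, False),  # background noise only
]

def evaluate(threshold):
    """Return (missed_active_frames, false_alarms) for an energy threshold."""
    missed = sum(1 for e, active in labelled if active and e <= threshold)
    false_alarms = sum(1 for e, active in labelled if not active and e > threshold)
    return missed, false_alarms

def active_ratio(threshold):
    """Fraction of frames that would be transmitted at this threshold."""
    return sum(1 for e, _ in labelled if e > threshold) / len(labelled)

strict = evaluate(0.05)    # drops the weak consonant frame
lenient = evaluate(0.005)  # keeps the consonant but also passes noise frames
```

With these invented values, the strict threshold misses one active frame and the lenient threshold transmits two noise-only frames, raising the share of transmitted frames from 2 of 6 to 5 of 6.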
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a voice activity detecting device which is flexibly adaptable to various features of an aural signal and to noise superimposed on the aural signal and is capable of discriminating between an active voice segment and a non-active voice segment with high accuracy, and also to provide a voice activity detecting method.
It is another object of the present invention that even when an active voice segment includes many segments such as a consonant segment in which the quality of an aural signal is low because of its low amplitude, the segments are determined as a part of an active voice segment with high reliability.
It is still another object of the present invention to determine each active voice frame as a part of an active voice segment with high accuracy.
It is yet another object of the present invention to reduce required throughput or enhance responsiveness.
It is yet another object of the present invention to determine even active voice frames on which noise of a high level is superimposed and which have a low SN ratio as a part of an active voice segment with high accuracy.
It is yet another object of the present invention that communication equipment and other electronic equipment to which the invention is applied are able to flexibly adapt to an acoustic environment in which an acousto-electric converting section for generating an aural signal is disposed, or to a characteristic and performance of an information source of the active voice signal, and are able to discriminate between an active voice segment and a non-active voice segment of this aural signal with high reliability so that desired performance suitable for the discrimination result and effective utilization of resources can be achieved.
The above-described objects are achieved by a voice activity detecting device and a voice activity detecting method which are characterized in that a probability that an active voice frame belongs to an active voice segment, and the quality of the active voice frame are determined on an active-voice-frame basis, and the probability is weighted with the quality to output the resultant.
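As a rough illustration of weighting a per-frame probability with a per-frame quality, consider the sketch below. This excerpt does not state how the probability or the quality is computed, nor the form or direction of the weighting, so everything here is an assumption: the value ranges, the function names, and in particular the example weighting function `boost_low_quality`, which is purely hypothetical.

```python
def weight_probabilities(probabilities, qualities, weight_fn):
    """Weight each frame's active-voice probability by a function of its quality.

    `weight_fn` maps a quality value to a multiplicative weight. Its form is
    an assumption of this sketch, not something given in the source.
    """
    if len(probabilities) != len(qualities):
        raise ValueError("one quality value per frame is required")
    return [p * weight_fn(q) for p, q in zip(probabilities, qualities)]

def boost_low_quality(quality):
    """Hypothetical weight: 1.0 at quality 1.0, rising to 2.0 at quality 0.0."""
    return 2.0 - quality

# Hypothetical per-frame values: a clear vowel frame and a weak consonant frame.
probabilities = [0.75, 0.5]
qualities = [1.0, 0.5]
weighted = weight_probabilities(probabilities, qualities, boost_low_quality)
```

With this particular hypothetical weight, the low-quality frame ends up with the same weighted value as the high-quality one; a different `weight_fn` would give a different outcome.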
According to the voice activity detecting device and
Endo Kaori
Ota Yasuji
Banks-Harold Marsha D.
Fujitsu Limited
Katten Muchin Zavis & Rosenman
Lerner Martin
Speech detecting device and speech detecting method