Method and apparatus for detecting voice activity in a...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S231000, C704S207000

active

06188981

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of speech coding in communication systems, and more particularly to detecting voice activity in a communications system.
2. Description of Related Art
Modern communication systems rely heavily on digital speech processing in general, and digital speech compression in particular, in order to provide efficient systems. Examples of such communication systems are digital telephony trunks, voice mail, voice annotation, answering machines, digital voice over data links, etc.
A speech communication system is typically comprised of an encoder, a communication channel and a decoder. At one end of a communications link, the speech encoder converts a speech signal which has been digitized into a bit-stream. The bit-stream is transmitted over the communication channel (which can be a storage medium), and is converted again into a digitized speech signal by the decoder at the other end of the communications link.
The ratio between the number of bits needed for the representation of the digitized speech signal and the number of bits in the bit-stream is the compression ratio. A compression ratio of 12 to 16 is presently achievable, while still maintaining a high quality reconstructed speech signal.
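As a worked illustration of the ratio defined above, the following minimal sketch computes it from a raw sampling format and a coded bit rate. The 8 kHz sampling rate, the 16 bits per sample, and the 8 kbit/s and 10 kbit/s coder rates are assumptions chosen for the example; they are not stated in the text.

```python
def compression_ratio(sample_rate_hz: float, bits_per_sample: int,
                      coded_bit_rate_bps: float) -> float:
    """Ratio of raw digitized-speech bits to bit-stream bits per second."""
    if coded_bit_rate_bps <= 0:
        raise ValueError("coded bit rate must be positive")
    raw_bit_rate_bps = sample_rate_hz * bits_per_sample
    return raw_bit_rate_bps / coded_bit_rate_bps


# Assumed example: 8 kHz, 16-bit speech is 128 kbit/s before coding.
ratio_at_8k = compression_ratio(8000, 16, 8000)    # 128000 / 8000
ratio_at_10k = compression_ratio(8000, 16, 10000)  # 128000 / 10000
```

Under these assumed figures the two coder rates give ratios of 16 and 12.8, which fall inside the 12 to 16 range mentioned above.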
A significant portion of normal speech is comprised of silence, up to an average of 60% during a two-way conversation. During silence, the speech input device, such as a microphone, picks up the environment or background noise. The noise level and characteristics can vary considerably, from a quiet room to a noisy street or a fast moving car. However, most of the noise sources carry less information than the speech signal and hence a higher compression ratio is achievable during the silence periods. In the following description, speech will be denoted as “active-voice” and silence or background noise will be denoted as “non-active-voice”.
The above discussion leads to the concept of dual-mode speech coding schemes, which are usually also variable-rate coding schemes. The active-voice and the non-active-voice signals are coded differently in order to improve the system efficiency, thus providing two different modes of speech coding. The different modes of the input signal (active-voice or non-active-voice) are determined by a signal classifier, which can operate external to, or within, the speech encoder. The coding scheme employed for the non-active-voice signal uses fewer bits than the coding scheme employed for the active-voice signal, and therefore results in a higher overall average compression ratio. The classifier output is binary, and is commonly called a “voicing decision.” The classifier is also commonly referred to as a Voice Activity Detector (“VAD”).
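The effect of coding the two modes at different rates can be illustrated with a short calculation. This is a sketch only: the 8 kbit/s active-voice rate and the 1 kbit/s non-active-voice rate are hypothetical values, and the 40% active fraction simply mirrors the "up to 60% silence" figure given earlier.

```python
def average_bit_rate(active_fraction: float, active_rate_bps: float,
                     non_active_rate_bps: float) -> float:
    """Long-term average bit rate of a dual-mode (variable-rate) coder."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    return (active_fraction * active_rate_bps
            + (1.0 - active_fraction) * non_active_rate_bps)


# Hypothetical rates: 8 kbit/s for active voice, 1 kbit/s for non-active voice.
avg_bps = average_bit_rate(0.4, 8000.0, 1000.0)

# Overall compression ratio against an assumed 128 kbit/s raw signal.
overall_ratio = 128000.0 / avg_bps
```

With these assumed numbers the average rate drops well below the active-voice rate, which is why a reliable voicing decision translates directly into a higher average compression ratio.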
A schematic representation of a speech communication system which employs a VAD for a higher compression rate is depicted in FIG. 1. The input to the speech encoder 110 is the digitized incoming speech signal 105. For each frame of the digitized incoming speech signal, the VAD 125 provides the voicing decision 140, which is used as a switch 145 between the active-voice encoder 120 and the non-active-voice encoder 115. Either the active-voice bit-stream 135 or the non-active-voice bit-stream 130, together with the voicing decision 140, is transmitted through the communication channel 150. At the speech decoder 155, the voicing decision is used in the switch 160 to select the non-active-voice decoder 165 or the active-voice decoder 170. For each frame, the output of either decoder is used as the reconstructed speech 175.
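The per-frame switching described for FIG. 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the energy-threshold `toy_vad`, and the two toy encoders are hypothetical stand-ins for the VAD 125 and the encoders 120 and 115.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
Packet = Tuple[bool, bytes]  # (voicing decision, coded frame)


def encode_frames(frames: Sequence[Frame],
                  vad: Callable[[Frame], bool],
                  active_encoder: Callable[[Frame], bytes],
                  non_active_encoder: Callable[[Frame], bytes]) -> List[Packet]:
    """For each frame, let the voicing decision select the encoder."""
    packets = []
    for frame in frames:
        is_active = vad(frame)
        encoder = active_encoder if is_active else non_active_encoder
        packets.append((is_active, encoder(frame)))
    return packets


def decode_packets(packets: Sequence[Packet],
                   active_decoder: Callable[[bytes], List[float]],
                   non_active_decoder: Callable[[bytes], List[float]]) -> List[List[float]]:
    """Use the transmitted voicing decision to select the decoder."""
    frames = []
    for is_active, payload in packets:
        decoder = active_decoder if is_active else non_active_decoder
        frames.append(decoder(payload))
    return frames


# Hypothetical stand-ins, for illustration only.
def toy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Declare active voice when the mean-square level exceeds a threshold."""
    return sum(s * s for s in frame) / len(frame) > threshold


def toy_active_encoder(frame: Frame) -> bytes:
    """One byte per sample: 8-bit quantization of samples in [-1, 1]."""
    return bytes(int(round((max(-1.0, min(1.0, s)) + 1.0) * 127.5)) for s in frame)


def toy_non_active_encoder(frame: Frame) -> bytes:
    """One byte per frame: only a coarse noise level is kept."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return bytes([min(255, int(rms * 255))])
```

In this toy setup an active frame of 80 samples costs 80 bytes while a non-active frame costs a single byte, which mirrors the idea that the non-active-voice bit-stream needs far fewer bits.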
An example of a method and apparatus which employs such a dual-mode system is disclosed in U.S. Pat. No. 5,774,849, commonly assigned to the present assignee and herein incorporated by reference. According to U.S. Pat. No. 5,774,849, four parameters are disclosed which may be used to make the voicing decision. Specifically, the full band energy, the frame low-band energy, a set of parameters called Line Spectral Frequencies (“LSF”) and the frame zero crossing rate are compared to a long-term average of the noise signal. While this algorithm provides satisfactory results for many applications, the present inventors have determined that a modified decision algorithm can provide improved performance over the prior art voicing decision algorithms.
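Two of the four prior-art parameters, the full band energy and the zero crossing rate, are simple enough to sketch, together with their comparison against a long-term noise average. The code below is an illustrative assumption rather than the algorithm of U.S. Pat. No. 5,774,849: the function names, the decibel scaling, and the smoothing factor `alpha` are hypothetical, and the low-band energy and LSF parameters are omitted.

```python
import math
from typing import Sequence, Tuple


def full_band_energy_db(frame: Sequence[float]) -> float:
    """Frame energy in dB (mean square, with a small floor to avoid log(0))."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def update_noise_average(average: float, value: float, alpha: float = 0.95) -> float:
    """First-order running average; alpha is an assumed smoothing factor."""
    return alpha * average + (1.0 - alpha) * value


def difference_measures(frame: Sequence[float], noise_energy_db: float,
                        noise_zcr: float) -> Tuple[float, float]:
    """Distance of the current frame from the long-term noise averages."""
    return (full_band_energy_db(frame) - noise_energy_db,
            zero_crossing_rate(frame) - noise_zcr)
```

A decision rule would then compare these differences against thresholds; the thresholds themselves are not given in this text, so none are shown here.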
SUMMARY OF THE INVENTION
A method and apparatus are provided for generating frame voicing decisions for an incoming speech signal having periods of active voice and non-active voice, for use by a speech encoder in a speech communications system. A predetermined set of parameters is extracted from the incoming speech signal, including a pitch gain and a pitch lag. The predetermined set of parameters further includes a frame full band energy and a set of spectral parameters called Line Spectral Frequencies (LSF). A frame voicing decision is made for each frame of the incoming speech signal according to values calculated from the extracted parameters.
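The summary does not say how the pitch lag and pitch gain are obtained. One common approach, shown here purely as an assumed sketch, is an open-loop search that maximizes the normalized correlation between the signal and a delayed copy of itself. The function name `estimate_pitch` and the default lag range of 20 to 143 samples are assumptions for illustration.

```python
from typing import Sequence, Tuple


def estimate_pitch(signal: Sequence[float], min_lag: int = 20,
                   max_lag: int = 143) -> Tuple[int, float]:
    """Open-loop pitch search by normalized correlation (illustrative only).

    Returns (lag, gain), where gain = corr / energy of the delayed signal.
    The signal must be longer than max_lag so every lag has history.
    """
    n = len(signal)
    if n <= max_lag:
        raise ValueError("signal must be longer than max_lag")
    best_lag, best_gain, best_score = min_lag, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(signal[i] * signal[i - lag] for i in range(max_lag, n))
        energy = sum(signal[i - lag] ** 2 for i in range(max_lag, n))
        if corr <= 0.0 or energy <= 0.0:
            continue  # no positive correlation at this lag
        score = corr * corr / energy
        if score > best_score:  # strict comparison keeps the shortest lag on ties
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain


# Usage: a sawtooth that repeats every 40 samples.
periodic = [float((i % 40) - 20) for i in range(240)]
lag, gain = estimate_pitch(periodic, min_lag=20, max_lag=60)
```

For a strongly periodic (voiced) input the gain is close to 1 at the true period, while for noise-like input it tends to be small, which is the intuition behind using these two parameters in a voicing decision.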


REFERENCES:
patent: 5664055 (1997-09-01), Kroon
patent: 5732389 (1998-03-01), Kroon et al.
patent: 5737716 (1998-04-01), Bergstrom et al.
patent: 5774849 (1998-06-01), Benyassine
patent: 0 785 541 A2 (1997-01-01), None
patent: 0 785 419 A2 (1997-07-01), None
patent: 0 784 311 A1 (1997-07-01), None
A. Benyassine, E. Sholomot, S. Huan-Yu & E. Yuen, “A Robust Low Complexity Voice Activity Detection Algorithm for Speech Communication Systems”, IEEE Workshop on Speech Coding for Telecommunications Proceedings, Sep. 10, 1997.
L. Siegel & A. Bessey, “Voiced/Unvoiced/Mixed Excitation Classification of Speech,” IEEE Transactions on Acoustics, Speech and Signal Processing, Jun. 1982.
Y. Ephraim, “On minimum mean-square error speech enhancement”, International Conference on Acoustics, Speech and Signal Processing, IEEE, Apr. 1991.
Y. Ephraim, R.M. Gray, “A unified approach for encoding clean and noisy sources by means of waveform and autoregressive model vector quantization,” Transactions on Information Theory, IEEE, Jul. 1998.
Discrete-Time Processing of Speech Signals, by John R. Deller, Jr., et al, pp. 327-329 (1987).
