Patent
1996-03-18
1998-10-20
Hudspeth, David R.
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
704/248, 704/253, G10L 5/06, G10L 9/00
active
058262300
DESCRIPTION:
BRIEF SUMMARY
TECHNICAL FIELD
The invention generally relates to a device for detecting the start and end of a speech segment within an input audio signal that contains both speech segments and nonspeech noise or background segments.
BACKGROUND ART
Detection of speech in real time is a necessary component of many devices, including but not limited to voice activated tape recorders, answering machines, automatic speech recognizers, and processors for removing speech from music. In many of these applications, noise is inseparably mixed with the speech. Detecting speech under such conditions requires a more sophisticated capability than that provided by conventional devices, which simply detect when the energy level rises above or falls below a preset threshold.
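To illustrate why a preset energy threshold falls short, the following is a minimal sketch of such a conventional detector. It is not taken from the patent; the function names, frame length, threshold, and test signal are all hypothetical. A steady loud background with no speech at all is flagged as speech in every frame.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def threshold_detector(signal, frame_len, threshold):
    """Return one flag per frame: True where the energy exceeds the preset threshold."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        flags.append(frame_energy(frame) > threshold)
    return flags


# Hypothetical input: a steady, loud background (amplitude 0.8) containing no speech.
loud_background = [0.8 if i % 2 == 0 else -0.8 for i in range(400)]

# Every frame exceeds the threshold, so the detector reports speech throughout.
flags = threshold_detector(loud_background, frame_len=100, threshold=0.25)
print(flags)
```

Raising the threshold would silence this false detection, but only for this particular background level, which is the limitation the passage describes.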
In the field of automatic speech recognition, the speech detection component is most critical. In practice, more speech recognition errors arise from errors in speech detection than from errors in pattern matching, which is commonly used to determine the content of the speech signal. One proposed solution is to use a word spotting technique, in which the recognizer is always listening for a particular word. However, if word spotting is not preceded by speech detection, the overall error rate can be high.
Many speech detection devices are based on a single parameter of the input, such as energy, pitch, or zero crossings. The performance of such a detector depends heavily on how robust that parameter is to background noise. For real time speech detection, the parameter must also be extracted from the signal quickly.
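Two of the parameters named above, energy and zero crossings, are cheap to compute per frame, which is one reason they are popular for real time use. The sketch below shows one common way to compute them; the function names and sample frames are illustrative assumptions, not part of the patent.

```python
def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)


def zero_crossings(frame):
    """Count sign changes between consecutive samples (zero is treated as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


# Hypothetical frames: one alternates sign every sample, one never changes sign.
alternating = [1.0, -1.0, 1.0, -1.0]
rising = [1.0, 2.0, 3.0]

print(short_time_energy(alternating), zero_crossings(alternating))
print(short_time_energy(rising), zero_crossings(rising))
```

Both features take a single pass over the frame, but neither is robust on its own: additive noise raises the energy and can change the zero-crossing count.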
DISCLOSURE OF INVENTION
One of the objects of the present invention is to provide a device for the detection of speech that operates fast enough to keep up with the arrival of the input, i.e., in real time.
Another object of the present invention is to provide a device for the detection of speech that can be implemented with a conventional digital signal processing circuit board.
Another object of the present invention is to provide a device for the detection of speech which is effective despite various types of noise mixed with the speech.
Another object of the present invention is to provide a speech detection device for various applications, including but not limited to: isolated word automatic speech recognizers, continuous speech recognizers (to detect pauses between phrases of sentences), voice controlled tape recorders, answering machines, and the processing of voice embedded in a recording with background noise or music.
These and other objects of the invention are achieved by the provision of a device for detecting speech in an input signal which includes means for determining a value representative of the smoothed frequency band limited energy within the signal, means for determining a variance of the value representative of the smoothed frequency band limited energy of the signal, and means for determining the beginning and ending points of speech within the signal based on the variance of the smoothed frequency band limited energy and the history of the band limited energy.
The invention exploits the variance in the smoothed frequency band limited energy and the history of the smoothed frequency band limited energy to detect the beginning and end of speech within an input speech signal. Variance of the smoothed frequency band limited energy is employed based on the observation that foreground speech occurring in a difficult background, such as a lead vocalist against a background of music, yields a noticeable fluctuation of the energy level above a "noise floor" of relatively low fluctuation. This effect occurs although the level of the background may be high. Variance quantifies that fluctuation of energy.
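The observation above can be illustrated with a small numeric sketch. The energy values below are invented for illustration and are not measurements from the patent: one sequence represents a loud but steady background, the other a foreground voice fluctuating above a quieter background.

```python
from statistics import mean, pvariance

# Hypothetical smoothed band-limited energy values, one per frame.
steady_loud_background = [20.0, 20.2, 19.9, 20.1, 20.0, 19.8]
speech_over_background = [10.0, 14.5, 9.9, 17.2, 11.0, 15.8]

level_background = mean(steady_loud_background)
level_speech = mean(speech_over_background)

var_background = pvariance(steady_loud_background)
var_speech = pvariance(speech_over_background)

# The background has the higher average level, yet the speech-bearing
# sequence has by far the larger variance.
print(level_background, level_speech)
print(var_background, var_speech)
```

A detector keyed to the level alone would favor the background here, whereas the variance separates the two cases regardless of how loud the background is.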
In accordance with the preferred embodiment, the device calculates smoothed frequency band limited energy using a Hamming window and a Fourier transform. The variance is calculated as a function of time from smoothed frequency band limited energy values stored in a shift register. To determine the beginni
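The calculation outlined for the preferred embodiment might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the band limits (250 to 3500 Hz), sample rate, frame length, history length, and the use of exponential smoothing are all hypothetical choices, a direct DFT stands in for a fast Fourier transform, and a bounded `deque` stands in for the shift register.

```python
import cmath
import math
from collections import deque
from statistics import pvariance


def hamming(n):
    """Hamming window coefficients of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def band_energy(frame, sample_rate, f_lo, f_hi):
    """Energy of the Hamming-windowed frame in [f_lo, f_hi] Hz, via a direct DFT."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    k_lo = math.ceil(f_lo * n / sample_rate)    # first DFT bin inside the band
    k_hi = math.floor(f_hi * n / sample_rate)   # last DFT bin inside the band
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        energy += abs(coeff) ** 2
    return energy


class EnergyVarianceTracker:
    """Smooths energy values and tracks their variance over a fixed history."""

    def __init__(self, history_len=8, smoothing=0.5):
        self.history = deque(maxlen=history_len)  # stands in for the shift register
        self.smoothing = smoothing                # assumed exponential smoothing factor
        self.smoothed = None

    def update(self, energy):
        """Add one energy value and return the variance of the stored history."""
        if self.smoothed is None:
            self.smoothed = energy
        else:
            self.smoothed = (self.smoothing * self.smoothed
                             + (1 - self.smoothing) * energy)
        self.history.append(self.smoothed)
        return pvariance(self.history) if len(self.history) > 1 else 0.0


SAMPLE_RATE = 8000   # Hz, assumed
FRAME_LEN = 64       # samples, assumed


def tone(freq, amp):
    """One frame of a sine tone, used here as a hypothetical test input."""
    return [amp * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(FRAME_LEN)]


in_band = band_energy(tone(1000, 1.0), SAMPLE_RATE, 250, 3500)
out_of_band = band_energy(tone(3875, 1.0), SAMPLE_RATE, 250, 3500)

tracker = EnergyVarianceTracker()
steady = [tracker.update(band_energy(tone(1000, 1.0), SAMPLE_RATE, 250, 3500))
          for _ in range(8)]
varying = [tracker.update(band_energy(tone(1000, a), SAMPLE_RATE, 250, 3500))
           for a in (1.0, 3.0, 0.5, 4.0)]
print(steady[-1], varying[-1])
```

In this sketch the variance stays at zero while the in-band energy is constant and rises once the amplitude starts to fluctuate. How the device turns that variance and the energy history into beginning and ending points is not covered here.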
REFERENCES:
patent: 4441203 (1984-04-01), Fleming
patent: 5579431 (1996-11-01), Reaves
patent: 5617508 (1997-04-01), Reaves
Hudspeth David R.
Matsushita Electric Industrial Co., Ltd.
Panasonic Technologies Inc.
Sax Robert Louis