Speech detection for noisy conditions

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S214000, C704S233000

active

06480823

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to speech processing and speech recognizing systems. More particularly, the invention relates to a detection system for detecting the beginning and ending of speech within an input signal.
Automated speech processing, for speech recognition and for other purposes, is currently one of the most challenging tasks a computer can perform. Speech recognition, for example, employs highly complex pattern-matching technology that can be very sensitive to variability. In consumer applications, recognition systems must handle a diverse range of speakers and operate under widely varying environmental conditions. The presence of extraneous signals and noise can greatly degrade recognition quality and speech-processing performance.
Most automated speech recognition systems work by first modeling patterns of sound and then using those patterns to identify phonemes, letters, and ultimately words. For accurate recognition, it is very important to exclude any extraneous sounds (noise) that precede or follow the actual speech. There are some known techniques that attempt to detect the beginning and ending of speech, although there still is considerable room for improvement.
The present invention divides the incoming signal into frequency bands, each band representing a different range of frequencies. The short-term energy within each band is then compared with a plurality of thresholds and the results of the comparison are used to drive a state machine that switches from a “speech absent” state to a “speech present” state when the band-limited signal energy of at least one of the bands is above at least one of its associated thresholds. The state machine similarly switches from a “speech present” state to a “speech absent” state when the band-limited signal energy of at least one of the bands is below at least one of its associated thresholds. The system also includes a partial speech detection mechanism based on an assumed “silence segment” prior to the actual beginning of speech.
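The state transitions described above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the per-band short-term energies have already been computed for each frame, and the names State, step, detect_speech, start_thresholds and end_thresholds are hypothetical. Keeping separate threshold lists for entering and leaving the "speech present" state is also an assumption made for illustration.

```python
from enum import Enum


class State(Enum):
    SPEECH_ABSENT = 0
    SPEECH_PRESENT = 1


def step(state, band_energies, start_thresholds, end_thresholds):
    """Advance the state machine by one frame.

    band_energies: one short-term energy value per frequency band.
    start_thresholds / end_thresholds: one list of thresholds per band
    (hypothetical split into "start" and "end" sets).
    """
    if state is State.SPEECH_ABSENT:
        # Enter "speech present" when at least one band exceeds
        # at least one of its associated thresholds.
        if any(e > t
               for e, ts in zip(band_energies, start_thresholds)
               for t in ts):
            return State.SPEECH_PRESENT
    else:
        # Return to "speech absent" when at least one band falls below
        # at least one of its associated thresholds.
        if any(e < t
               for e, ts in zip(band_energies, end_thresholds)
               for t in ts):
            return State.SPEECH_ABSENT
    return state


def detect_speech(frames, start_thresholds, end_thresholds):
    """Return the state after each frame, starting from SPEECH_ABSENT."""
    state = State.SPEECH_ABSENT
    states = []
    for band_energies in frames:
        state = step(state, band_energies, start_thresholds, end_thresholds)
        states.append(state)
    return states


# Two bands, four frames: noise, speech, speech, noise (made-up values).
frames = [[1.0, 0.5], [9.0, 4.0], [8.0, 3.5], [1.0, 0.5]]
states = detect_speech(frames,
                       start_thresholds=[[5.0], [3.0]],
                       end_thresholds=[[2.0], [1.0]])
```

Because the exit test fires as soon as any single band drops below one of its thresholds, the end thresholds in this sketch are set well below the energies expected during speech. A practical detector would likely need further conditions to avoid rapid toggling between states.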
A histogram data structure accumulates long-term data concerning the mean and variance of energy within the frequency bands, and this information is used to adjust adaptive thresholds. The frequency bands are allocated based on noise characteristics. The histogram representation affords strong discrimination among speech signal, silence, and noise. Within the speech signal itself, the silence part (with only background noise) typically dominates, and it is reflected strongly in the histogram. Background noise, being comparatively constant, shows up as noticeable spikes in the histogram.
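As a rough illustration of how a histogram might feed an adaptive threshold, the following Python sketch accumulates one band's energy values into fixed-width bins and derives a mean, a variance, and the most populated bin. The summary does not give the threshold formula, so the rule used here (mean plus k standard deviations) is an assumption, and the names EnergyHistogram, adaptive_threshold and k are hypothetical.

```python
import math


class EnergyHistogram:
    """Fixed-width histogram of short-term energy values for one band."""

    def __init__(self, lo, hi, n_bins):
        self.lo = lo
        self.n_bins = n_bins
        self.width = (hi - lo) / n_bins
        self.counts = [0] * n_bins

    def add(self, energy):
        # Clamp out-of-range values into the first or last bin.
        idx = int((energy - self.lo) / self.width)
        idx = min(max(idx, 0), self.n_bins - 1)
        self.counts[idx] += 1

    def _center(self, idx):
        return self.lo + (idx + 0.5) * self.width

    def _total(self):
        total = sum(self.counts)
        if total == 0:
            raise ValueError("histogram is empty")
        return total

    def mean(self):
        total = self._total()
        return sum(c * self._center(i)
                   for i, c in enumerate(self.counts)) / total

    def variance(self):
        total = self._total()
        m = self.mean()
        return sum(c * (self._center(i) - m) ** 2
                   for i, c in enumerate(self.counts)) / total

    def peak(self):
        # Center of the most populated bin; steady background noise
        # is expected to appear as a spike here.
        self._total()
        idx = max(range(self.n_bins), key=lambda i: self.counts[i])
        return self._center(idx)


def adaptive_threshold(hist, k=1.0):
    """Assumed rule for illustration: mean plus k standard deviations."""
    return hist.mean() + k * math.sqrt(hist.variance())


# Mostly background-level frames with a few louder ones (made-up values).
hist = EnergyHistogram(lo=0.0, hi=10.0, n_bins=10)
for energy in [1.1] * 8 + [9.2] * 2:
    hist.add(energy)
noise_level = hist.peak()
threshold = adaptive_threshold(hist, k=1.0)
```

In this example the dominant bin sits at the background level, reflecting the observation that silence with background noise dominates the histogram, while the threshold lands between the background level and the louder frames.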
The system is well adapted to detecting speech in noisy conditions. It detects both the beginning and the end of speech, and it handles situations where the beginning of speech may have been lost through truncation.
For a more complete understanding of the invention, its objects and advantages, reference may be had to the following specification and to the accompanying drawings.