Automatic gain control in a speech recognition system

Reexamination Certificate

: 1998-11-06
: 2001-11-06
: Knepper, David D. (Department: 2645)
: Data processing: speech signal processing, linguistics, language
: Speech signal processing
: Recognition

: C704S234000
: Reexamination Certificate
: active
: 06314396
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition systems. More particularly, the invention is directed to a system and method for normalizing a voice signal in a speech pre-processor for input to a speech recognition system.
2. Discussion of the Prior Art
A well-recognized goal of speech recognition systems (hereinafter SR systems) is normalizing the voice signal to be processed, including its energy. Normalizing a voice signal enables successful comparison of the unknown spoken information with stored patterns or models. Energy normalization generally involves removing the long-term variations and bias in the energy of the voice signal while retaining the short-term variations that represent the phonetic information. Energy normalization enhances the accuracy of the SR system, and the more effectively the unwanted variations are removed, the greater the improvement.
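To see why removing a bias preserves the phonetic information, assume (this section does not specify the energy measure) that energy is computed per frame of N samples and expressed in decibels. A constant microphone gain g then adds the same offset G to every frame's log energy. A reference level R measured from the same signal shifts by the same G, so subtracting it cancels the gain, and differences between frames are untouched. The symbols E_k, e_k, G, and R are illustrative:

```latex
E_k = \frac{1}{N}\sum_{n=0}^{N-1} x_k[n]^2, \qquad e_k = 10\log_{10} E_k

x_k[n] \mapsto g\,x_k[n] \;\Longrightarrow\; e_k \mapsto e_k + G, \qquad G = 20\log_{10}\lvert g\rvert

(e_k + G) - (R + G) = e_k - R, \qquad (e_k + G) - (e_j + G) = e_k - e_j
```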
The undesirable long-term variations in the energy of a voice signal typically arise from multiple sources. A common source is variation in microphone gain and placement, to which current SR systems are very sensitive: improper gain or placement results in higher error rates. At present, the only way to adapt the SR system to a microphone is an offline microphone setup that sets the gain. This approach has several disadvantages. First, it is an added burden on the user. Second, it does not measure audio quality online, and so cannot detect changes that have occurred since the setup. Third, it does not measure the feature most relevant to the SR system: the instantaneous signal-to-noise ratio.
Additional factors contributing to energy variation, and hence to higher error rates, include the intensity of a speaker's voice, which typically exhibits a large dynamic range. A further problem is that different speakers speak at different volume levels. The variations in amplitude or energy between different utterances of the same word or sentence, whether by different speakers or by the same speaker at different times, must therefore be eliminated or at least reduced.
In the prior art, hardware solutions in the form of automatic gain controls have been used on sound cards to achieve energy normalization of raw signals. However, the degree of normalization provided by such cards has proven to be inadequate for the purposes of speech recognition.
An unbiased mean value of the energy has also been used in the prior art. However, because the relative amounts of speech, silence, and noise in the signal are not known in advance, an unbiased mean is not a reliable norm. The peak value of the energy provides a more reliable norm, but tracking peak energy has a drawback: the system may become too sensitive to instantaneous variations in energy. It is therefore desirable to have a reliable indicator of peak energy that is not overly sensitive to peak energy variations.
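The trade-off between a mean norm and a peak norm can be illustrated with a short Python sketch. It assumes per-frame log energies in dB and uses a generic asymmetric peak follower (fast attack, slow release) as one possible compromise between a mean and a raw maximum. This is not the method claimed here; the function names, rates, and frame levels are hypothetical.

```python
from typing import List


def mean_norm(log_energies: List[float]) -> float:
    """Unbiased mean of the frame log energies (dB)."""
    return sum(log_energies) / len(log_energies)


def raw_peak_norm(log_energies: List[float]) -> float:
    """Largest frame log energy (dB); reacts fully to a single loud frame."""
    return max(log_energies)


def tracked_peak(log_energies: List[float], attack: float = 0.5, release: float = 0.005) -> List[float]:
    """Smoothed peak follower over frame log energies (dB). Hypothetical rates.

    On a frame louder than the current estimate, the estimate closes the fraction
    `attack` of the gap; otherwise it closes only `release` of the gap, so it
    decays slowly. A lone spike therefore lifts the estimate only part way.
    """
    peak = log_energies[0]
    track = []
    for e in log_energies:
        rate = attack if e > peak else release
        peak += rate * (e - peak)
        track.append(peak)
    return track


SPEECH_DB, SILENCE_DB = -10.0, -60.0  # hypothetical frame levels
mostly_speech = [SPEECH_DB] * 8 + [SILENCE_DB] * 2
mostly_silence = [SPEECH_DB] * 2 + [SILENCE_DB] * 8
spiky = [SPEECH_DB] * 20 + [20.0] + [SPEECH_DB] * 20

print(mean_norm(mostly_speech), mean_norm(mostly_silence))          # -20.0 -50.0: the mean moves with the silence ratio
print(raw_peak_norm(mostly_speech), raw_peak_norm(mostly_silence))  # -10.0 -10.0: the peak does not
print(raw_peak_norm(spiky), max(tracked_peak(spiky)))               # 20.0 5.0: the tracked peak reacts far less to a spike
```

The asymmetric rates are the design choice: a sustained rise in level is followed within a few frames, while a single outlier moves the estimate only partially and then decays.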
A further general problem associated with energy normalization is silence detection. Signal energy is not a good indicator of silent periods because of background static: static on one system can be at the level of speech on another. Because the SR system has no control over the sound cards and microphones with which it is used, an alternative measure of the silence level is desirable.
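A sketch of silence detection relative to a tracked noise floor, rather than an absolute energy threshold, assuming per-frame log energies in dB. The floor follower, the 10 dB margin, and the example levels are hypothetical choices for illustration, not the discrimination method of this invention, which this section does not describe.

```python
from typing import List


def track_floor(log_energies: List[float], fall: float = 0.5, rise: float = 0.01) -> List[float]:
    """Smoothed minimum follower (dB): drops quickly toward quieter frames, creeps up slowly."""
    floor = log_energies[0]  # assumption: the signal starts in silence
    track = []
    for e in log_energies:
        rate = fall if e < floor else rise
        floor += rate * (e - floor)
        track.append(floor)
    return track


def silence_mask(log_energies: List[float], margin_db: float = 10.0) -> List[bool]:
    """True for frames lying within margin_db of the tracked noise floor (hypothetical margin)."""
    floors = track_floor(log_energies)
    return [e - f < margin_db for e, f in zip(log_energies, floors)]


# Hypothetical levels: system B's static sits at system A's speech level.
system_a = [-60.0] * 5 + [-20.0] * 5 + [-60.0] * 5
system_b = [-20.0] * 5 + [10.0] * 5 + [-20.0] * 5

print(silence_mask(system_a) == silence_mask(system_b))  # True: same decisions on both systems
print([e < -40.0 for e in system_b])  # every entry False: a fixed -40 dB threshold labels all of B as speech
```

Because the decision depends only on the distance from the tracked floor, shifting every frame by a constant (a different sound card or microphone) leaves the decisions unchanged.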
SUMMARY OF THE INVENTION
It is therefore a general object of the present invention to devise a system and method of speech signal normalization for eliminating or reducing variations in signal energy prior to a speech recognition process.
It is a particular object of the present invention to provide a method which can perform energy normalization of voice signals.
It is a further object of the present invention to discriminate between passages of silence and speech in a voice signal.
It is a still further object of the present invention to provide the user of an SR system with feedback for achieving optimal speech recognition accuracy.
It is yet a further object of the present invention to provide a method for normalizing a voice signal which can rapidly respond to the energy variations in a speaker's utterance.
It is a further object of the present invention to provide a method that will not distort the tracking of energy peaks during periods of silence.
These objects and other advantages are achieved by a voice normalization method and device that normalize the energy in a voice signal by removing the long-term variations and bias in the energy of the signal while retaining the short-term variations that represent the phonetic information. A method of the present invention constructs a plurality of distinct energy tracks, preferably a high, a mid, and a low energy track. The track values are mathematically smoothed and then used to track the upper and lower energy envelopes of the voice signal. The smoothed high energy track is used to normalize the energy in the signal. The method further derives two figures of merit, a signal-to-noise ratio and a measure of the noise floor, both of which are used to provide feedback to a user of the SR system.
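The three tracks and two figures of merit can be sketched as follows. The update rules are assumptions: each track is an exponential follower with different rising and falling rates, chosen only for illustration, since this summary does not give the smoothing actually used. Normalized energy is taken as frame energy minus the high track, the signal-to-noise ratio as the gap between the high and low tracks, and the noise floor as the low track itself. All names and rates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EnergyTracks:
    high: float  # upper-envelope estimate (dB)
    mid: float   # average-energy estimate (dB)
    low: float   # lower-envelope estimate (dB)


def _step(estimate: float, e: float, up_rate: float, down_rate: float) -> float:
    """Move estimate toward e, closing up_rate of the gap when e is above it, else down_rate."""
    rate = up_rate if e > estimate else down_rate
    return estimate + rate * (e - estimate)


def update_tracks(t: EnergyTracks, e: float) -> EnergyTracks:
    """One frame update. Illustrative sketch: rates are assumptions, not taken from the patent."""
    return EnergyTracks(
        high=_step(t.high, e, 0.5, 0.005),  # fast attack, slow release: follows peaks
        mid=_step(t.mid, e, 0.05, 0.05),    # symmetric smoothing: follows the average
        low=_step(t.low, e, 0.01, 0.5),     # slow rise, fast fall: follows the floor
    )


def normalize(log_energies: List[float]) -> List[Tuple[float, float, float]]:
    """Return (normalized_energy, snr_db, noise_floor_db) for each frame."""
    first = log_energies[0]
    t = EnergyTracks(first, first, first)  # tracks start at the first frame's level
    results = []
    for e in log_energies:
        t = update_tracks(t, e)
        results.append((e - t.high, t.high - t.low, t.low))
    return results


# Hypothetical input: silence, an utterance, silence (dB per frame).
frames_db = [-60.0] * 10 + [-10.0] * 10 + [-60.0] * 10
for norm_db, snr_db, floor_db in normalize(frames_db)[::5]:
    print(f"normalized={norm_db:7.2f}  snr={snr_db:6.2f}  floor={floor_db:7.2f}")
```

Because the tracks start at the first frame's level, the outputs are only meaningful once some speech has been observed. The mid track is carried along to mirror the three-track structure but is not used by these two figures of merit.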
The present invention further provides a novel method for discriminating between periods of silence and speech in the voice signal.
The preprocessor includes an analyzer connected to receive a digital speech signal and generate from it a sequence of frames, each frame having a plurality of samples of the digital speech signal. The preprocessor further includes a device for tracking the energy in one or more consecutive frames of the digital speech signal by constructing a plurality of energy tracks that follow the upper energy envelope, the average energy, and the lower energy envelope; a device for calculating a value of normalized energy; a device for providing the normalized energy to a speech recognition preprocessor; a device for measuring the signal-to-noise ratio and absolute noise floor in one or more consecutive frames of the digital speech signal; a device for presenting the signal-to-noise ratio and absolute noise floor to the user as a continuous display, providing feedback for achieving optimal speech recognition accuracy; and a device for discriminating between intervals of silence and speech in the digital speech signal.
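The analyzer's framing step can be sketched as below, assuming 16 kHz audio, 400-sample (25 ms) frames advanced by 160 samples (10 ms), and mean-square energy in dB. The frame sizes and function names are hypothetical; the summary says only that each frame holds a plurality of samples.

```python
import math
from typing import Iterator, List, Sequence


def frames(signal: Sequence[float], frame_len: int = 400, hop: int = 160) -> Iterator[List[float]]:
    """Yield frames of frame_len samples, advancing hop samples each time.

    Defaults assume 16 kHz audio with 25 ms frames and a 10 ms hop. Trailing
    samples that do not fill a whole frame are dropped.
    """
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield list(signal[start:start + frame_len])


def frame_energies_db(signal: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """Mean-square energy of each frame in dB; the 1e-10 floor avoids log10(0)."""
    energies = []
    for frame in frames(signal, frame_len, hop):
        mean_square = sum(s * s for s in frame) / frame_len
        energies.append(10.0 * math.log10(max(mean_square, 1e-10)))
    return energies


# One second of a full-scale 440 Hz tone at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
energies = frame_energies_db(tone)
print(len(energies), round(energies[0], 2))  # 98 -3.01
```

A full-scale sine has a mean square of 0.5, so every frame sits near 10 log10(0.5), about -3.01 dB; the per-frame values produced this way are the kind of input the energy tracks operate on.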
According to the present invention, there is provided a method of normalizing energy in a voice signal. The method calculates a high energy track that tracks the upper energy envelope of the voice signal, a low energy track that tracks the lower energy envelope, and a mid energy track that tracks the average energy. Finally, the method calculates from the high energy track a value of normalized energy to be provided to a speech recognition system.
The invention and its operation will become more apparent from the following description of preferred embodiments and from the accompanying drawings.

Inventor: Monkowski, Michael D.
Law Firm: F. Chau & Associates LLP
Corporate Assignee: International Business Machines Corporation
