Method for dynamic adjustment of audio input gain in a...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C381S107000

active

06651040

ABSTRACT:

CROSS REFERENCE TO RELATED APPLICATIONS
(Not Applicable)
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
(Not Applicable)
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to the field of speech systems and more particularly to a method and apparatus for dynamically adjusting audio input gain according to conditions sensed in an audio input signal to a speech system.
2. Description of the Related Art
Speech systems are systems which can receive an analog audio input signal representative of speech and subsequently digitize and process the audio input signal into a digitized speech signal. Speech signals, unlike general audio signals, contain both speech data and silence data. That is, in any given sample of audio data representative of speech, a portion of the signal actually represents speech while other portions of the signal represent background noise and silence. Hence, in performing digital processing on an audio signal, a speech system must be able to differentiate between speech data and background and silence data. Accordingly, speech systems can be sensitive to the quality of an audio input signal in performing this necessary differentiation.
The effect of audio input signal quality can be particularly apparent in a handheld, portable speech system. Specifically, users of portable speech systems often provide speech input to the speech system in varying environmental conditions. For example, a user of a portable speech system can dictate speech in a car, in an office, at home in front of the television, in a restaurant, or even outside. Consequently, many environmental factors can affect the quality of speech input. When in a car, interior cabin noise can be included in the speech signal. When in an office, a ringing telephone can be included in the speech signal. When outside, the honking of a passing car can be included in the speech signal. As a result, the portion of a speech input which is to be interpreted as speech data can vary depending on what is to be interpreted as background “silence”: car honking, television programming, telephone ringing, interior cabin noise, or true silence.
The problem of speech signal quality in identifying speech data in a speech system can be compounded by the process of speech recognition. Speech recognition is the process of converting an acoustic signal, captured by a transducer, for instance a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding. Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal.
First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variables are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in American English. Second, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variables. Third, acoustic variables can result from changes in the environment as well as in the position and characteristics of the transducer. Finally, speaker variables can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality.
The speech recognition accuracy of a speech-to-text conversion system depends directly upon the quality of an audio input signal containing the speech data to be converted to text. Specifically, it is desirable for the amplitude of an audio input signal to fall within an optimal range. While the specific limits of the desired range can vary from speech recognition engine to speech recognition engine, all speech recognition engines can experience imperfect speech recognition performance when the amplitude of an audio input signal falls outside of an acceptable range.
Specifically, an audio input signal having an amplitude falling below an extremely low level (an insufficient signal) can cause the degradation of speech recognition performance of a speech recognition engine. Correspondingly, an audio input signal having an amplitude exceeding an extremely high level can result in a saturated signal, a clipping condition as well as signal distortion. An insufficient or excessive audio signal can arise in response to a variety of conditions. For example, when providing speech input to a speech system, the speaker can move either the speaker's head with respect to the microphone or the microphone with respect to the speaker's head. Also, the speaker inadvertently can change the volume of the speaker's voice or the input volume controlled by the audio circuitry used to receive the speech input audio signal.
During configuration, speech systems typically measure the characteristics of an audio input signal for a particular speaker using a particular microphone. Using these measured characteristics, the speech system can set system parameters to optimize the amplification and conditioning of the audio signal. Thus, in the case where different speakers provide audio input to the same speech system at different times, the speech system parameters can prove inadequate to accommodate the subsequent speaker for which the parameters had not been optimized. Likewise, in the case where different microphones are used at different times to provide speech audio input to the same speech system, the speech system parameters can prove inadequate to accommodate the second microphone for which the parameters had not been optimized. As a result, in either case, an insufficient or excessive audio signal condition can arise.
Present speech systems have yet to adequately address the problem of varying amplitudes of speech audio input signals. Specifically, what is needed is a method for monitoring the amplitude of a speech audio input signal during a speech session and adjusting the amplitude of the speech audio input signal accordingly. Hence, there exists a present need for dynamically adjusting audio input gain in a speech system.
SUMMARY OF THE INVENTION
A method for adjusting audio input signal gain in a speech system can include seven steps. First, an upper and a lower threshold can be predetermined in which the upper and lower threshold define an optimal range of audio data signal amplitude measurements. Second, a frame of unpredicted digital audio data samples can be received. In particular, the unpredicted digital audio data samples can be acquired by audio circuitry in a computer system. Significantly, the digital audio data samples received are not pre-scripted and are unknown to the computer system at the time of reception with regard to speech content.
Each sample can indicate an amplitude measurement of the audio data signal at a particular point in time. As such, third, a maximum signal amplitude can be calculated for a configurable measurement percentile of the unpredicted digital audio data samples. A measurement percentile is a selected percentage of samples in the digital audio data upon which computations are to be performed. For example, the calculation of the maximum signal amplitude for the ninety-eighth (98th) measurement percentile means the maximum signal amplitude for the first ninety-eight (98) percent of all samples in the frame.
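The percentile calculation described above can be illustrated with a minimal Python sketch. It rests on an assumption the text does not settle: the "first" ninety-eight percent is taken here to mean the samples ranked by absolute amplitude, so that the loudest two percent (for example, isolated spikes) are ignored. The function name percentile_peak and its parameters are hypothetical and chosen only for illustration.

```python
import math


def percentile_peak(samples, percentile=98.0):
    """Largest absolute amplitude among the lowest `percentile` percent
    of samples in a frame, ranked by magnitude.

    Assumption: samples are ranked by absolute amplitude, which is one
    possible reading of the measurement percentile described in the text.
    """
    if not samples:
        raise ValueError("frame must contain at least one sample")
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in the range (0, 100]")

    # Rank the samples by magnitude, smallest first.
    magnitudes = sorted(abs(s) for s in samples)

    # Number of samples that fall inside the measurement percentile.
    keep = max(1, math.floor(len(magnitudes) * percentile / 100.0))

    # The maximum of the retained samples is the last one kept.
    return magnitudes[keep - 1]


# A 100-sample frame: 98 quiet samples plus two loud spikes.
frame = [(-1) ** i * (i + 1) for i in range(98)] + [30000, -32000]
peak_98 = percentile_peak(frame, 98.0)
peak_100 = percentile_peak(frame, 100.0)
```

With the 98th percentile the two spikes are excluded from the measurement, whereas a 100th percentile setting reduces to the plain frame maximum.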
Subsequent to the calculation of the maximum signal amplitude for the configured measurement percentile, fourth, the audio input signal gain can be incrementally adjusted downward if the maximum signal amplitude exceeds the upper threshold. Conversely, fifth, the audio input signal gain can be incrementally adjusted upward if the maximum signal amplitude falls below the lower threshold. Sixth, additional frames of unpredicted digital audio data samples can be received. Finally, seventh, each of the third through the sixth steps can be repeated with the received additional frames u
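The threshold comparison and incremental adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the step size, the 0 to 100 gain scale, the clamping limits, and the names frame_peak, adjust_gain and run_session are all hypothetical. The sketch also does not model how a changed gain would alter the amplitude of later frames.

```python
import math


def frame_peak(samples, percentile=98.0):
    """Maximum absolute amplitude within the measurement percentile
    (assumed here to rank samples by magnitude)."""
    magnitudes = sorted(abs(s) for s in samples)
    keep = max(1, math.floor(len(magnitudes) * percentile / 100.0))
    return magnitudes[keep - 1]


def adjust_gain(gain, peak, lower, upper, step=5, min_gain=0, max_gain=100):
    """Move the gain one step toward the optimal range.

    The gain scale and its limits are assumptions for illustration.
    """
    if peak > upper:
        # Signal too strong: step the gain downward.
        return max(min_gain, gain - step)
    if peak < lower:
        # Signal too weak: step the gain upward.
        return min(max_gain, gain + step)
    # Peak lies inside the optimal range: leave the gain unchanged.
    return gain


def run_session(frames, gain, lower, upper, step=5):
    """Apply the adjustment to each received frame in turn and
    return the gain setting after every frame."""
    history = []
    for frame in frames:
        peak = frame_peak(frame)
        gain = adjust_gain(gain, peak, lower, upper, step)
        history.append(gain)
    return history


# Two loud frames, one quiet frame, one frame inside the optimal range.
frames = [[20000] * 10, [20000] * 10, [100] * 10, [5000] * 10]
history = run_session(frames, gain=50, lower=1000, upper=16000, step=5)
```

Because the gain changes by only one step per frame, a single unusual frame cannot swing the setting far, which is the practical reason for adjusting incrementally rather than jumping directly to a computed target.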
