Method to suppress noise in digital voice processing
Utility Patent
1997-12-03
2001-01-02
Hudspeth, David R. (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S226000, C704S233000
active
06169971
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to methods for processing speech and, more particularly, to methods for suppressing background noise in digital voice signals.
BACKGROUND OF THE INVENTION
Voice processing technologies often include the use of a conventional automatic gain control (AGC). Input signals representative of voice information are applied to the AGC. Typically, the input signals will reflect varying speech patterns. For example, an input signal can include voice information associated with relatively loud as well as relatively soft speech. The AGC selectively amplifies the input signal. Generally, the AGC provides a relatively low gain for portions of the input signal that have high energy levels. The AGC provides a relatively high gain for portions of the input signal that have low energy levels. A primary purpose of the AGC is to control the amplification of the input signal so that soft speech is sufficiently amplified for a particular voice processing application and loud speech is attenuated to avoid overloading the processing circuitry.
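As a minimal sketch of the gain rule described above, assume a frame-based AGC that tries to bring every frame to a fixed target RMS level. The names `frame_rms` and `agc_gain` and the parameters `target_rms`, `min_gain`, and `max_gain` are hypothetical and do not come from the patent. Low-energy frames receive a high gain and high-energy frames a low gain, and the result is clamped so the gain stays bounded.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def agc_gain(rms, target_rms=0.1, min_gain=0.1, max_gain=20.0, eps=1e-9):
    """Gain that would bring a frame with level `rms` to `target_rms`.

    Hypothetical parameters: the gain is inversely proportional to the
    frame level and clamped to [min_gain, max_gain].
    """
    raw = target_rms / max(rms, eps)   # eps avoids division by zero on silence
    return min(max(raw, min_gain), max_gain)


loud = [0.5, -0.5] * 80      # stand-in for loud speech, RMS 0.5
soft = [0.01, -0.01] * 80    # stand-in for soft speech, RMS 0.01
print(round(agc_gain(frame_rms(loud)), 3))   # 0.2  (attenuate)
print(round(agc_gain(frame_rms(soft)), 3))   # 10.0 (amplify)
```

The clamp matters for the discussion that follows: without `max_gain`, a near-silent frame would receive an almost unbounded gain.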
The amplification provided by the AGC depends on many factors, including the nature of the input signal as well as a decay time constant provided for the AGC. An input signal will typically have both noise signal components along with voice signal components. Usually, noise components are identified by their relatively low energy levels, while voice components are identified by their relatively high energy levels. Because noise components have low energy levels, the AGC could undesirably amplify the noise components, unless preventive measures are provided.
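The sketch below labels frames by short-time energy and shows why an AGC that reacts instantly to each frame would boost the noise. The threshold `threshold_db`, the target level, the example amplitudes, and the function names are hypothetical illustration choices, not values from the patent.

```python
import math


def frame_energy_db(frame, eps=1e-12):
    """Mean-square energy of a frame in dB relative to a full-scale level of 1.0."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + eps)


def label_frames(frames, threshold_db=-30.0):
    """Label each frame 'voice' (high energy) or 'noise' (low energy).

    threshold_db is a hypothetical fixed threshold chosen for illustration.
    """
    return ["voice" if frame_energy_db(f) > threshold_db else "noise" for f in frames]


def instantaneous_gain(frame, target_rms=0.1):
    """Gain of an AGC with no decay behavior: it reacts to every frame at once."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return target_rms / max(rms, 1e-9)


voice = [0.3, -0.3] * 80       # about -10.5 dB
noise = [0.003, -0.003] * 80   # about -50.5 dB
frames = [voice, noise, voice]
print(label_frames(frames))                                # ['voice', 'noise', 'voice']
print([round(instantaneous_gain(f), 1) for f in frames])   # [0.3, 33.3, 0.3]
```

The noise frame receives roughly one hundred times the gain of the voice frames, which is exactly the undesired amplification the paragraph describes.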
To prevent the amplification of background noise, a decay time constant is associated with the operation of the AGC. The decay time constant defines how quickly the AGC adjusts its gain value when it detects a decrease in the energy level of the input signal: upon detecting such a decrease, the AGC delays increasing its gain according to the decay time constant. An example illustrates the role of the decay time constant in voice processing applications that employ AGCs.
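One conventional way to realize a decay time constant is a level detector that follows increases at once but relaxes toward lower levels exponentially. The sketch below assumes frame-based processing with a 10 ms frame period and a per-frame decay factor a = exp(-T/tau), where T is the frame period and tau is the decay time constant. This is an assumed construction, not necessarily the one any particular AGC uses, and the function names, target level, and gain cap are hypothetical. Because the gain is derived from the tracked level, a slowly decaying level produces a gain that rises only slowly.

```python
import math


def decay_factor(tau_s, frame_period_s):
    """Per-frame multiplier for an exponential decay with time constant tau_s."""
    return math.exp(-frame_period_s / tau_s)


def track_level(levels, tau_s, frame_period_s=0.01):
    """Instant-attack, exponential-decay level detector.

    Rises in input level are followed immediately; falls are followed
    gradually, so after tau_s seconds the tracked level has covered about
    63% of a downward step.
    """
    a = decay_factor(tau_s, frame_period_s)
    env, out = 0.0, []
    for level in levels:
        if level >= env:
            env = level                          # attack: follow increases at once
        else:
            env = a * env + (1.0 - a) * level    # decay: relax toward the lower level
        out.append(env)
    return out


def gain_from_level(env, target=0.1, max_gain=20.0):
    """AGC gain derived from the tracked level (hypothetical target and cap)."""
    return min(target / max(env, 1e-9), max_gain)


# A step from level 1.0 to 0.0 with tau = 0.1 s: after 10 frames (0.1 s)
# the tracked level has fallen to exp(-1), about 0.368 of its starting value.
env = track_level([1.0] + [0.0] * 10, tau_s=0.1)
print(round(env[-1], 3))                    # 0.368
print(round(gain_from_level(env[-1]), 3))   # 0.272
```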
FIG. 1 is a graphical depiction of a voice signal having both voice components and background noise components. The x axis of the graph represents time in seconds; the y axis represents the amplitude of the signal in arbitrary units. The voice components of the signal are characterized by high-amplitude portions of the signal; the signal over interval A is an example of a voice component. The noise components of the signal are characterized by low-amplitude portions of the signal; the signal over interval B is an example of a noise component. The signal is provided to a conventional AGC.
As stated above, the AGC amplifies an input signal by an amount that depends on the amplitude of the input signal. To avoid amplifying noise components, the decay time constant should be at least as large as the maximum time distance between two successive high-peak regions of the signal. For example, if the decay time constant equals the time distance between the peaks of the signal over intervals A and C, the noise component over interval B will not be amplified. The AGC appropriately amplifies the signal over interval A with a relatively low gain value. As the signal transitions from the voice component over interval A to the noise component over interval B, the decay time constant causes the AGC to maintain the gain value it applied to the voice component over interval A. Because that gain value is maintained, the noise component over interval B is also amplified with a relatively low gain value, and so it remains small.
If the decay time constant has too small a value, the noise component over interval B would be undesirably amplified. As stated above, the AGC would provide a relatively small gain value for the voice component over interval A. The AGC would then detect the transition from the relatively high energy levels of the voice component to the relatively low energy levels of the noise component over interval B. If the decay time constant is set, for example, to a value less than the time distance between the two peaks of the signal over interval A and interval C, the AGC would provide a relatively high gain value for the noise component over interval B. The relatively large amplification of the noise component over interval B is the undesirable result of selecting a decay time constant that is too small.
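The trade-off in the two preceding paragraphs can be simulated with frame levels standing in for intervals A, B, and C of FIG. 1. The levels, durations, target level, gain cap, and both time constants are assumed values chosen for illustration, and the level detector is a hypothetical instant-attack, exponential-decay design rather than the patent's own circuit.

```python
import math


def track_level(levels, tau_s, frame_period_s=0.01):
    """Instant-attack, exponential-decay level detector (hypothetical)."""
    a = math.exp(-frame_period_s / tau_s)
    env, out = 0.0, []
    for level in levels:
        env = level if level >= env else a * env + (1.0 - a) * level
        out.append(env)
    return out


def gains(levels, tau_s, target=0.1, max_gain=20.0):
    """Per-frame AGC gain computed from the tracked level, capped at max_gain."""
    return [min(target / max(e, 1e-9), max_gain) for e in track_level(levels, tau_s)]


# Assumed frame levels (10 ms frames): voice A, 0.3 s of noise B, voice C.
A, B, C = [0.5] * 20, [0.005] * 30, [0.5] * 20
signal = A + B + C
gap = slice(len(A), len(A) + len(B))   # frames belonging to interval B

for tau_s in (1.0, 0.02):               # longer than the A-to-C gap, and much shorter
    g = gains(signal, tau_s)
    print(f"tau={tau_s:4} s  peak gain during B = {max(g[gap]):.2f}")
```

With the 1.0 s decay time constant, the gain during B never rises above about 0.27, close to the 0.2 applied to A. With the 0.02 s constant, it climbs to about 20, the cap, so the noise in B is amplified roughly a hundredfold.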
Although the decay time constant should not be too small, a decay time constant that is too large also has disadvantages: it prevents the AGC from tracking voice components of varying energy levels, which represent soft and loud speech. If the signal includes a voice component with a relatively low energy level and the decay time constant is set to a relatively large value, the AGC will not apply the relatively large gain that would be optimal for that component. Instead, the AGC applies the same small gain value it used for the voice component with a relatively high energy level. Accordingly, the voice component with a relatively low energy level is not sufficiently amplified.
For example, assume that the signal includes voice components over intervals D and E, as shown in FIG. 1. The energy level of the signal over interval E is less than the energy level of the signal over interval D. Ideally, the AGC would amplify the signal over interval E more than the signal over interval D. If the decay time constant is chosen to be larger than the time distance between the peaks of the signal over intervals D and E, the signal over interval E would not be appropriately amplified. Instead, the decay time constant would cause the AGC to apply the same gain value to the signal over interval E as to the signal over interval D. As a result, the AGC would fail to provide sufficient amplification for the signal over interval E.
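The opposite failure can be simulated in the same way, with a loud segment standing in for interval D followed by a softer segment for interval E. The levels and time constants are again assumed values, and the level detector is a hypothetical instant-attack, exponential-decay design. A decay time constant much longer than the span of E leaves the gain during E close to the gain used for D, while a short one lets the gain rise toward the value E needs.

```python
import math


def track_level(levels, tau_s, frame_period_s=0.01):
    """Instant-attack, exponential-decay level detector (hypothetical)."""
    a = math.exp(-frame_period_s / tau_s)
    env, out = 0.0, []
    for level in levels:
        env = level if level >= env else a * env + (1.0 - a) * level
        out.append(env)
    return out


def gains(levels, tau_s, target=0.1, max_gain=20.0):
    """Per-frame AGC gain computed from the tracked level, capped at max_gain."""
    return [min(target / max(e, 1e-9), max_gain) for e in track_level(levels, tau_s)]


# Assumed frame levels (10 ms frames): loud speech D, then 0.3 s of soft speech E.
D, E = [0.5] * 20, [0.05] * 30
signal = D + E
ideal_gain_E = 0.1 / 0.05   # gain that would bring E to the target level: 2.0

for tau_s in (5.0, 0.05):
    g = gains(signal, tau_s)
    print(f"tau={tau_s} s  gain for D = {g[len(D) - 1]:.2f}  "
          f"gain at end of E = {g[-1]:.2f}  (ideal {ideal_gain_E:.2f})")
```

With tau = 5 s, the gain at the end of E is still about 0.21, barely above the 0.20 used for D; with tau = 0.05 s, it reaches about 1.96, close to the ideal 2.0. A decay time constant that short, however, would let the gain climb during a noise gap such as interval B.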
Prior art techniques employing conventional AGCs have attempted to determine optimal values for the decay time constant to avoid the aforementioned problems. However, determining the decay time constant requires estimating the time distance between the peaks of successive voice components, and the diversity of speech patterns makes this time distance, and thus the optimal decay time constant, difficult to estimate. Too often, the resulting decay time constant is unacceptably imprecise, which lets more noise through and degrades voice quality.
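The estimate these techniques depend on can be sketched as finding the longest interval between successive high-level frames and treating it as a lower bound for the decay time constant. The threshold, the frame period, the function name, and both example level sequences are assumptions for illustration; the two utterances show how different speech patterns yield very different estimates.

```python
def longest_gap_s(levels, threshold, frame_period_s=0.01):
    """Longest time between two successive frames whose level exceeds threshold.

    Returns 0.0 when fewer than two frames exceed the threshold.
    """
    peaks = [i for i, level in enumerate(levels) if level > threshold]
    if len(peaks) < 2:
        return 0.0
    return max(b - a for a, b in zip(peaks, peaks[1:])) * frame_period_s


# Two assumed utterances (10 ms frames): one with a long pause, one with short pauses.
slow = [0.5] * 5 + [0.01] * 30 + [0.5] * 5 + [0.01] * 10 + [0.4] * 5
fast = [0.5] * 5 + [0.01] * 8 + [0.4] * 5 + [0.01] * 6 + [0.5] * 5

for name, levels in (("slow", slow), ("fast", fast)):
    print(f"{name}: decay time constant of at least "
          f"{longest_gap_s(levels, threshold=0.1):.2f} s")
```

The slow utterance calls for at least 0.31 s and the fast one for only 0.09 s, so a single fixed value suits one pattern or the other, not both.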
Because estimating the decay time constant of an AGC fails to reliably provide both noise reduction and voice amplification, techniques that better distinguish noise components from voice components have been proposed. Some of these techniques are commonly referred to as voice activity detection (VAD). One such VAD technique is the zero crossing rate technique, in which a voice signal is analyzed to determine where it crosses the zero-amplitude line, the line that separates the positive amplitude values of the signal from its negative values. The number of times the signal crosses the zero line in a given time is referred to as the zero crossing rate. Voice components have relatively low zero crossing rates, while noise components have relatively high zero crossing rates, so noise components and voice components can often be identified by their zero crossing rates.
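A minimal sketch of the zero crossing rate measurement, assuming 8 kHz sampling: a 200 Hz tone stands in for a voiced speech segment, and uniform random samples stand in for broadband noise. The function names and the 1500 crossings-per-second decision threshold are hypothetical. A practical detector would tune such a threshold, and unvoiced speech sounds also have high zero crossing rates, so this measure alone is not a complete VAD.

```python
import math
import random


def zero_crossing_rate(samples, sample_rate):
    """Sign changes between adjacent samples, per second of signal."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev >= 0.0) != (cur >= 0.0)
    )
    duration_s = (len(samples) - 1) / sample_rate
    return crossings / duration_s


def classify_zcr(samples, sample_rate, threshold_per_s=1500.0):
    """'voice' below the (hypothetical) threshold, 'noise' above it."""
    return "voice" if zero_crossing_rate(samples, sample_rate) < threshold_per_s else "noise"


fs = 8000
n = 800  # 0.1 s of signal
tone = [math.sin(2 * math.pi * 200 * k / fs + 0.1) for k in range(n)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

for name, x in (("tone", tone), ("noise", noise)):
    print(name, round(zero_crossing_rate(x, fs)), classify_zcr(x, fs))
```

The tone crosses zero about 390 times per second (twice per cycle, less one crossing lost at the segment edge), while the independent noise samples change sign on roughly half of all sample pairs, about 4000 times per second.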
Other popular VAD techniques are used to distinguish voice from noise. One such technique is commonly referred to as the linear prediction technique. Under this technique, linear prediction coefficients (LPC coefficients) are calculated to indicate the presence of voice or noise, depending on the value of the LPC coefficients. Another VAD technique is to determine how quickly or slowly the energy level of the
Azad Abul K.
Christensen O'Connor Johnson & Kindness PLLC
Glenayre Electronics, Inc.