Reexamination Certificate
1999-02-12
2002-04-30
Šmits, Tālivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S226000
active
06381570
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to methods for conservation of bandwidth in a packet network. More specifically, the invention relates to methods for reducing the bandwidth consumption in voice-over packet networks by improved detection of active signals, background noise, and silence.
2. Description of the Background Art
A system for bandwidth savings, known as time assignment speech interpolation (TASI), was introduced to increase the capacity of submarine telephone cables used in analog telephony. TASI was subsequently replaced with a similar digital system. Such schemes are commonly known as digital speech interpolation (DSI) systems. As multimode and variable-rate speech coding techniques have improved, several promising silence compression standards have been developed and issued to address the bandwidth saving problem. The algorithm standardized by GSM for use in the Pan-European Digital Cellular Mobile Telephone Service is an example of a voice activity detection (VAD) technique designed for the mobile environment. Another VAD algorithm for wireless applications is provided with the TIA/EIA/IS-127 Enhanced Variable Rate Codec standard. There are two silence compression standards from the ITU: G.723.1 Annex A and G.729 Annex B.
Although these standards for bandwidth savings are very effective, their complexity is very high. The complexity derives from the fact that they rely upon processing the spectral features of a signal, which requires an analysis of the frequency and/or spectrum of the signal to identify the characteristics of speech, voice, or other distinct signals. These methods require adaptive algorithms to reduce noise, band-pass filters to isolate speech, and similar processing to characterize the signal accurately enough to distinguish voice from other sounds, signals, or noise.
Complex standards require complex algorithms and therefore require significant processing capabilities. The method of the present invention significantly reduces complexity and therefore can be implemented in high channel density wired telephony applications. The present invention is simple in terms of processing and memory requirements and results in excellent performance.
SUMMARY OF THE INVENTION
In voice-over packet applications, the speech signal is transmitted in data packets. The general telephone network limits the bandwidth of the speech signal to the 300 to 3,400 Hz range. In most speech codecs, the signal is sampled at 8 kHz, giving a maximum signal bandwidth of 4 kHz. Each sample is represented with 16 bits, resulting in a 128 kbps bit rate. To save on bandwidth, PCM and ADPCM codecs are widely used in telephony applications and are important in high channel density implementations of voice-over packet applications. For bandwidth savings with PCM and ADPCM codecs, voice activity detection is used to distinguish silence from active signal. Silence packets are not transmitted during nonspeech intervals, effectively increasing the number of channels. In voice-over packet applications, the input speech level can vary from −50 dBm0 to 0 dBm0, the facsimile signal level varies from −48 dBm0 to 0 dBm0, and the noise properties may change considerably during a conversation.
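The bit-rate arithmetic above can be checked with a short sketch. The 8-bit and 4-bit sample widths are assumptions chosen as common PCM and ADPCM configurations, and the 40% activity factor is purely illustrative; neither figure comes from the text. Packet header overhead is ignored.

```python
SAMPLE_RATE_HZ = 8_000  # narrowband telephony sampling rate stated in the text


def bit_rate_kbps(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Bit rate in kbps for a codec that emits a fixed number of bits per sample."""
    return bits_per_sample * sample_rate_hz / 1000


def average_rate_kbps(codec_rate_kbps: float, activity_factor: float) -> float:
    """Average payload rate when packets are sent only while the signal is active.

    activity_factor is the assumed fraction of time classified as active signal.
    """
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity_factor must be in [0, 1]")
    return codec_rate_kbps * activity_factor


linear_16_bit = bit_rate_kbps(16)  # 128.0 kbps, the figure quoted in the text
pcm_8_bit = bit_rate_kbps(8)       # 64.0 kbps (assumed 8-bit companded PCM)
adpcm_4_bit = bit_rate_kbps(4)     # 32.0 kbps (assumed 4-bit ADPCM)

# Illustrative only: an assumed 40% activity factor applied to the 8-bit PCM rate.
pcm_with_suppression = average_rate_kbps(pcm_8_bit, 0.4)
```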
To detect signal activity accurately under different signal input and noise conditions, the energy threshold is adapted to the input signal and noise levels. Because of its adaptive function, the corresponding signal activity detection algorithm herein provides bandwidth savings with low complexity and low delay and performs well for a wide range of signal energy input levels and background noise environments as well as signal energy level changes. Because the bandwidth savings may change based on packet network traffic load, the algorithm is dynamically configurable to adjust the bandwidth savings percentages.
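One way to picture a threshold that follows both level estimates and can be tuned for more or less bandwidth savings is the minimal sketch below. The interpolation rule and the `aggressiveness` parameter are hypothetical illustrations, not the formula used by the invention, which this section does not give.

```python
def adaptive_threshold_db(noise_db: float, signal_db: float,
                          aggressiveness: float = 0.5) -> float:
    """Place the energy threshold between the noise and active-signal estimates.

    aggressiveness (hypothetical parameter) runs from 0.0 to 1.0:
    0.0 puts the threshold at the noise estimate (fewest blocks suppressed),
    1.0 puts it at the active-signal estimate (most blocks suppressed).
    """
    if not 0.0 <= aggressiveness <= 1.0:
        raise ValueError("aggressiveness must be in [0, 1]")
    # If the estimates cross, fall back to the noise estimate.
    span_db = max(signal_db - noise_db, 0.0)
    return noise_db + aggressiveness * span_db


# With noise at -60 dB and active signal at -20 dB (illustrative values):
light_load = adaptive_threshold_db(-60.0, -20.0, aggressiveness=0.25)  # -50.0
heavy_load = adaptive_threshold_db(-60.0, -20.0, aggressiveness=0.75)  # -30.0
```

Raising `aggressiveness` under heavy network load would classify more blocks as silence and so save more bandwidth, at a greater risk of clipping low-level speech.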
In the development of voice-over packet network applications, a reliable bandwidth saving method is crucial to achieve a desirable balance between acceptable perceived sound quality and reduction in bandwidth requirements. The variety of working conditions imposes a number of challenges upon such a method. The bandwidth savings must be accomplished with both low delay and low complexity. The method must perform well for a wide range of input signal levels, must work in a variety of background noise environments, and must be robust in the presence of active signal and/or background noise level changes. Since the bandwidth requirements may change based on network factors such as load or traffic conditions, or because of changing performance needs, the present invention is dynamically configurable to perform well under different requirements. It is common for the noise environment to change in real time, and the present invention monitors such changes and adjusts dynamically to accomplish bandwidth savings and to perform well under a wide variety of conditions.
The present invention accomplishes efficient savings in bandwidth through a system for active signal (e.g., voice, facsimile, dialtone) and background noise detection and discrimination which utilizes block energy threshold adaptation, adaptive marginal signal/noise discrimination, state control logic, and active signal smoothing. The system distinguishes active signal (e.g., voice, speech, etc.) from background noise to allow for the compression or elimination of periods of silence or background noise. The system includes a state machine for logic control in establishing a dynamic adaptive threshold, below which the signal is identified as silence or background noise, and above which the signal is identified as active signal. The threshold is established by factors, including an active signal estimation technique from discrimination of noise below a first threshold and active signal above a second threshold. Signal between the thresholds cannot be discriminated and is therefore not used in the estimation to avoid loss of voice through misidentification as noise or silence. The system is efficient in detection of active signals and elimination of noise, while maintaining a safety margin to avoid degradation of voice quality by misidentification of low voice signals as background or silence.
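The two-threshold rule described above, in which only confidently classified blocks feed the level estimates, might look like the following minimal sketch. The class name, field names, and the use of exponential smoothing are assumptions made for illustration; this section does not state how the estimates are computed.

```python
from dataclasses import dataclass


@dataclass
class MarginalDiscriminator:
    """Hypothetical two-threshold discriminator operating on block energies in dB."""
    noise_db: float      # running estimate of the background noise level
    signal_db: float     # running estimate of the active signal level
    low_db: float        # first threshold: blocks below it are treated as noise
    high_db: float       # second threshold: blocks above it are treated as active
    alpha: float = 0.1   # smoothing factor (assumed exponential averaging)

    def classify(self, energy_db: float) -> str:
        if energy_db < self.low_db:
            return "noise"
        if energy_db > self.high_db:
            return "active"
        return "marginal"  # between the thresholds: cannot be discriminated

    def update(self, energy_db: float) -> str:
        """Classify a block and update only the estimate it confidently belongs to."""
        label = self.classify(energy_db)
        if label == "noise":
            self.noise_db += self.alpha * (energy_db - self.noise_db)
        elif label == "active":
            self.signal_db += self.alpha * (energy_db - self.signal_db)
        # Marginal blocks are deliberately left out of both estimates.
        return label
```

Leaving marginal blocks out of both estimates keeps low-level speech from pulling the noise estimate upward, which is the safety margin the paragraph above describes.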
The state machine, FIG. 2, includes the flow logic, FIG. 3, for updating the adaptive block energy threshold used for threshold detection, FIG. 1. There are three states in the state machine: learning state, converged state, and constant envelope state. Learning state is the initial and default state, where the system does not have any reliable estimates of noise or active signal energy levels. The state control logic 6 is in converged state when the current energy level threshold is acceptable and the noise and signal level estimations are reliable. When the input signal has an approximate constant envelope, the state machine is in the constant envelope state to distinguish facsimile from background noise in order to identify facsimile as active signal, not noise.
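The three states can be sketched as an enumeration with a selection function. The selection conditions and the envelope test below are hypothetical simplifications; the actual transition logic is defined by the flow logic of FIG. 3, which is not reproduced in this section.

```python
from enum import Enum, auto
from typing import Sequence


class State(Enum):
    LEARNING = auto()           # initial/default: no reliable level estimates yet
    CONVERGED = auto()          # threshold acceptable, estimates reliable
    CONSTANT_ENVELOPE = auto()  # input has an approximately constant envelope


def has_constant_envelope(block_energies_db: Sequence[float],
                          tolerance_db: float = 1.0) -> bool:
    """Hypothetical test: recent block energies stay within a small dB spread."""
    if len(block_energies_db) < 2:
        return False
    return max(block_energies_db) - min(block_energies_db) <= tolerance_db


def select_state(estimates_reliable: bool, constant_envelope: bool) -> State:
    """Pick a state from two boolean observations (assumed priority order)."""
    if constant_envelope:
        return State.CONSTANT_ENVELOPE
    if estimates_reliable:
        return State.CONVERGED
    return State.LEARNING
```

A real detector would need more than an energy-spread test to separate facsimile from steady background noise; the sketch only shows where such a decision would plug into the state selection.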
The system utilizes signal energy detection to establish and adjust the adaptive lower and upper thresholds. The signal is divided into blocks of a desired length, and signal features relating to the signal energy level are extracted for analysis to determine signal feature characteristics used to establish noise and active signal predictive thresholds. These established thresholds are used to discriminate the signal.
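Splitting the signal into blocks and extracting an energy feature per block can be sketched as follows. The mean-square energy measure, the dB reference (a full-scale amplitude of 1.0 rather than dBm0), and the handling of a trailing partial block are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def block_energies_db(samples: Sequence[float], block_len: int,
                      floor: float = 1e-12) -> List[float]:
    """Return the mean-square energy of each full block, in dB re full scale 1.0.

    A trailing partial block is dropped. The floor avoids log10(0) on digital silence.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    energies = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        mean_square = sum(x * x for x in block) / block_len
        energies.append(10.0 * math.log10(max(mean_square, floor)))
    return energies


# Two full blocks of four samples each; the last two samples are dropped.
example = block_energies_db([1.0] * 4 + [0.1] * 4 + [0.5] * 2, block_len=4)
# example is approximately [0.0, -20.0]
```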
A signal from a source is first processed to determine the energy E(n) of the signal. The energy level is processed into energy vectors corresponding to discrete time intervals, for analysis. Each block is first processed by comparison with an initial set of thresholds within a marginal signal and noise discriminator, to discriminate initially between noise and signal. If below a first noise threshold, the block is classified as noise. If above a second voice threshold, the block is classified as active signal. Once discriminated, blocks below the noise threshold are used in noise
Kosanovic Bogdan
Li Dunling
Mladenovic Zoran
Azad Abul K.
Holland Robby T.
Telecky, Jr. Frederick J.
Telogy Networks Inc.
Šmits Tālivaldis Ivars
Adaptive two-threshold method for discriminating noise from...