SID frame detection with human auditory perception compensation

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

active

06807525

ABSTRACT:

CROSS-REFERENCE TO RELATED APPLICATIONS
Not applicable.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
BACKGROUND OF THE INVENTION
This invention relates to bandwidth improvements in digitized voice applications when no voice is present. In particular, the invention suggests that improved estimation of background noise during interruptions in speech leads to less bandwidth consumption.
Voice over packet networks (VOPN) require that the voice or audio signal be packetized and then transmitted. The analog voice signal is first converted to a digital signal and is compressed in the form of a pulse code modulated (PCM) digital stream. As illustrated in FIG. 1, the PCM stream is processed by modules of the gateway, such as echo cancellation (EC) 10, voice activity detection (VAD) 12, voice compression (CODEC) 14, protocol configuration 16, etc.
Various techniques have been developed to reduce the amount of bandwidth used in the transmission of voice packets. One of these techniques reduces the number of transmitted packets by suspending transmission during periods of silence or when only noise is present. Two algorithms, the VAD algorithm followed by the Discontinuous Transmission (DTX) algorithm, achieve this. In a system where both algorithms exist and are enabled, VAD 12 makes the “voice/no voice” selection as illustrated in FIG. 1. One of these two choices is the VAD algorithm's output. If voice (active) is detected, a regular voice path is followed in the CODEC 14 and the voice information is compressed into a set of parameters. If no voice (inactive) is detected, the DTX algorithm is invoked and a Silence Insertion Descriptor (SID) packet 18 is transmitted at the beginning of this interval of silence. After the first transmitted SID 18, DTX analyzes the background noise changes throughout the inactive period. In case of a spectral change, the encoder sends a SID packet 18. If no change is detected, the encoder sends nothing. Generally, SID packets contain a signature of the background noise information 20 with a minimal number of bits in order to conserve limited network resources. On the receiving side, for each frame, the decoder reconstructs a voice or a noise signal depending on the received information. If the received information contains voice parameters, the decoder reconstructs a voice signal. If the decoder receives no information, it generates noise using the noise parameters embedded in the previously received SID packet. This process is called Comfort Noise Generation (CNG). If the decoder were simply muted during the silent period, the signal energy level would drop suddenly, which makes the conversation unpleasant. Therefore, CNG is essential to mimic the background noise on the transmitting side. If the decoder receives a new SID packet, it updates its noise parameters for the current and future CNG until the next SID is received.
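The receiving-side behavior described above can be illustrated with a minimal sketch. This is not the G.729 Annex B decoder: the class names (VoiceFrame, SidFrame, Decoder) and the single noise_level field are hypothetical stand-ins for the real voice and noise parameters, and the sketch only tracks which kind of signal would be reconstructed for each frame.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class VoiceFrame:
    params: bytes  # compressed voice parameters (contents not modeled here)


@dataclass
class SidFrame:
    noise_level: float  # hypothetical stand-in for the noise signature


class Decoder:
    """Toy decoder that tracks comfort-noise state across frames."""

    def __init__(self) -> None:
        # Noise parameters taken from the most recently received SID.
        self.noise_level = 0.0

    def decode(self, frame: Optional[Union[VoiceFrame, SidFrame]]) -> Tuple[str, float]:
        if isinstance(frame, VoiceFrame):
            # Voice parameters received: reconstruct a voice signal.
            return ("voice", 0.0)
        if isinstance(frame, SidFrame):
            # New SID: update the noise parameters for current and future CNG.
            self.noise_level = frame.noise_level
        # SID or nothing received: generate comfort noise from stored parameters.
        return ("noise", self.noise_level)
```

Passing None for a frame models the case where the encoder sent nothing, so the decoder keeps generating noise from the last SID it saw.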
In ITU standard G.729 Annex B, the DTX and CNG algorithms are designed to operate under a variety of levels and characteristics of speech and noise, ensuring bit rate savings and no degradation in the perceived quality of sound. Though the G.729 Annex B SID frame detection algorithm yields smooth background noise during non-active periods, it detects a significant percentage of SID frames even when the background noise is almost stationary. In a real VOPN system, G.729 Annex B generates numerous SID packets continuously, even when the background noise level is very low in dB. One reason for this is that the SID detection algorithm is too sensitive to very low level background noise. Another reason is the effects of imperfect EC. The output signal of EC may have bursts or non-stationary characteristics in low level noise, even when its input noise is stationary.
Since SID frames have considerably fewer payload bits than voice packets, generating many SID packets should theoretically not create bandwidth problems. However, both voice and SID packets 22 must have packet headers 24 in VOPN applications (FIG. 2). The header length is the same for voice and SID packets. Sometimes the header 24 occupies most of the bandwidth in a SID packet 22. For instance, in the RTP protocol, the header length is 12 bytes. One SID frame contains 2 bytes and a voice frame requires 10 bytes in a G.729 codec. Although the SID frame bit rate is 20% of the full bit rate in the G.729 codec, when the headers 24 are appended to the packet, the SID packet length with RTP header is about 70% of the voice packet length with header. Therefore, it is very important for bandwidth savings to reduce the number of SID packets while preserving sound quality.
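The header overhead can be checked with a short calculation. The sketch below uses only the byte counts quoted above (12-byte RTP header, 2-byte SID frame, 10-byte voice frame) and assumes one frame per packet with no lower-layer headers; the constant and function names are illustrative. Under those assumptions the ratio comes out to 14/22, roughly 64%, in the same range as the "about 70%" figure in the text.

```python
RTP_HEADER_BYTES = 12     # from the text
SID_PAYLOAD_BYTES = 2     # one G.729 SID frame, from the text
VOICE_PAYLOAD_BYTES = 10  # one G.729 voice frame, from the text


def packet_bytes(payload_bytes: int, header_bytes: int = RTP_HEADER_BYTES) -> int:
    """Total packet size assuming one frame per packet (an assumption)."""
    return header_bytes + payload_bytes


# Payload only: the SID frame is 20% of a voice frame.
payload_ratio = SID_PAYLOAD_BYTES / VOICE_PAYLOAD_BYTES

# With headers: 14 bytes versus 22 bytes.
packet_ratio = packet_bytes(SID_PAYLOAD_BYTES) / packet_bytes(VOICE_PAYLOAD_BYTES)

print(f"payload ratio: {payload_ratio:.0%}, packet ratio: {packet_ratio:.0%}")
```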
SUMMARY OF THE INVENTION
The SID detection algorithm of G.729 Annex B is based on spectral and energy changes of background noise characteristics after the last transmitted SID frame. The Itakura distance on the linear prediction filters is used to represent the spectral changes. When this measure exceeds a fixed threshold, it indicates a significant change of the spectrum. The energy change is defined as the difference between the quantized energy levels of the residual signal in the current inactive frame and in the last SID frame. The energy difference is significant if it exceeds 2 dB. Since the thresholds of SID detection are fixed and crude, the generation of an excess number of SID frames is anticipated. Therefore, a SID update delay scheme is used to save bandwidth during nonstationary noise: a minimum spacing of two frames is imposed between the transmission of two consecutive SID frames. This method artificially limits the generation of SID frames.
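The fixed-threshold decision described above can be sketched as follows. This is an illustration, not the G.729 Annex B reference code: the Itakura threshold value is a placeholder assumption (the text only says it is fixed), the class and method names are hypothetical, the Itakura distance and energy difference are taken as precomputed inputs, and the two-frame spacing is interpreted as "a new SID is allowed no earlier than two frames after the previous one".

```python
from dataclasses import dataclass

ENERGY_THRESHOLD_DB = 2.0   # from the text
MIN_SID_SPACING_FRAMES = 2  # from the text
ITAKURA_THRESHOLD = 1.2     # ASSUMED placeholder; the text gives no value


@dataclass
class SidDetector:
    """Decides, per inactive frame, whether a SID frame should be sent."""
    in_silence: bool = False
    frames_since_sid: int = 0

    def reset(self) -> None:
        # Call when voice activity resumes.
        self.in_silence = False
        self.frames_since_sid = 0

    def step(self, itakura_distance: float, energy_diff_db: float) -> bool:
        if not self.in_silence:
            # First inactive frame: always send a SID.
            self.in_silence = True
            self.frames_since_sid = 0
            return True
        self.frames_since_sid += 1
        # Significant change in spectrum or in energy since the last SID.
        changed = (itakura_distance > ITAKURA_THRESHOLD
                   or abs(energy_diff_db) > ENERGY_THRESHOLD_DB)
        # Update delay scheme: enforce the minimum spacing between SIDs.
        if changed and self.frames_since_sid >= MIN_SID_SPACING_FRAMES:
            self.frames_since_sid = 0
            return True
        return False
```

With this structure, a noise change that occurs one frame after a SID is suppressed by the spacing rule, which is the artificial limiter the text refers to.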
The present invention provides a method to determine whether a background noise update is warranted, based upon human auditory perception (HAP) factors instead of an artificial limiter on excessive SID packets. The acoustic factors that characterize the unique aspects of HAP have been known and studied. The applicability of perceptual, or psychoacoustic, modeling to complex compression algorithms is discussed in IEEE Transactions on Signal Processing, Volume 46, No. 4, April 1998, and in the AES papers of Frank Baumgarte, which relate to the applicability of HAP to digitizing audio signals for compressed encoded transmission. Other papers recognize the applicability of HAP-based masking techniques to the encoding of audio signals.
While some of these works acknowledge the applicability of HAP when compressing high fidelity acoustic files for efficient encoding, they do not recognize the use of HAP in SID detection (i.e., identification of perceptual changes in background noise in voice communications). The present invention observes that modeling transitions based upon HAP can reduce the encoding of changes in background noise estimation by eliminating the need to encode changes imperceptible to the HAP system. The present invention does not analyze speech for improved audio compression, but instead searches for characteristics in the perceptual changes of background noise.
HAP is often modeled as a nonlinear preprocessing system. Such a model simulates the mechanical and electrical events in the inner ear, and explains not only level-dependent frequency selectivity, but also the effects of suppression and simultaneous masking. Many factors can affect the perception of sound, including frequency masking, temporal masking, loudness perception based on tone, and auditory perception differential based upon tone. The factors of HAP can cause masking, which occurs when a factor apart from the background noise renders any change in the background noise imperceptible to the human ear. In a situation where masking occurs, it is not necessary to update background noise, because the changes are not perceptible. The present invention accounts for these factors by identifying and weighing each factor to determine the appropriate level of SID packet generation, thus increasing SID detection efficiency.
The most responsive frequency for human perception, as illust
