Reexamination Certificate
1998-11-30
2001-07-03
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S210000, C704S201000, C704S226000
Reexamination Certificate
active
06256606
ABSTRACT:
BACKGROUND
1. Technical Field
The present invention relates generally to speech coding using a speech codec; and, more particularly, it relates to silence description coding for multi-rate speech codecs.
2. Description of Prior Art
Conventional speech codec systems that employ silence description coding typically employ some type of voice activity detection algorithm that determines the existence of a substantially speech-like signal contained within a speech signal. When no voice activity is detected in the speech signal, the conventional speech codec utilizes a reduced data transmission rate. In addition, in conventional speech codecs that employ discontinued transmission, operation at a full data transmission rate is performed only when there is an existence of the substantially speech-like signal contained within the speech signal.
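The rate switching described above can be sketched as follows. This is a minimal illustration only: the energy-threshold detector, the threshold value, and the two bit rates are assumptions chosen for the example, not details taken from the patent.

```python
# Hypothetical rates in bits per second; the patent does not specify values.
FULL_RATE_BPS = 8000
REDUCED_RATE_BPS = 1000


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def is_speech_like(frame, threshold=1000.0):
    """Crude stand-in for voice activity detection: compare energy to a threshold."""
    return frame_energy(frame) > threshold


def select_rate(frame, threshold=1000.0):
    """Use the full rate only when the frame is judged speech-like."""
    if is_speech_like(frame, threshold):
        return FULL_RATE_BPS
    return REDUCED_RATE_BPS
```

A practical detector would rely on more than frame energy; the point here is only the decision structure, in which the detector's output selects the transmission rate.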
A common approach to performing data transmission at the reduced rate, particularly within conventional speech codec systems that operate at multiple data transmission rates, is to employ a fixed reduced rate for each of the multiple data transmission rates. For example, a first reduced data transmission rate accompanies the highest of the multiple data transmission rates, and a second reduced data transmission rate accompanies the lowest of the multiple data transmission rates. This conventional solution of dedicating a separate reduced data transmission rate to each of the multiple data transmission rates results in gross over-allocation of encoder processing resources in the conventional speech codec, in that more processing circuitry is required to accommodate each of the reduced data transmission rates. Additionally, it creates computational complexity associated with the need to have a dedicated reduced data transmission rate for each of the multiple data transmission rates.
Another limitation associated with the conventional solution of having a separate reduced data transmission rate for each of the multiple data transmission rates is the intrinsic limitation of bandwidth available within any communication system. Inefficient allocation and management of the available bandwidth in the communication system provides undesirable limitations on the number of communication devices that may be employed at any given time. Additionally, the inefficient use of the available bandwidth precludes efficient use of the remaining bandwidth for other functions not associated exclusively with data transmission. In many conventional speech codec systems, the entire bandwidth spectrum is consumed, and there simply is no available remaining bandwidth in which to perform the other functions.
The traditional solution of detecting the existence of the substantially speech-like signal contained within a speech signal and adjusting the data transmission rate as a function of the substantially speech-like signal typically performs encoding and transmission of all speech segments. The encoding and transmission of all speech segments includes those speech segments that do not contain the substantially speech-like signal. This results in very inefficient allocation of the speech codec's processing resources, in that, every speech segment is encoded even in the absence of the substantially speech-like signal. Operation at the reduced data transmission rate typically involves transmitting a subset of parameters that the speech codec uses to encode the speech signal. The subset of parameters is typically transmitted only when there is a perceptual change in the substantially non-speech-like speech signal.
Other conventional speech codec systems discontinue data transmission altogether in the absence of the substantially speech-like signal. In these conventional speech codec systems, a voice activity detection algorithm is implemented that determines the existence of the substantially speech-like signal and simply discontinues data transmission when it is absent. Such systems suffer from the undesirable perceptual effect of apparent disconnection of the communication link, in that, the silence associated with no data transmission at all gives the listener the impression that no one is on the other end. This undesirable impression of disconnection of the communication link generated from interrupted data transmission greatly reduces the perceptual performance of such conventional speech codec systems. The conventional solution to generate the impression that another individual is on the other end involves performing comfort noise generation. Comfort noise generation is a specific mode of discontinued transmission wherein only a small number of speech parameters are transmitted from an encoder to a decoder in a speech codec, and intermediary values between the small number of speech parameters are generated via interpolation. The entirety of the speech parameters (including the interpolated values) are used to produce a reproduced non-speech signal that is perceptually indistinguishable from background noise. This solution of comfort noise generation provides the perceptual effect of background noise.
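The interpolation step in comfort noise generation can be sketched as follows. This is a minimal illustration under stated assumptions: linear interpolation, a parameter vector reduced to plain numbers, and Gaussian noise scaled by a gain are all choices made for the example, and the function names are hypothetical.

```python
import random


def interpolate_params(prev, nxt, n_frames):
    """Linearly interpolate from prev toward nxt over n_frames frames.

    Returns one parameter vector per frame; the last one equals nxt.
    """
    frames = []
    for k in range(1, n_frames + 1):
        t = k / n_frames
        frames.append([p + t * (q - p) for p, q in zip(prev, nxt)])
    return frames


def comfort_noise_frame(gain, frame_len=160, rng=None):
    """Generate one frame of noise scaled by gain (assumed noise model)."""
    if rng is None:
        rng = random.Random(0)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(frame_len)]


# Example: the decoder received a gain of 0.0, then later a gain of 4.0,
# and fills in the four frames between the two updates.
gains = interpolate_params([0.0], [4.0], 4)
noise = [comfort_noise_frame(g[0]) for g in gains]
```

Only the two endpoint parameter sets are transmitted in this sketch; every intermediate frame is produced at the decoder.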
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings.
SUMMARY OF THE INVENTION
Various aspects of the present invention can be found in a multi-rate speech codec that performs discontinued transmission. Specifically, within the discontinued transmission, silence description coding of a speech signal is performed using a single silence description coding scheme independent of past, present, and future coding schemes that are applied to various portions of the speech signal. The speech signal has varying characteristics, and at least one of the varying characteristics is sometimes a substantially speech-like characteristic. The identification of the substantially speech-like characteristic is performed using voice detection circuitry. When there is an absence of the substantially speech-like characteristic in the speech signal, processing circuitry applies a predetermined coding mode to the speech signal independent of past, present, and future coding schemes. The predetermined coding mode is selected from among a plurality of coding modes.
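The contrast with the per-rate approach can be sketched as follows. This is an illustrative sketch only: the mode list and its bit rates are hypothetical, and fixing the silence mode to the lowest-rate source mode follows just one of the embodiments the summary mentions.

```python
# Hypothetical source coding modes, in bits per second.
SOURCE_MODES_BPS = [4000, 6000, 8000, 11000]

# One predetermined mode used for all silence description frames.
SILENCE_MODE_BPS = min(SOURCE_MODES_BPS)


def choose_mode(speech_like, active_mode_bps):
    """Pick the coding mode for one frame.

    Speech-like frames keep the currently active mode. Frames without a
    speech-like characteristic always get the same predetermined mode,
    whatever mode is active before or after them.
    """
    if speech_like:
        return active_mode_bps
    return SILENCE_MODE_BPS


# The silence mode does not depend on the active mode.
silence_choices = {choose_mode(False, m) for m in SOURCE_MODES_BPS}
```

Because `silence_choices` contains a single value, the encoder needs only one silence description scheme rather than one per operating rate.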
In certain embodiments of the invention, the discontinued transmission involves voice activity detection, silence description coding, and comfort noise generation. The voice activity detection is performed in an encoder of the multi-rate speech codec and determines the existence of a substantially speech-like characteristic in the speech signal. The voice activity detection also detects a change in the perceptual characteristic of the speech signal. The silence description coding is also performed in the encoder, wherein a small number of parameters used to code the speech signal are then transmitted to the decoder. The decoder performs the comfort noise generation to generate a non-speech-like signal that is perceptually indistinguishable from the speech signal. The silence description coding is performed on speech signals not having a substantially speech-like characteristic, independent of past, present, and future coding schemes.
In certain embodiments of the invention, the predetermined coding mode fits within a predetermined bit rate budget. The predetermined bit rate budget is determined from the particular bit rate at which the multi-rate speech codec is operating. In other embodiments of the invention, the predetermined coding mode is a source coding mode that operates at a bit rate that is the lowest bit rate of all the source coding modes contained within the plurality of coding modes. Signaling coding and channel coding are also performed by the multi-rate speech codec in coding the speech signal. The multi-rate speech codec performs error checking within an unused portion of a bandwidth of the multi-rate speech codec's bit rate. This error checking involves majority voting in certain embodiments of the invention.
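Majority voting of the kind mentioned above can be sketched as follows. The text does not say how the unused bits are filled, so this example assumes the simplest scheme: each protected bit is repeated an odd number of times in the spare capacity and the receiver takes the majority value. The function names and the repetition count are hypothetical.

```python
def repeat_bits(bits, copies=3):
    """Repeat each bit `copies` times (assumed use of the unused bit budget)."""
    return [b for b in bits for _ in range(copies)]


def majority_decode(received, copies=3):
    """Recover each bit as the majority value of its group of `copies` bits."""
    if copies % 2 == 0:
        raise ValueError("copies must be odd so that a majority always exists")
    if len(received) % copies != 0:
        raise ValueError("received length must be a multiple of copies")
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append(1 if 2 * sum(group) > copies else 0)
    return decoded


# Example: one bit error in each of the first two groups is corrected.
sent = repeat_bits([1, 0, 1])
corrupted = list(sent)
corrupted[1] ^= 1
corrupted[4] ^= 1
recovered = majority_decode(corrupted)
```

With three copies per bit, the vote tolerates one flipped bit per group; two errors in the same group would decode incorrectly.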
Other aspects, advantages and novel features of the pres
Benyassine Adil
Shlomot Eyal
Su Huan-Yu
Thyssen Jes
Brinks Hofer Gilson & Lione
Conexant Systems Inc.
Dorvil Richemond
Nolan Daniel A.