Reexamination Certificate
1998-06-30
2001-06-19
Korzuch, William R. (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S225000, C704S214000
active
06249758
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to the field of processing audio signals, such as speech signals that are compressed or encoded with a digital signal processing technique. More specifically, the invention relates to an improved method and an apparatus for coding speech signals that can be particularly useful in the field of wireless communications.
BACKGROUND OF THE INVENTION
In communication applications where channel bandwidth is at a premium, it is essential to use the smallest possible portion of a transmission channel to transmit a voice signal. A common solution is to process the voice signal with an apparatus called a speech codec before it is transmitted on an RF channel.
Speech codecs, which include an encoding stage and a decoding stage, compress the digital signal at the source and decompress it at the reception point in order to optimize the use of transmission channels. By encoding only the necessary characteristics of a speech signal, the codec transmits fewer bits than would be required to reproduce the original waveform, without significantly degrading speech quality. With fewer bits required, a lower transmission bit rate can be achieved.
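The arithmetic below gives a rough sense of the savings by comparing an uncompressed PCM stream with an 8 kbit/s coded stream. It assumes 8 kHz sampling and 16-bit linear PCM as the uncompressed reference. These figures are illustrative assumptions and are not taken from this specification.

```python
# Rough bit-rate comparison. ASSUMPTIONS: 8 kHz narrowband sampling and
# 16-bit linear PCM as the uncompressed reference; 8 kbit/s coded rate.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 16
CODED_RATE_BPS = 8_000


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of an uncompressed PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(reference_bps: int, coded_bps: int) -> float:
    """How many times fewer bits the coded stream needs than the reference."""
    return reference_bps / coded_bps


pcm_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(pcm_bps, compression_ratio(pcm_bps, CODED_RATE_BPS))  # 128000 16.0
```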
Most state-of-the-art codecs are based on the original CELP model proposed by Schroeder and Atal in “Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” Proceedings of ICASSP, pp. 937-940, 1985. This document is hereby incorporated by reference. This basic codec model has been improved in many respects to achieve bit rates of approximately 8 kbits/sec and even lower, but voice quality at the lower bit rates may not be acceptable for telephony applications. An example of an 8 kbits/sec codec is fully described in version 5.0 of the International Telecommunication Union Telecommunications Standardization Sector (ITU-TSS) Draft recommendation G.729 “Coding of speech at 8 kbits/s using Conjugate-Structure Algebraic-Code-Excited Linear-Predictive (CS-ACELP) coding”, dated Jun. 8, 1995. This document is hereby incorporated by reference.
Since lower bit rates at acceptable speech quality provide great economic advantages, the industry needs an improved speech coding apparatus and method that is particularly well suited to telecommunications applications.
OBJECTIVES AND SUMMARY OF THE INVENTION
A general object of the invention is to provide an improved audio signal coding device, such as a Linear Predictive (LP) encoder, that achieves audio coding at low bit rates while maintaining audio quality at a level acceptable for communication applications.
A more specific object of the invention is to provide an audio signal coding device and a method for coding audio signals while taking into consideration the voiced or unvoiced nature of the audio signal.
Another specific object of the invention is to provide an audio signal coding device and a method for coding an audio signal capable of better predicting the pitch characteristics of the audio signal.
Another specific object of the invention is to provide an audio signal coding method for smoothing the parameters for voiced and unvoiced subframes before their transmission.
In this specification, the term “filter coefficients” is intended to refer to any set of coefficients that uniquely defines a filter function that models the spectral characteristics of an audio signal. In conventional audio signal encoders, several different types of coefficients are known, including linear prediction coefficients, reflection coefficients, arcsines of the reflection coefficients, line spectrum pairs, log area ratios, among others. These different types of coefficients are usually related by mathematical transformations and have different properties that suit them to different applications. Thus, the term “filter coefficients” is intended to encompass any of these types of coefficients.
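The sketch below shows how these coefficient types are related by mathematical transformations. It uses the standard step-up recursion to convert reflection coefficients into linear prediction coefficients. It inverts that conversion with the step-down recursion, and it derives log area ratios and arcsines from the reflection coefficients.

The sketch assumes two sign conventions:
- A(z) = 1 + Σ a_i z^-i.
- Log area ratio = log((1 + k)/(1 - k)).

Texts differ on both signs, so treat these as assumptions rather than as the convention of this specification.

```python
import math
from typing import List, Sequence


def reflection_to_lpc(k: Sequence[float]) -> List[float]:
    """Step-up recursion: reflection coefficients k_1..k_p -> LP coefficients a_1..a_p.

    ASSUMES A(z) = 1 + sum_i a_i z^-i (sign conventions differ between texts).
    """
    a: List[float] = []
    for m, k_m in enumerate(k, start=1):
        # a_i(m) = a_i(m-1) + k_m * a_{m-i}(m-1) for i = 1..m-1 (0-based idx = i-1),
        # then a_m(m) = k_m.
        a = [a[idx] + k_m * a[m - 2 - idx] for idx in range(m - 1)] + [k_m]
    return a


def lpc_to_reflection(a: Sequence[float]) -> List[float]:
    """Step-down recursion, the inverse of reflection_to_lpc (needs |k_m| < 1)."""
    a = list(a)
    k = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        k_m = a[m - 1]
        k[m - 1] = k_m
        denom = 1.0 - k_m * k_m
        # a_i(m-1) = (a_i(m) - k_m * a_{m-i}(m)) / (1 - k_m^2)
        a = [(a[idx] - k_m * a[m - 2 - idx]) / denom for idx in range(m - 1)]
    return k


def reflection_to_lar(k: Sequence[float]) -> List[float]:
    """Log area ratios, ASSUMING the definition log((1 + k) / (1 - k))."""
    return [math.log((1.0 + k_m) / (1.0 - k_m)) for k_m in k]


def reflection_to_arcsine(k: Sequence[float]) -> List[float]:
    """Arcsines of the reflection coefficients."""
    return [math.asin(k_m) for k_m in k]


k = [0.5, 0.25]
a = reflection_to_lpc(k)
print(a)                     # [0.625, 0.25]
print(lpc_to_reflection(a))  # [0.5, 0.25]
```

Because each conversion is invertible for a stable filter (|k_m| < 1), an encoder can quantize whichever representation suits it best and still recover the others.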
In this specification, the term “excitation segment” is defined as information that needs to be combined with the filter coefficients to provide a complete representation of the audio signal. Such an excitation segment may include any of the following, among others:
- parametric information describing the periodicity of the speech signal;
- a residual (often referred to as the “excitation signal”) as computed by the encoder of a vocoder;
- speech framing control information that ensures synchronous framing in the decoder associated with the remote vocoder;
- pitch periods and pitch lags;
- gains and relative gains.
In this specification, the term “sample” refers to the amplitude value of a signal at one specific instant in time. PCM (Pulse Code Modulation) is a form of coding of an analog signal that produces a plurality of samples, each sample representing the amplitude of the waveform at a certain time.
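A minimal PCM sketch follows. It samples a continuous-time function at uniform instants and quantizes each amplitude to a signed integer code. The following choices are assumptions made for illustration; the specification does not give them:
- a full-scale range of [-1, 1];
- 16-bit resolution;
- a plain rounding quantizer;
- the 1 kHz test tone (`tone_1khz`, a hypothetical input).

```python
import math
from typing import Callable, List


def pcm_encode(signal: Callable[[float], float], sample_rate_hz: int,
               n_samples: int, bits: int = 16) -> List[int]:
    """Sample `signal` (full scale ASSUMED to be [-1, 1]) at uniform instants
    and quantize each amplitude to a signed `bits`-bit integer code."""
    max_code = 2 ** (bits - 1) - 1
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                # time of the n-th sample
        x = max(-1.0, min(1.0, signal(t)))    # clip to full scale
        samples.append(round(x * max_code))   # uniform rounding quantizer
    return samples


def tone_1khz(t: float) -> float:
    """A 1 kHz sine at half of full scale, standing in for the analog input."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


print(pcm_encode(tone_1khz, sample_rate_hz=8000, n_samples=8))
```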
The term “audio signal subframe” refers to a set of samples that represents a portion of an audio signal such as speech. For example, one embodiment of this invention uses sub-frames of 40 samples. Likewise, an “audio signal frame” is defined as a plurality of sample sets, each set being representative of a sub-frame. In a specific example, an audio signal frame has four sub-frames.
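The sketch below groups a sample stream into frames of four 40-sample sub-frames, matching the example sizes given above. Assuming 8 kHz sampling (not stated here), a sub-frame spans 5 ms and a frame spans 20 ms. Dropping leftover samples that do not fill a whole frame is a simplification made for this sketch.

```python
from typing import List, Sequence

SUBFRAME_LEN = 40          # samples per sub-frame (example embodiment)
SUBFRAMES_PER_FRAME = 4    # sub-frames per frame (specific example)
FRAME_LEN = SUBFRAME_LEN * SUBFRAMES_PER_FRAME


def split_into_frames(samples: Sequence[int]) -> List[List[List[int]]]:
    """Group samples into frames, each a list of SUBFRAMES_PER_FRAME sub-frames.

    Trailing samples that do not fill a whole frame are dropped here; an
    encoder would normally buffer them for the next call.
    """
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = list(samples[start:start + FRAME_LEN])
        frames.append([frame[i:i + SUBFRAME_LEN]
                       for i in range(0, FRAME_LEN, SUBFRAME_LEN)])
    return frames


frames = split_into_frames(list(range(400)))
print(len(frames), len(frames[0]), len(frames[0][0]))  # 2 4 40
```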
In a most preferred embodiment, the audio signal encoding device encodes an audio signal, such as a speech signal, differently depending on the voiced/unvoiced characteristics of the signal. To this end, the device includes two signal synthesis stages, one better suited to unvoiced signals and one better suited to voiced signals. In operation, each signal synthesis stage generates a synthesized speech signal from a set of parameters, such as filter coefficients and an excitation segment, computed to best approximate the input speech signal sub-frame. The two synthesized signals are compared, and the one that exhibits less error with respect to the input speech signal is selected as the best match. The parameters previously computed for this synthesized signal are then used to form the compressed or encoded audio signal sub-frame.
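The selection step can be sketched as a closed-loop comparison. Each stage hands over its synthetic sub-frame together with the parameters that produced it, and the parameters with the smaller error are kept.

This sketch measures error as a plain sum of squared differences. CELP coders commonly use a perceptually weighted error instead, and the specification does not say which measure is used here. The function names, the tie-break rule and the example parameter keys (`pitch_lag`, `noise_index`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A candidate is (synthetic sub-frame, parameters that reproduce it).
Candidate = Tuple[List[float], Dict[str, object]]


def squared_error(target: Sequence[float], synthetic: Sequence[float]) -> float:
    """Sum of squared sample differences between input and synthetic sub-frames."""
    return sum((t - s) ** 2 for t, s in zip(target, synthetic))


def select_parameters(target: Sequence[float], voiced: Candidate,
                      unvoiced: Candidate) -> Tuple[str, Dict[str, object]]:
    """Keep the parameter set whose synthetic signal best matches the input.

    Ties go to the voiced stage; this tie-break is an arbitrary choice.
    """
    voiced_err = squared_error(target, voiced[0])
    unvoiced_err = squared_error(target, unvoiced[0])
    if voiced_err <= unvoiced_err:
        return "voiced", voiced[1]
    return "unvoiced", unvoiced[1]


target = [1.0, 2.0, 3.0]
choice = select_parameters(target,
                           voiced=([1.0, 2.0, 2.5], {"pitch_lag": 40}),
                           unvoiced=([0.5, 0.5, 0.5], {"noise_index": 3}))
print(choice)  # ('voiced', {'pitch_lag': 40})
```

Only the winning parameter set would be transmitted. The decoder can then regenerate the same synthetic sub-frame without receiving the waveform itself.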
The major difference between the signals produced by the voiced signal synthesis stage and the unvoiced signal synthesis stage lies in their periodicity, or pitch. The synthesized voiced signal exhibits higher periodicity than the synthesized unvoiced signal.
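The periodicity contrast can be made concrete with a normalized autocorrelation measure. The measure is close to 1 at the pitch lag of a periodic signal and stays small for noise. This is a hypothetical illustration only: the specification selects between stages by synthesis error and does not state that the encoder computes this measure. The lag range and test signals are also assumptions.

```python
import math
import random
from typing import Sequence, Tuple


def periodicity(x: Sequence[float], min_lag: int, max_lag: int) -> Tuple[int, float]:
    """Return (lag, r) where r is the largest normalized correlation between
    x[n] and x[n - lag] over the lag range; r near 1 means strongly periodic."""
    best_lag, best_r = min_lag, 0.0
    for lag in range(min_lag, max_lag + 1):
        cross = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        e_now = sum(x[n] ** 2 for n in range(lag, len(x)))
        e_past = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
        if e_now == 0.0 or e_past == 0.0:
            continue  # silent segment: correlation undefined
        r = cross / math.sqrt(e_now * e_past)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Voiced-like input: a pulse train with a period of 20 samples.
pulses = [1.0 if n % 20 == 0 else 0.0 for n in range(160)]
# Unvoiced-like input: white Gaussian noise (seeded for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(160)]

print(periodicity(pulses, 10, 60))  # (20, 1.0)
print(periodicity(noise, 10, 60))   # small r, well below 1
```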
In a specific example, the voiced signal synthesis stage comprises an adaptive codebook containing prior knowledge entries that are past audio signal sub-frames. The output of this codebook provides the periodic component of the signal generated by the voiced signal synthesis stage. Selecting an entry from a pulse stochastic codebook and passing this entry into a synthesis filter produces the aperiodic component.
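The voiced stage can be sketched in the conventional CELP form, which this specification does not confirm in detail:
1. A scaled adaptive-codebook vector (the past excitation delayed by the pitch lag) is added to a scaled pulse-codebook entry.
2. The sum is passed through an all-pole synthesis filter 1/A(z).

The following are assumptions made for illustration:
- the filter convention A(z) = 1 + Σ a_i z^-i with zero initial state;
- the gains, lag and the single-pulse codebook entry;
- the function names, which are hypothetical.

```python
from typing import List, Sequence, Tuple


def adaptive_codebook_vector(past_exc: Sequence[float], lag: int,
                             length: int) -> List[float]:
    """Periodic component: the past excitation delayed by `lag` samples.
    If lag < length, the vector repeats itself with period `lag`."""
    if not 1 <= lag <= len(past_exc):
        raise ValueError("lag must be between 1 and len(past_exc)")
    buf = list(past_exc)
    start = len(buf) - lag
    out = []
    for n in range(length):
        sample = buf[start + n]
        out.append(sample)
        buf.append(sample)  # lets lags shorter than the sub-frame repeat
    return out


def synthesis_filter(exc: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole LP synthesis 1/A(z) with zero initial state, ASSUMING
    A(z) = 1 + sum_i a_i z^-i, i.e. y[n] = e[n] - sum_i a_i y[n-i]."""
    y: List[float] = []
    for n, e in enumerate(exc):
        acc = e
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc -= a_i * y[n - i]
        y.append(acc)
    return y


def voiced_stage(past_exc: Sequence[float], lag: int, pitch_gain: float,
                 pulse_vec: Sequence[float], pulse_gain: float,
                 a: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Excitation = pitch_gain * adaptive vector + pulse_gain * pulse vector,
    passed through the synthesis filter. Returns (synthetic, excitation); the
    excitation would be appended to past_exc for the next sub-frame."""
    v = adaptive_codebook_vector(past_exc, lag, len(pulse_vec))
    exc = [pitch_gain * v_n + pulse_gain * c_n for v_n, c_n in zip(v, pulse_vec)]
    return synthesis_filter(exc, a), exc


past = [0.0] * 35 + [1.0, 0.0, 0.0, 0.0, 0.0]
pulse = [0.0] * 10
pulse[3] = 1.0  # hypothetical pulse-codebook entry with a single pulse
synth, exc = voiced_stage(past, lag=5, pitch_gain=0.8, pulse_vec=pulse,
                          pulse_gain=0.5, a=[-0.5])
print(exc)  # [0.8, 0.0, 0.0, 0.5, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]
```

Returning the excitation alongside the synthetic signal reflects how the adaptive codebook is fed by past sub-frames. The current excitation becomes part of the "prior knowledge" used for the next sub-frame.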
The unvoiced signal synthesis stage comprises a noise stochastic codebook that issues a sample noise signal used as input to a synthesis filter. The output of the synthesis filter is the synthetic unvoiced audio signal.
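The sketch below shows how an entry might be chosen from a noise codebook. Each entry is passed through the synthesis filter, and its least-squares optimal gain is computed in closed form. The entry whose scaled output leaves the smallest residual error is kept.

The following are assumptions made for illustration:
- a Gaussian codebook generated from a seeded pseudo-random generator, so that encoder and decoder can rebuild identical entries;
- the codebook size and the filter convention A(z) = 1 + Σ a_i z^-i;
- the function names, which are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def synthesis_filter(exc: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole 1/A(z), zero initial state, ASSUMING A(z) = 1 + sum_i a_i z^-i."""
    y: List[float] = []
    for n, e in enumerate(exc):
        y.append(e - sum(a_i * y[n - i] for i, a_i in enumerate(a, start=1) if n - i >= 0))
    return y


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def make_noise_codebook(size: int, length: int, seed: int = 1) -> List[List[float]]:
    """Hypothetical fixed noise codebook of Gaussian entries from a seeded PRNG."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_noise_codebook(target: Sequence[float], codebook: Sequence[Sequence[float]],
                          a: Sequence[float]) -> Tuple[int, float, float]:
    """Return (index, gain, error) of the entry whose filtered, optimally scaled
    output is closest to `target` in the squared-error sense."""
    best = (-1, 0.0, float("inf"))
    target_energy = dot(target, target)
    for idx, entry in enumerate(codebook):
        y = synthesis_filter(entry, a)
        corr, energy = dot(target, y), dot(y, y)
        if energy == 0.0:
            continue
        gain = corr / energy                        # least-squares optimal gain
        err = target_energy - corr * corr / energy  # residual error at that gain
        if err < best[2]:
            best = (idx, gain, err)
    return best


a = [-0.5]
codebook = make_noise_codebook(size=32, length=40)
target = [0.7 * s for s in synthesis_filter(codebook[5], a)]
idx, gain, err = search_noise_codebook(target, codebook, a)
print(idx, round(gain, 6))  # 5 0.7
```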
As embodied and broadly described herein, the invention provides an audio signal encoding device comprising:
an input for receiving a sub-frame of an audio signal;
a voiced audio signal synthesis stage coupled to said input capable of producing a first synthetic audio signal approximating the sub-frame of an audio signal received at said input on a basis of a first set of parameters;
an unvoiced audio signal synthesis stage coupled to said input capable of producing a second synthetic audio signal approximating the subframe of an audio signal received at said input on a basis of a second set of parameters;
processing means coupled to said signal synthesis stages for outputting a set of parameters allowing generation of a selected one of the first synthetic audio signal and the second synthetic audio signal.
As embodied and broadly described herein, the invention thus provides a method for encoding an audio signal comprising the steps of:
receiving a sub-frame of an audio signal;
producing a voiced synthetic audio signal approximating the sub-frame of an audio signal on
Nortel Networks Limited
Wieland Susan