Reexamination Certificate
2001-10-19
2003-08-26
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S200100, C704S501000, C704S504000, C704S223000, C704S224000
active
06611798
ABSTRACT:
THE BACKGROUND OF THE INVENTION AND PRIOR ART
The present invention relates generally to encoding of an acoustic source signal such that a corresponding signal reconstructed on the basis of the encoded information has a perceived sound quality that is higher than that obtained with known encoding solutions. More particularly, the invention relates to encoding of acoustic signals to produce encoded information for transmission over a transmission medium according to the preambles of claims 1 and 31, and to decoding of encoded information that has been transmitted over a transmission medium according to the preambles of claims 15 and 37. The invention also relates to a communication system according to claim 44, computer programs according to claims 13 and 29 respectively, and computer readable media according to claims 13 and 30 respectively.
There are many different applications for speech codecs (codec = coder and decoder). Encoding and decoding schemes are used for bit-rate efficient transmission of acoustic signals in fixed and mobile communications systems and in videoconferencing systems. Speech codecs can also be utilised in secure telephony and for voice storage.
The trend in fixed and mobile telephony and in videoconferencing is towards improved quality of the reconstructed acoustic signal. This trend reflects the customer expectation that these systems provide a sound quality equal to or better than that of today's fixed telephone network. One way to meet this expectation is to broaden the frequency band for the acoustic signal and thus convey more of the information contained in the source signal to the receiver. It is true that the majority of the energy of a speech signal is spectrally located between 0 kHz and 4 kHz (i.e. the typical bandwidth of a state-of-the-art codec). However, a substantial amount of the energy is also distributed in the frequency band 4 kHz to 8 kHz. The frequency components in this band represent information that is perceived by a human listener as “clearness” and a feeling of the speaker “being close” to the listener.
The frequency resolution of human hearing decreases with increasing frequency. The frequency components between 4 kHz and 8 kHz therefore require comparatively few bits to model with sufficient accuracy. Nevertheless, there are today no known bit-rate efficient broadband codecs that provide a reconstructed acoustic signal with a satisfying perceived quality. The existing ITU-T G.722 wideband coding standard, which operates at bit-rates of 48, 56 and 64 kbps, offers a quality that is unsatisfying in relation to the bit-rates employed (ITU-T = International Telecommunication Union, standardisation sector).
The U.S. Pat. No. 5,956,686 describes an adaptive transform coding/decoding arrangement in which the spectrum of an envelope is divided into frequency bands, so that different coding methods can be applied to the envelopes of the individual bands. This makes it possible to exploit different redundancies between the bands of the spectrum envelope. The spectrum envelope is also adjusted to the coding and/or transmission method to compensate for the time fluctuation in each frequency band.
The U.S. Pat. No. 5,526,464 describes a code excited linear prediction coding method where the residual signal is divided into frequency bands. A particular codebook is provided for each band and the size of the codebook decreases with increasing frequency band. The sampling rate is reduced with decreasing frequency in order to reduce the codebook search complexity.
Hence, there exist examples in the art where the applied coding schemes take into consideration the varying properties of different frequency bands. However, the different properties have only been utilised to obtain a bit-efficient coding of the source signal. There are as yet no teachings of any special measures taken to compensate for inherent deficiencies in the applied coding when a coding scheme optimised for a first frequency band is used for coding signals in a second frequency band.
Today, most speech coding models are designed for narrowband signals (typically 0-4 kHz). If such speech coding models are applied for coding of an acoustic signal having a larger bandwidth, say 0-8 kHz, the coding will only be optimised for a part of the relevant frequency band, namely the lower part.
One reason for this is that the quantisation of coding parameters generally involves correlation in the time domain between a target signal and a reproduced signal. Such correlation will primarily be based on signal matching in the low-frequency region since the higher frequency components of a speech signal have a low power density in comparison to the low frequency components. As a result of this, the high frequency components will be poorly reproduced at the receiver side.
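The effect described above can be illustrated with a small numerical sketch. It is not taken from the patent: the frame length, sampling rate, tone frequencies and amplitudes are all assumed values, and `normalized_correlation` is a hypothetical helper. The sketch builds a target from a strong low-frequency tone and a weak high-frequency tone, then measures how well each component alone correlates with the target in the time domain.

```python
import math

FS = 16000.0   # assumed sampling rate in Hz (covers a 0-8 kHz band)
N = 160        # assumed frame length in samples (10 ms)

def tone(freq_hz, amplitude):
    """One frame of a sine tone; purely illustrative test signal."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / FS)
            for n in range(N)]

def normalized_correlation(x, y):
    """Time-domain correlation of x and y, normalised by their energies."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

low = tone(500.0, 1.0)     # strong low-frequency component (assumed)
high = tone(6000.0, 0.1)   # weak high-frequency component (assumed)
target = [a + b for a, b in zip(low, high)]

# A candidate that drops the high band entirely still matches almost perfectly,
# while a candidate that carries only the high band barely registers.
corr_low_only = normalized_correlation(target, low)
corr_high_only = normalized_correlation(target, high)
print(f"low band only:  {corr_low_only:.3f}")
print(f"high band only: {corr_high_only:.3f}")
```

Under these assumptions a time-domain matching criterion gives almost no incentive to reproduce the high band accurately, which is consistent with the poor high-frequency reproduction described above.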
Unfortunately, this poor reproduction cannot be excused either by flaws in human hearing or by the characteristics of voice signals. When voice sounds are generated, the vocal tract operates as a filter on airwaves originating from the lungs. The so-called formants correspond to the resonance frequencies of this filter. In the lower frequency band of a voice signal, the target signal has distinct formants. However, for higher frequencies the formants are more diffuse. Due to the limitations of the speech model used, an acoustic signal having a relatively large bandwidth that is encoded by means of a conventional narrowband coder will be reproduced as a signal having a distinct spectral structure (i.e. peaks and valleys) also in its upper frequency band. A human listener generally perceives an acoustic signal with such characteristics as unnatural and as having a metallic-like sound.
Occasionally, a secondary coder is applied either to the output signal of the first coder or in parallel with the first coder in order to further increase the quality of the reconstructed signal. If this measure is taken for a conventional narrowband coder that is used for encoding a broadband source signal, the spectral structure in the high end of the frequency band will occasionally be even more pronounced. While this is desirable for narrowband acoustic signals in terms of improved sound quality, the effect may be the opposite for wideband acoustic signals.
SUMMARY OF THE INVENTION
The object of the present invention is therefore to provide an improved coding scheme for acoustic signals, which alleviates the problems above.
According to one aspect of the invention the object is achieved by a method of encoding an acoustic source signal to produce encoded information for transmission over a transmission medium as initially described, which is characterised by the primary coded signal and the target signal each comprising coefficients of which each coefficient represents a frequency component. At least one smoothed signal corresponding to the primary coded signal respective the target signal is produced that is a selectively modified version of the primary coded signal respective the target signal wherein a variation is reduced in the coefficient values representing frequency information above a threshold value.
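As a rough illustration of reducing the variation of coefficients above a threshold, the following sketch replaces each coefficient at or above an assumed threshold index with a local moving average and leaves the lower coefficients untouched. The moving average, the function name `smooth_above_threshold`, the `half_width` parameter and the sample values are assumptions made for this example; the text above does not specify how the smoothing is performed.

```python
from statistics import pvariance

def smooth_above_threshold(coeffs, threshold_index, half_width=2):
    """Return a copy of coeffs where entries at or above threshold_index
    are replaced by a moving average taken within the upper band only.
    Hypothetical helper; the moving average is an assumed smoothing choice."""
    n = len(coeffs)
    out = list(coeffs)
    for k in range(threshold_index, n):
        lo = max(threshold_index, k - half_width)  # never reach into the lower band
        hi = min(n, k + half_width + 1)
        out[k] = sum(coeffs[lo:hi]) / (hi - lo)
    return out

# Made-up spectral magnitudes: index 4 is assumed to correspond to the
# threshold frequency, above which peaks and valleys should be flattened.
spectrum = [9.0, 7.0, 8.0, 6.0, 1.0, 5.0, 0.5, 4.5, 1.0, 5.0]
smoothed = smooth_above_threshold(spectrum, threshold_index=4)

print("upper-band variance before:", pvariance(spectrum[4:]))
print("upper-band variance after: ", pvariance(smoothed[4:]))
```

In this sketch the coefficients below the threshold, where distinct formants are wanted, are passed through unchanged, and only the upper band has its peaks and valleys flattened.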
According to a further aspect of the invention the object is achieved by a computer program directly loadable into the internal memory of a computer, comprising software for controlling the method described in the above paragraph when said program is run on a computer.
According to another aspect of the invention the object is achieved by a computer readable medium, having a program recorded thereon, where the program is to make a computer control the method described in the penultimate paragraph above.
According to still another aspect of the invention the object is achieved by a method of decoding an estimate of an acoustic source signal as initially described, which is characterised by a smoothed primary decoded spectrum comprising coefficients of which each represents a frequency component. The smoothed primary decoded spectrum is a selectively modified version of one of the at least one primary decoded spectrum wherein a
Bruhn Stefan
Olvenstam Susanne
Chawan Vijay
Telefonaktiebolaget LM Ericsson (publ)