Audio coding systems and methods

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S219000

active

06675144

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to audio coding systems and methods and in particular, but not exclusively, to such systems and methods for coding audio signals at low bit rates.
BACKGROUND OF THE INVENTION
In a wide range of applications it is desirable to provide a facility for the efficient storage of audio signals at a low bit rate so that they do not occupy large amounts of memory, for example in computers, portable dictation equipment, personal computer appliances, etc. Equally, where an audio signal is to be transmitted, for example to allow video conferencing, audio streaming, or telephone communication via the Internet, etc., a low bit rate is highly desirable. In both cases, however, high intelligibility and quality are important, and this invention is concerned with a solution to the problem of providing coding at very low bit rates whilst preserving a high level of intelligibility and quality, and also of providing a coding system which operates well at low bit rates with both speech and music.
In order to achieve a very low bit rate with speech signals it is generally recognised that a parametric coder or “vocoder” should be used rather than a waveform coder. A vocoder encodes only parameters of the waveform, and not the waveform itself, and produces a signal that sounds like speech but with a potentially very different waveform.
A typical example is the LPC10 vocoder (Federal Standard 1015, as described in T. E. Tremaine, “The Government Standard Linear Predictive Coding Algorithm: LPC10”, Speech Technology, pp 40-49, 1982), superseded by a similar algorithm, LPC10e, the contents of both of which are incorporated herein by reference. LPC10 and other vocoders have historically operated in the telephony bandwidth (0-4 kHz), as this bandwidth is thought to contain all the information necessary to make speech intelligible. However, we have found that the quality and intelligibility of speech coded in this way at bit rates as low as 2.4 kbit/s is not adequate for many current commercial applications.
The problem is that, to improve the quality, more parameters are needed in the speech model, but encoding these extra parameters means fewer bits are available for the existing parameters. Various enhancements to the LPC10e model have been proposed, for example in A. V. McCree and T. P. Barnwell III, “A Mixed Excitation LPC Vocoder Model for Low Bit Rate Speech Coding”, IEEE Trans. Speech and Audio Processing, Vol.3, No.4, July 1995, but even with all of these the quality is barely adequate.
In an attempt to further enhance the model we looked at encoding a wider bandwidth (0-8 kHz). This has never been considered for vocoders because the extra bits needed to encode the upper band would appear to vastly outweigh any benefit in encoding it. Wideband encoding is normally only considered for good quality coders, where it is used to add greater naturalness to the speech rather than to increase intelligibility, and requires a lot of extra bits.
One common way of implementing a wideband system is to split the signal into lower and upper sub-bands, to allow the upper sub-band to be encoded with fewer bits. The two bands are decoded separately and then added together as described in the ITU Standard G722 (X. Maitre, “7 kHz audio coding within 64 kbit/s”, IEEE Journal on Selected Areas in Comm., vol.6, No.2, pp283-298, Feb 1988). Applying this approach to a vocoder suggested that the upper band should be analysed with a lower order LPC than the lower band (we found second order adequate). We found it needed a separate energy value, but no pitch and voicing decision, as the ones from the lower band can be used. Unfortunately the recombination of the two synthesized bands produced artifacts which we deduced were caused by phase mismatch between the two bands. We overcame this problem in the decoder by combining the LPC and energy parameters of each band to produce a single, high-order wideband filter, and driving this with a wideband excitation signal.
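The upper-band analysis described above (a second-order LPC fit plus a separate energy value) can be sketched as follows. This is a minimal illustration only, assuming the autocorrelation method with the Levinson-Durbin recursion and a mean-square energy measure; the text does not say which LPC estimation method or energy definition was used, and the function names `autocorrelation`, `levinson_durbin` and `analyse_upper_band` are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, and err is the
    residual prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the remaining terms at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def analyse_upper_band(frame, order=2):
    """Hypothetical helper: LPC coefficients and mean-square energy of a frame.

    The order of 2 follows the text ("we found second order adequate");
    the mean-square energy definition is an assumption.
    """
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    energy = r[0] / len(frame) if frame else 0.0
    return a, energy
```

In this sketch no pitch or voicing decision is computed for the upper band, in line with the text's observation that the values from the lower band can be reused.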
Surprisingly, the intelligibility of the wideband LPC vocoder for clean speech was significantly higher than that of the telephone bandwidth version at the same bit rate, producing a DRT score (as described in W. D. Voiers, ‘Diagnostic evaluation of speech intelligibility’, in Speech Intelligibility and Speaker Recognition (M. E. Hawley, ed.) pp. 374-387, Dowden, Hutchinson & Ross, Inc., 1977) of 86.8 as opposed to 84.4 for the narrowband coder.
However, for speech with even a small amount of background noise, the synthesised signal sounded buzzy and contained artifacts in the upper band. Our analysis showed that this was because the encoded upper band energy was being boosted by the background noise, which during the synthesis of voiced speech boosted the upper-band harmonics, creating a buzzy effect.
On further detailed investigation we found that the increase in intelligibility was mainly a result of better encoding of the unvoiced fricatives and plosives, not the voiced sections. This led us to a different approach in the decoding of the upper band, where we synthesized only noise, restricting the harmonics of the voiced speech to the lower band only. This removed the buzz, but could instead add hiss if the encoded upper band energy was high, because of upper band harmonics in the input signal. This could be overcome by using the voicing decision, but we found the most reliable way was to divide the upper band input signal into noise and harmonic (periodic) components, and encode only the energy of the noise component.
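One way to picture "encode only the energy of the noise component" is the sketch below. It is not the method of the specification, which does not state here how the upper band is divided into noise and harmonic components. It assumes the pitch period in samples is already known (for example from the lower-band analysis) and that the harmonic part repeats exactly every pitch period, so that subtracting the frame from a copy of itself delayed by one period cancels the harmonics and leaves only noise. The function name `noise_energy` is hypothetical.

```python
def noise_energy(frame, pitch_period):
    """Estimate the mean-square energy of the non-periodic part of a frame.

    Assumption: frame[n] = periodic[n] + noise[n], where periodic repeats
    every `pitch_period` samples and the noise is uncorrelated at that lag.
    Then frame[n] - frame[n - pitch_period] contains noise only, with
    twice the noise variance, hence the division by 2.
    """
    diffs = [frame[n] - frame[n - pitch_period]
             for n in range(pitch_period, len(frame))]
    if not diffs:
        return 0.0  # frame shorter than one pitch period: nothing to compare
    return sum(d * d for d in diffs) / (2.0 * len(diffs))
```

Under these assumptions a purely harmonic upper band yields an energy near zero, so the decoder adds no hiss, while a fricative or plosive yields an energy close to that of the whole upper band.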
This approach has two unexpected benefits, which greatly enhance the power of the technique. Firstly, as the upper band contains only noise there are no longer problems matching the phase of the upper and lower bands, which means that they can be synthesized completely separately even for a vocoder. In fact the coder for the lower band can be totally separate, and even be an off-the-shelf component. Secondly, the upper band encoding is no longer speech specific, as any signal can be broken down into noise and harmonic components, and can benefit from reproduction of the noise component where otherwise that frequency band would not be reproduced at all. This is particularly true for rock music, which has a strong percussive element to it.
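Because the upper band is reproduced as noise only, its synthesis can be sketched independently of the lower-band decoder. The example below is a minimal illustration under stated assumptions: white Gaussian noise is passed through the all-pole filter 1/A(z) defined by the decoded LPC coefficients, and the result is scaled so that its mean-square value equals the decoded energy. The function name `synthesise_upper_band`, the per-frame gain normalisation, and the fixed random seed are all hypothetical choices, not details taken from the specification.

```python
import math
import random


def synthesise_upper_band(a, energy, n_samples, seed=0):
    """Generate one frame of noise shaped by 1/A(z) with the given energy.

    a         -- LPC coefficients with a[0] == 1 (prediction-error filter A(z))
    energy    -- target mean-square value of the output frame
    n_samples -- number of samples to generate
    seed      -- seed for the noise source (fixed here only for repeatability)
    """
    rng = random.Random(seed)
    order = len(a) - 1
    history = [0.0] * order  # previous outputs, most recent first
    out = []
    for _ in range(n_samples):
        # All-pole recursion: y[n] = e[n] - sum_j a[j] * y[n - j]
        y = rng.gauss(0.0, 1.0) - sum(a[j + 1] * history[j] for j in range(order))
        out.append(y)
        if order:
            history = [y] + history[:-1]
    power = sum(v * v for v in out) / n_samples if n_samples else 0.0
    gain = math.sqrt(energy / power) if power > 0.0 else 0.0
    return [gain * v for v in out]
```

The sketch resets the filter state for every frame for simplicity; a practical decoder would presumably carry the state across frames and interpolate parameters to avoid discontinuities at frame boundaries.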
The system is a fundamentally different approach from other wideband extension techniques, which are based on waveform encoding as in McElroy et al., “Wideband Speech Coding in 7.2 KB/s”, ICASSP 93, pp II-620-II-623. The problem with waveform encoding is that it either requires a large number of bits, as in G722 (supra), or else poorly reproduces the upper band signal (McElroy et al.), adding a lot of quantisation noise to the harmonic components.
In this specification, the term “vocoder” is used broadly to define a speech coder which codes selected model parameters and in which there is no explicit coding of the residual waveform, and the term includes coders such as multi-band excitation coders (MBE) in which the coding is done by splitting the speech spectrum into a number of bands and extracting a basic set of parameters for each band.
The term vocoder analysis is used to describe a process which determines vocoder coefficients including at least LPC coefficients and an energy value. In addition, for a lower sub-band the vocoder coefficients may also include a voicing decision and for voiced speech a pitch value.
SUMMARY OF THE INVENTION
According to one aspect of this invention there is provided an audio coding system for encoding and decoding an audio signal, said system including an encoder and a decoder, said encoder comprising:
means for decomposing said audio signal into an upper and a lower sub-band signal;
lower sub-band coding means for encoding said lower sub-band signal;
upper sub-band coding means for encoding at least the non-periodic component of said upper sub-band signal according to a source-filter model;
said decoder means comprising means for decoding said encoded lower sub-band signal and said encoded upper sub-band signal, and for reconstructing therefrom an audio outp
