Low-complexity, low-delay, scalable and embedded speech and...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S219000, C704S230000

active

06351730

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to audio signal processing and is directed more particularly to a system and method for scalable and embedded coding and transmission of speech and audio signals.
BACKGROUND OF THE INVENTION
In conventional telephone services, speech is sampled at 8,000 samples per second (8 kHz), and each speech sample is represented by 8 bits using the ITU-T G.711 Pulse Code Modulation (PCM), resulting in a transmission bit-rate of 64,000 bits/second, or 64 kb/s for each voice conversation channel. The Plain Old Telephone Service (POTS) is built upon the so-called Public Switched Telephone Networks (PSTN), which are circuit-switched networks designed to route millions of such 64 kb/s speech signals. Since telephone speech is sampled at 8 kHz, theoretically such a 64 kb/s speech signal cannot carry any frequency component that is above 4 kHz. In practice, the speech signal is typically band-limited to the frequency range of 300 to 3,400 Hz by the ITU-T P.48 Intermediate Reference System (IRS) filter before its transmission through the PSTN. Such a limited bandwidth of 300 to 3,400 Hz is the main reason why telephone speech sounds thin, unnatural, and less intelligible compared with the full-bandwidth speech as experienced in face-to-face conversation.
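The arithmetic behind these figures can be checked with a minimal sketch. The function names below are illustrative and do not come from the patent; the only inputs are the 8 kHz sampling rate and 8 bits per sample stated above.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a sampled signal can represent (half the sampling rate)."""
    return sample_rate_hz / 2


# G.711 telephone speech: 8,000 samples/s at 8 bits each.
print(pcm_bit_rate(8000, 8))    # 64000 bits/s, i.e. 64 kb/s
print(nyquist_limit_hz(8000))   # 4000.0 Hz
```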
In the last several years, there has been tremendous interest in the so-called “IP telephony”, i.e., telephone calls transmitted through packet-switched data networks employing the Internet Protocol (IP). Currently, the common approach is to use a speech encoder to compress 8 kHz sampled speech to a low bit rate, package the compressed bit-stream into packets, and then transmit the packets over IP networks. At the receiving end, the compressed bit-stream is extracted from the received packets, and a speech decoder is used to decode the compressed bit-stream back to 8 kHz sampled speech. The term “codec” (coder and decoder) is commonly used to denote the combination of the encoder and the decoder. The current generation of IP telephony products typically uses existing speech codecs that were designed to compress 8 kHz telephone speech to very low bit rates. Examples of such codecs include the ITU-T G.723.1 at 6.3 kb/s, G.729 at 8 kb/s, and G.729A at 8 kb/s. All of these codecs have somewhat degraded speech quality when compared with the ITU-T 64 kb/s G.711 PCM and, of course, they all still have the same 300 to 3,400 Hz bandwidth limitation.
In many IP telephony applications, there is plenty of transmission capacity, so there is no need to compress the speech to a very low bit rate. Such applications include “toll bypass” using high-speed optical fiber IP network backbones, and “LAN phones” that connect to and communicate through Local Area Networks such as 100 Mb/s Fast Ethernet. In many such applications, the transmission bit rate of each channel can be as high as 64 kb/s. Further, it is often desirable to have a sampling rate higher than 8 kHz, so the output quality of the codec can be much higher than POTS quality, and ideally approaches CD quality, for both speech and non-speech signals, such as music. It is also desirable to have a codec complexity as low as possible in order to achieve high port density and low hardware cost per channel. Furthermore, it is desirable to have a coding delay as low as possible, so that users will not experience significant delay in two-way conversations. In addition, depending on the application, sometimes it is necessary to transmit the decoder output through the PSTN. Therefore, the decoder output should be easy to down-sample to 8 kHz for transcoding to 8 kHz G.711. Clearly, there is a need to address the requirements presented by these and other applications.
The present invention is designed to meet these and other practical requirements by using an adaptive transform coding approach. Most prior art audio codecs based on adaptive transform coding use a single large transform (1024 to 2048 data points) in each processing frame. In some cases, switching to smaller transform sizes is used, but typically during transient regions of the signal. As known in the art, a large transform size leads to relatively high computational complexity and high coding delay which, as pointed out above, are undesirable in many applications. On the other hand, if a single small transform is used in each frame, the complexity and coding delay go down, but the coding efficiency also goes down, partially because the transmission of side information (such as quantizer step sizes and adaptive bit allocation) takes a significantly higher percentage of the total bit rate.
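The trade-off between transform size, delay, and side-information overhead can be illustrated numerically. The sketch below is not taken from the patent: the 32 kHz sampling rate, the fixed 40 bits of side information per transform, and the function names are all assumptions chosen only to make the proportions visible. The 64 kb/s channel rate is the figure mentioned earlier in the text.

```python
def block_duration_ms(block_size: int, sample_rate_hz: int) -> float:
    """Time spanned by one transform block, a rough proxy for buffering delay."""
    return 1000.0 * block_size / sample_rate_hz


def side_info_fraction(block_size: int, sample_rate_hz: int,
                       bit_rate_bps: int, side_info_bits: int) -> float:
    """Share of the per-block bit budget consumed by side information."""
    bits_per_block = bit_rate_bps * block_size / sample_rate_hz
    return side_info_bits / bits_per_block


SAMPLE_RATE_HZ = 32000   # assumed for illustration
BIT_RATE_BPS = 64000     # channel rate mentioned in the text
SIDE_INFO_BITS = 40      # hypothetical fixed cost per transform

for size in (1024, 64):
    delay = block_duration_ms(size, SAMPLE_RATE_HZ)
    share = side_info_fraction(size, SAMPLE_RATE_HZ, BIT_RATE_BPS, SIDE_INFO_BITS)
    print(f"{size:5d} points: {delay:5.1f} ms per block, "
          f"{100 * share:4.1f}% of bits spent on side information")
```

Under these assumed numbers the large transform spends a small share of its bits on side information but spans a long block, while the small transform spans a short block but spends a much larger share.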
By contrast, the present invention uses multiple small-size transforms in each frame to achieve low complexity, low coding delay, and a good compromise in efficiently coding the side information. Many low-complexity techniques are used in accordance with the present invention to ensure that the overall codec complexity is as low as possible. In a preferred embodiment, the transform used is the Modified Discrete Cosine Transform (MDCT), as proposed by Princen et al., Proceedings of 1987 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2161-2164, the content of which is incorporated by reference.
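For readers unfamiliar with the MDCT, the following is a minimal, direct-form sketch of the transform and its overlap-add inverse. It is not the patent's implementation: the sine window, the block size of 8 coefficients, and all function names are assumptions made for illustration, and a practical codec would use a fast algorithm rather than these O(N^2) sums.

```python
import math


def sine_window(n_coeffs: int) -> list:
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: list) -> list:
    """Map 2N windowed samples to N coefficients (direct O(N^2) form)."""
    n_coeffs = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_coeffs
                                * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs: list) -> list:
    """Map N coefficients back to 2N time-aliased samples (scale 2/N)."""
    n_coeffs = len(coeffs)
    return [
        (2.0 / n_coeffs) * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs
                                 * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def analysis_synthesis(signal: list, n_coeffs: int) -> list:
    """Run MDCT then IMDCT with 50% overlap-add; len(signal) must be a multiple of N."""
    window = sine_window(n_coeffs)
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * n_coeffs
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = [padded[start + n] * window[n] for n in range(2 * n_coeffs)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * n_coeffs):
            # Windowing again and adding cancels the time-domain aliasing.
            output[start + n] += rebuilt[n] * window[n]
    return output[n_coeffs:n_coeffs + len(signal)]


signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = analysis_synthesis(signal, n_coeffs=8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(f"max reconstruction error: {max_error:.2e}")
```

Each block of 2N samples yields only N coefficients, yet the overlapped blocks reconstruct the signal because the aliasing introduced by one block is cancelled by its neighbours.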
In IP-based voice or audio communications, it is often desirable to support multiple sampling rates and multiple bit rates when different end points have different requirements on sampling rates and bit rates. A conventional (although not so elegant) solution is to use several different codecs, each capable of operating at only a fixed bit-rate and a fixed sampling rate. A serious disadvantage of this approach is that several completely different codecs have to be implemented on the same platform, thus increasing the total storage required for the programs of all codecs. Furthermore, if the application requires multiple output bit-streams at multiple bit-rates, the system needs to run several different speech codecs in parallel, thus increasing the overall computational complexity.
A solution to this problem in accordance with the present invention is to use scalable and embedded coding. The concept of scalable and embedded coding itself is known in the art. For example, the ITU-T has a G.727 standard, which specifies a scalable and embedded ADPCM codec at 16, 24 and 32 kb/s. Also available is the Philips proposal of a scalable and embedded CELP (Code Excited Linear Prediction) codec architecture for 14 to 24 kb/s [1997 IEEE Speech Coding Workshop]. However, both the ITU-T standard and the Philips proposal deal with a single fixed sampling rate of 8 kHz. In practical applications this can be a serious limitation.
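The idea of an embedded bit-stream, in which a lower-rate stream is a prefix of a higher-rate one, can be shown with a toy successive-approximation quantizer. This sketch is neither G.727 nor the scheme of the present invention; the one-bit-per-layer design, the [-1, 1] input range, and the function names are assumptions made purely for illustration.

```python
def encode_embedded(sample: float, n_layers: int) -> list:
    """Encode a sample in [-1, 1] as one bit per layer, coarsest layer first."""
    bits = []
    estimate, step = 0.0, 0.5
    for _ in range(n_layers):
        bit = 1 if sample >= estimate else 0
        bits.append(bit)
        estimate += step if bit else -step
        step /= 2
    return bits


def decode_embedded(bits: list) -> float:
    """Decode any prefix of the bit list; more bits give a finer estimate."""
    estimate, step = 0.0, 0.5
    for bit in bits:
        estimate += step if bit else -step
        step /= 2
    return estimate


bits = encode_embedded(0.3, n_layers=6)
for kept in (2, 4, 6):
    # Dropping trailing layers lowers the bit rate without re-encoding.
    approx = decode_embedded(bits[:kept])
    print(f"{kept} layers kept -> {approx:.6f}")
```

A single encoder run serves every receiver: a low-rate end point simply discards the trailing layers, which is what removes the need to run several codecs in parallel.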
In particular, due to the large variety of terminal devices and communication links used for IP-based voice communications, it is generally desirable, and sometimes even necessary, to link communication devices with widely different operating characteristics. For example, it may be necessary to provide high-quality, high-bandwidth speech (at sampling rates higher than 8 kHz and bandwidths wider than the typical 3.4 kHz telephone bandwidth) for devices connected to a LAN, and at the same time provide telephone-bandwidth speech over PSTN to remote locations. Such needs may arise, for example, in tele-conferencing applications. Addressing such needs, the present invention is able to handle several sampling rates rather than a single fixed sampling rate. In terms of scalability in sampling rate and bit rate, the present invention is similar to co-pending application Ser. No. 60/059,610 filed Sep. 23, 1997, the content of which is incorporated by reference. However, the actual implementation methods are very different.
It should be noted that although the present invention is described primarily with reference to a scalable and embedded codec for IP-based voice or audio communications, it is by no means limited to such applications, as will be appreciated by those skilled in the art.
SUMMARY OF THE INVENTION
In a preferred embodiment, the system of the present invention is an adaptive
