Multiband harmonic transform coder

Data processing: speech signal processing, linguistics, language – Speech signal processing – For storage or transmission


Details

US classes: C704S230000, C704S265000

Type: Reexamination Certificate

Status: active

Patent number: 06377916

ABSTRACT:

TECHNICAL FIELD
The invention is directed to encoding and decoding speech or other audio signals.
BACKGROUND
Speech encoding and decoding have a large number of applications and have been studied extensively. In general, speech coding, which is often referred to as speech compression, seeks to reduce the data rate needed to represent a speech signal without substantially reducing the quality or intelligibility of the speech. Speech compression techniques may be implemented by a speech coder.
A speech coder is generally viewed as including an encoder and a decoder. The encoder produces a compressed stream of bits from a digital representation of speech, which may be generated by using an analog-to-digital converter to sample and digitize an analog speech signal produced by a microphone. The decoder converts the compressed bit stream into a digital representation of speech that is suitable for playback through a digital-to-analog converter and a speaker. In many applications, the encoder and decoder are physically separated, and the bit stream is transmitted between them using a communication channel. Alternatively, the bit stream may be stored in a computer or other memory for decoding and playback at a later time.
A key parameter of a speech coder is the amount of compression the coder achieves, which is measured by the bit rate of the stream of bits produced by the encoder. The bit rate of the encoder is generally a function of the desired fidelity (i.e., speech quality) and the type of speech coder employed. Different types of speech coders have been designed to operate at different bit rates. Medium to low rate speech coders operating below 10 kbps (kilobits per second) have received attention with respect to a wide range of mobile communication applications, such as cellular telephony, satellite telephony, land mobile radio, and in-flight telephony. These applications typically require high quality speech and robustness to artifacts caused by acoustic noise and channel noise (e.g., bit errors).
A well known approach for coding speech at medium to low data rates is based on linear predictive coding (LPC), which attempts to predict each new frame of speech from previous samples using short-term and/or long-term predictors. The prediction error is typically quantized using one of several approaches, of which CELP and multi-pulse coding are two examples. The linear prediction method has good time resolution, which is helpful for the coding of unvoiced sounds. In particular, plosives and transients benefit from this time resolution in that they are not overly smeared in time. However, linear prediction often has difficulty with voiced sounds, since the coded speech tends to sound rough or hoarse due to insufficient periodicity in the coded signal. This is particularly true at lower data rates, which typically require a longer frame size and employ a long-term predictor that is less effective at reproducing the periodic portion (i.e., the voiced portion) of speech.
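As a rough illustration of the short-term prediction step, the sketch below computes predictor coefficients for one windowed frame using the autocorrelation method and the Levinson-Durbin recursion. This is a common textbook formulation, not a procedure taken from the patent; the function name and predictor order are assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Return LPC coefficients a[0..order] (with a[0] = 1) and the prediction-error energy."""
    # Autocorrelation of the windowed frame up to the predictor order.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])

    # Levinson-Durbin recursion solves the normal equations for the predictor.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a_next = a.copy()
        a_next[i] = k
        a_next[1:i] += k * a[1:i][::-1]
        a = a_next
        err *= 1.0 - k * k
    return a, err

# The prediction error, residual[n] = frame[n] + a[1]*frame[n-1] + ... + a[order]*frame[n-order],
# is the signal that schemes such as CELP or multi-pulse coding then quantize.
```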
Another well known approach for low to medium rate speech coding is a model-based speech coder, which is often referred to as a vocoder. A vocoder usually models speech as the response of some system to an excitation signal over short time intervals. Examples of vocoder systems include linear prediction vocoders, such as MELP or LPC-10, homomorphic vocoders, channel vocoders, sinusoidal transform coders (“STC”), harmonic vocoders, and multiband excitation (“MBE”) vocoders. In these vocoders, speech is divided into short segments (typically 10-40 ms), and each segment is characterized by a set of model parameters. These parameters typically represent a few basic elements of each speech segment, such as the segment's pitch, voicing state, and spectral envelope. A vocoder may use one of a number of known representations for each of these parameters. For example, the pitch may be represented as a pitch period, a fundamental frequency, or a long-term prediction delay. Similarly, the voicing state may be represented by one or more voicing metrics, by a voicing probability measure, or by a ratio of periodic to stochastic energy. The spectral envelope is often represented by an all-pole filter response, but also may be represented by a set of spectral magnitudes, cepstral coefficients, or other spectral measurements.
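Purely as an illustration of such a parameter set, the fragment below collects one possible choice of representations for a segment: a fundamental frequency for the pitch, per-band voicing metrics for the voicing state, and spectral magnitudes for the envelope. The field names and the particular representations chosen are assumptions, not requirements of the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SegmentParameters:
    fundamental_hz: float        # pitch, here represented as a fundamental frequency
    voicing_metrics: np.ndarray  # voicing state, e.g. one metric per frequency band
    magnitudes: np.ndarray       # spectral envelope, here as a set of spectral magnitudes
```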
Since they permit a speech segment to be represented using only a small number of parameters, model-based speech coders, such as vocoders, typically are able to operate at lower data rates. However, the quality of a model-based system is dependent on the accuracy of the underlying model. Accordingly, a high fidelity model must be used if these speech coders are to achieve high speech quality.
One vocoder which has been shown to work well for certain types of speech is the harmonic vocoder. The harmonic vocoder is generally able to accurately model voiced speech, which is generally periodic over some short time interval. The harmonic vocoder represents each short segment of speech with a pitch period and some form of vocal tract response. Often, one or both of these parameters are converted into the frequency domain and represented as a fundamental frequency and a spectral envelope. A speech segment can be synthesized in a harmonic vocoder by summing a sequence of harmonically related sine waves having frequencies at multiples of the fundamental frequency and amplitudes matching the spectral envelope. Harmonic vocoders often have difficulty handling unvoiced speech, which is not easily modeled with a sparse collection of sine waves. Early harmonic vocoders handled unvoiced speech indirectly, without the use of any explicit voicing information, through a residual signal computed from the difference between the original speech and the harmonically-modeled speech. This residual signal was coded along with the model parameters, which led to a relatively high total bit rate, or it was dropped, which led to relatively low quality. In another approach, a single voiced/unvoiced decision was used for an entire frame, with model parameters being coded for voiced frames and the spectrum being coded for unvoiced frames. Problems with this approach resulted from the insufficiency of a single voicing decision for the entire frame (many segments of speech are voiced in some regions while being unvoiced in other regions), and from the sensitivity of the system to a voicing error, which would negatively affect the entire frame. Previous harmonic coding schemes also suffered from the need to code the harmonic phases for voiced speech, and from not using critically sampled spectral representations for the unvoiced speech. These limitations reduced the number of bits available to code the other parameters, such as the harmonic magnitudes. As a result, the frame sizes were increased to around 30 ms to ensure that sufficient bits were available for all of the parameters at a reasonable total bit rate. Unfortunately, the use of a large frame size decreased time resolution in the system, which limited performance for unvoiced sounds and transients.
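The harmonic synthesis step described above (summing harmonically related sine waves) can be sketched as follows. Phase handling and frame-boundary smoothing, which a practical harmonic vocoder would need, are omitted, and the sampling rate and function name are assumptions for the sketch.

```python
import numpy as np

def synthesize_voiced_segment(f0_hz, magnitudes, n_samples, fs=8000.0):
    """Sum sinusoids at multiples of f0_hz; magnitudes[k] is the amplitude of harmonic k+1."""
    t = np.arange(n_samples) / fs
    segment = np.zeros(n_samples)
    for k, amplitude in enumerate(magnitudes, start=1):
        segment += amplitude * np.cos(2.0 * np.pi * k * f0_hz * t)
    return segment

# Example: a 20 ms segment at 8 kHz with a 100 Hz pitch and a crudely decaying envelope.
envelope = np.exp(-0.1 * np.arange(30))
speech = synthesize_voiced_segment(100.0, envelope, n_samples=160)
```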
One improvement to early harmonic vocoders was introduced in the form of the Multiband Excitation (MBE) speech model. This model combines a harmonic representation for voiced speech with a flexible, frequency-dependent voicing structure that allows it to produce natural-sounding unvoiced speech and makes it more robust to the presence of acoustic background noise. These properties allow the MBE model to produce higher quality speech at low to medium data rates, and have led to its use in a number of commercial mobile communication applications.
The MBE speech model represents segments of speech using a fundamental frequency representing the pitch, a set of binary voiced/unvoiced (V/UV) decisions or other voicing metrics, and a set of spectral magnitudes representing the frequency response of the vocal tract. The MBE model generalizes the traditional single V/UV decision per segment into a set of decisions, each representing the voicing state within a particular frequency band or region. Eac
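To make the frequency-dependent voicing structure concrete, the sketch below synthesizes voiced bands as harmonic sinusoids and unvoiced bands as noise of matching level. The band layout, the white-noise stand-in, and the function name are illustrative assumptions, not the patent's specific procedure.

```python
import numpy as np

def synthesize_mbe_like_segment(f0_hz, magnitudes, band_voiced, n_samples,
                                fs=8000.0, harmonics_per_band=4):
    """Voiced bands get harmonics of f0_hz; unvoiced bands get noise at the same level."""
    rng = np.random.default_rng(0)
    t = np.arange(n_samples) / fs
    segment = np.zeros(n_samples)
    for k, amplitude in enumerate(magnitudes, start=1):
        band = (k - 1) // harmonics_per_band
        if band < len(band_voiced) and band_voiced[band]:
            # Voiced band: harmonically related sinusoid at k times the fundamental.
            segment += amplitude * np.cos(2.0 * np.pi * k * f0_hz * t)
        else:
            # Unvoiced band: white noise scaled to the harmonic's level (a crude stand-in
            # for the spectrally shaped noise a real MBE decoder would generate).
            noise = rng.standard_normal(n_samples)
            segment += amplitude * noise / np.sqrt(np.mean(noise ** 2))
    return segment
```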
