Codebook re-ordering to reduce undesired packet generation

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

C704S236000, C704S238000

active

06754624

ABSTRACT:

BACKGROUND
1. Field
The disclosed embodiments relate generally to wireless communications, and more specifically to the field of signal processing.
2. Background
Transmission of voice by digital techniques has become widespread, particularly in long distance and digital radio telephone applications. This, in turn, has created interest in determining the least amount of information that can be sent over a channel while maintaining the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of sixty-four kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and re-synthesis at the receiver, a significant reduction in the data rate can be achieved.
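As a rough check on the sixty-four kbps figure, the sketch below multiplies a sampling rate by a sample width. The values of 8 kHz and 8 bits per sample are assumptions taken from conventional telephone-band PCM, not from the text above, and the function name is illustrative only.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of plainly sampled and digitized speech, in kilobits per second."""
    return sample_rate_hz * bits_per_sample / 1000


# Assumed telephone-band PCM parameters (not stated in the text).
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 8

if __name__ == "__main__":
    print(pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))
```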
Devices that employ techniques to compress speech by extracting parameters that relate to a model of human speech generation are called speech coders. A speech coder divides the incoming speech signal into blocks of time, or analysis frames. Hereinafter, the terms “frame” and “packet” are interchangeable. Speech coders typically comprise an encoder and a decoder, or a codec. The encoder analyzes the incoming speech frame to extract certain relevant gain and spectral parameters, and then quantizes the parameters into binary representation, i.e., to a set of bits or a binary data packet. The data packets are transmitted over the communication channel to a receiver and a decoder. The decoder processes the data packets, de-quantizes them to produce the parameters, and then re-synthesizes the frames using the de-quantized parameters.
The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech. The digital compression is achieved by representing the input speech frame with a set of parameters and employing quantization to represent the parameters with a set of bits. If the input speech frame has a number of bits $N_i$ and the data packet produced by the speech coder has a number of bits $N_o$, the compression factor achieved by the speech coder is $C_r = N_i / N_o$. The challenge is to retain high voice quality of the decoded speech while achieving the target compression factor. The performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis process described above, performs, and (2) how well the parameter quantization process is performed at the target bit rate of $N_o$ bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
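The compression factor can be illustrated with a short calculation. This is a sketch only: the frame of 160 samples at 16 bits each and the 16-bit output packet are assumed example values, not figures given in the text, and `compression_factor` is a hypothetical helper name.

```python
def compression_factor(n_i: int, n_o: int) -> float:
    """Compression factor C_r = N_i / N_o for one frame.

    n_i: number of bits in the input speech frame
    n_o: number of bits in the data packet produced by the coder
    """
    if n_i <= 0 or n_o <= 0:
        raise ValueError("bit counts must be positive")
    return n_i / n_o


# Assumed example values (not from the text): 160 samples of 16 bits each
# in the input frame, and a 16-bit output packet.
INPUT_FRAME_BITS = 160 * 16
OUTPUT_PACKET_BITS = 16

if __name__ == "__main__":
    print(compression_factor(INPUT_FRAME_BITS, OUTPUT_PACKET_BITS))
```

Under these assumed numbers the ratio is 160; a real coder's figure depends on its actual frame and packet sizes.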
Speech coders may be implemented as time-domain coders, which attempt to capture the time-domain speech waveform by employing high time-resolution processing to encode small segments of speech (typically 5 millisecond (ms) sub-frames) at a time. For each sub-frame, a high-precision representative from a codebook space is found by means of various search algorithms known in the art. Alternatively, speech coders may be implemented as frequency-domain coders, which attempt to capture the short-term speech spectrum of the input speech frame with a set of parameters (analysis) and employ a corresponding synthesis process to recreate the speech waveform from the spectral parameters. The parameter quantizer preserves the parameters by representing them with stored representations of code vectors in accordance with known quantization techniques described in A. Gersho & R. M. Gray, Vector Quantization and Signal Compression (1992). Different types of speech within a given transmission system may be coded using different implementations of speech coders, and different transmission systems may implement coding of given speech types differently. Typically, voiced and unvoiced speech segments are captured at high bit rates, and background noise and silence segments are represented with modes working at a significantly lower rate. Speech coders used in CDMA digital cellular systems employ variable bit-rate (VBR) technology, in which one of four data rates is selected every 20 ms, depending on the speech activity and the local characteristics of the speech signal. The data rates include full rate, half rate, quarter rate, and eighth rate. Typically, transient speech segments are coded at full rate. Voiced speech segments are coded at half rate, while silence and background noise (inactive speech) are coded at eighth rate, in which conventionally, only the spectral parameters and the energy contour of the signal are quantized at the lower bit rate.
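The typical per-frame rate assignment described above can be sketched as a lookup. The segment labels and the names `Rate`, `TYPICAL_RATE`, and `select_rate` are hypothetical, and the classification of a frame as transient, voiced, or inactive is assumed to happen elsewhere. The text names no typical use for quarter rate, so the table leaves it unmapped.

```python
from enum import Enum


class Rate(Enum):
    """The four VBR data rates; each value is the divisor relative to full rate."""
    FULL = 1
    HALF = 2
    QUARTER = 4
    EIGHTH = 8


# Typical assignment as described in the text. The keys are hypothetical
# labels produced by some upstream speech classifier (not shown here).
TYPICAL_RATE = {
    "transient": Rate.FULL,
    "voiced": Rate.HALF,
    "inactive": Rate.EIGHTH,  # silence and background noise
}


def select_rate(segment_class: str) -> Rate:
    """Pick the data rate for one 20 ms frame from its segment class."""
    try:
        return TYPICAL_RATE[segment_class]
    except KeyError:
        raise ValueError(f"no typical rate listed for {segment_class!r}") from None


if __name__ == "__main__":
    for label in ("transient", "voiced", "inactive"):
        print(label, select_rate(label).name)
```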
For coding at lower bit rates, various methods of spectral, or frequency-domain, coding of speech have been developed, in which the speech signal is analyzed as a time-varying evolution of spectra. See, e.g., R. J. McAulay & T. F. Quatieri, Sinusoidal Coding, in Speech Coding and Synthesis ch. 4 (W. B. Kleijn & K. K. Paliwal eds., 1995). In spectral coders, the objective is to model, or predict, the short-term speech spectrum of each input frame of speech with a set of spectral parameters, rather than to precisely mimic the time-varying speech waveform. The spectral parameters are then encoded and an output frame of speech is created with the decoded parameters. The resulting synthesized speech does not match the original input speech waveform, but offers similar perceived quality. Examples of frequency-domain coders that are well known in the art include multiband excitation coders (MBEs), sinusoidal transform coders (STCs), and harmonic coders (HCs). Such frequency-domain coders offer a high-quality parametric model having a compact set of parameters that can be accurately quantized with the low number of bits available at low bit rates.
The process of encoding speech involves representing the speech signal using a set of parameters such as pitch, signal power gain, spectral envelope, amplitude, and phase spectra, which are then coded for transmission. The parameters are coded for transmission by quantizing each parameter and converting the quantized parameter values into bit-streams. A parameter is quantized by looking for the closest approximating value of the parameter from a predetermined finite set of codebook values. Codebook entries may be either scalar or vector values. The indices of the codebook entries most closely approximating the parameter values are packetized for transmission. At a receiver, a decoder employs a simple lookup technique using the transmitted indices to recover the speech parameters from an identical codebook in order to synthesize the original speech signal.
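The quantize, packetize, and lookup steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the coder's actual implementation: the codebooks, their sizes, the squared-Euclidean distance measure, and the bit-packing layout are all hypothetical, and scalar entries are written as one-element tuples so one function handles both scalar and vector codebooks.

```python
def quantize(value, codebook):
    """Return the index of the codebook entry closest to value.

    Closeness is measured by squared Euclidean distance (an assumption).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(value, codebook[i]))


def dequantize(index, codebook):
    """Decoder side: simple lookup in an identical codebook."""
    return codebook[index]


def pack_indices(indices, widths):
    """Concatenate indices into one integer, first index in the top bits."""
    bits = 0
    for idx, width in zip(indices, widths):
        if not 0 <= idx < (1 << width):
            raise ValueError("index does not fit in its bit width")
        bits = (bits << width) | idx
    return bits, sum(widths)


def unpack_indices(bits, widths):
    """Inverse of pack_indices."""
    out = []
    for width in reversed(widths):
        out.append(bits & ((1 << width) - 1))
        bits >>= width
    return out[::-1]


# Hypothetical codebooks: a 2-bit scalar gain codebook and a 2-bit
# two-dimensional spectral codebook.
GAIN_CODEBOOK = [(0.1,), (0.5,), (1.0,), (2.0,)]
SPECTRAL_CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
WIDTHS = [2, 2]

if __name__ == "__main__":
    # Encoder: find the nearest entries and packetize their indices.
    indices = [quantize((0.6,), GAIN_CODEBOOK),
               quantize((0.9, 0.2), SPECTRAL_CODEBOOK)]
    packet, n_bits = pack_indices(indices, WIDTHS)
    # Decoder: unpack the indices and look the parameters up again.
    gain_idx, spec_idx = unpack_indices(packet, WIDTHS)
    print(format(packet, f"0{n_bits}b"),
          dequantize(gain_idx, GAIN_CODEBOOK),
          dequantize(spec_idx, SPECTRAL_CODEBOOK))
```

Only the indices travel over the channel; the decoder recovers approximations of the parameters, not the original values.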
The speech encoding process may produce a binary packet for transmission containing any possible permutation of codebook indices, including a packet containing all ones. In existing CDMA systems, packets containing all ones are reserved for null traffic channel data. Null traffic channel data is generated at the physical layer when no signaling message is being transmitted. Null traffic channel data serves to maintain the connectivity between a user terminal and a base station. A user terminal may comprise a cellular telephone for mobile subscribers, a cordless telephone, a paging device, a wireless local loop device, a personal digital assistant (PDA), an Internet telephony device, a component of a satellite communication system, or any other component device of a communications system. As defined in EIA/TIA/IS-95, null traffic channel data is equivalent to an eighth-rate packet with all bits set to one. Packets containing null traffic channel data are typically declared as erasures by speech decoders. Speech encoders must not allow a permutation of codebook indices representing quantized speech parameters to generate an illegal packet containing all ones, which is reserved for null traffic channel data. If an eighth-rate packet happens to be all ones after quantization, the encoder generally modifies
