Method for coding speech containing noise-like speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

C704S209000, C704S219000, C704S230000, C704S220000

active

06205423

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of communications, and more specifically, to the field of coded speech communications.
2. Description of Related Art
During a conversation between two or more people, ambient background noise is typically inherent to the overall listening experience of the human ear. FIG. 1 illustrates the analog sound waves 100 of a typical recorded conversation that includes ambient background noise signal 102 along with speech groups 104-108 caused by voice communication. Within the technical field of transmitting, receiving, and storing speech communications, several different techniques exist for coding and decoding a signal 100. One of the techniques for coding and decoding a signal 100 is to use an analysis-by-synthesis coding system, which is well known to those skilled in the art.
FIG. 2 illustrates a general overview block diagram of a prior art analysis-by-synthesis system 200 for coding and decoding speech. An analysis-by-synthesis system 200 for coding and decoding signal 100 of FIG. 1 utilizes an analysis unit 204 along with a corresponding synthesis unit 222. The analysis unit 204 represents an analysis-by-synthesis type of speech coder, such as a code excited linear prediction (CELP) coder. A code excited linear prediction coder is one way of coding signal 100 at a medium or low bit rate in order to meet the constraints of communication networks and storage capacities. An example of a CELP based speech coder is the recently adopted International Telecommunication Union (ITU) G.729 standard, herein incorporated by reference.
In order to code speech, the microphone 206 of the analysis unit 204 receives the analog sound waves 100 of FIG. 1 as an input signal. The microphone 206 outputs the received analog sound waves 100 to the analog to digital (A/D) sampler circuit 208. The analog to digital sampler 208 converts the analog sound waves 100 into a sampled digital speech signal (sampled over discrete time periods) which is output to the linear prediction coefficients (LPC) extractor 210 and the pitch extractor 212 in order to retrieve the formant structure (or the spectral envelope) and the harmonic structure of the speech signal, respectively.
The formant structure corresponds to short-term correlation and the harmonic structure corresponds to long-term correlation. The short-term correlation can be described by time varying filters whose coefficients are the obtained linear prediction coefficients (LPC). The long-term correlation can also be described by time varying filters whose coefficients are obtained from the pitch extractor. Filtering the incoming speech signal with the LPC filter removes the short-term correlation and generates an LPC residual signal. This LPC residual signal is further processed by the pitch filter in order to remove the remaining long-term correlation. The obtained signal is the total residual signal. If this residual signal is passed through the inverse pitch and LPC filters (also called synthesis filters), the original speech signal is retrieved or synthesized. In the context of speech coding, this residual signal has to be quantized (coded) in order to reduce the bit rate. The quantized residual signal is called the excitation signal, which is passed through both the quantized pitch and LPC synthesis filters in order to produce a close replica of the original speech signal. In the context of analysis-by-synthesis CELP coding of speech, the quantized residual is obtained from a code book 214 normally called the fixed code book. This method is
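The short-term analysis and synthesis described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the coder of FIG. 2: the autocorrelation method with the Levinson-Durbin recursion is assumed as the way the LPC extractor obtains its coefficients, the pitch filter and all quantization are omitted, and every function name is hypothetical.

```python
import math


def lpc_coefficients(x, order):
    """Hypothetical LPC extractor: returns a[1..order] so that
    x[n] is approximated by sum_k a[k] * x[n - k] (autocorrelation method)."""
    # Autocorrelation r[k] = sum_n x[n] * x[n - k]
    r = [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    # Levinson-Durbin recursion
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(x, a):
    """Analysis (inverse) filter: removes short-term correlation.
    e[n] = x[n] - sum_k a[k] * x[n - k], with zero history before n = 0."""
    return [
        x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        for n in range(len(x))
    ]


def lpc_synthesis(e, a):
    """Synthesis filter: y[n] = e[n] + sum_k a[k] * y[n - k], zero initial state."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0))
    return y
```

Passing the unquantized residual from `lpc_residual` through `lpc_synthesis` with the same coefficients recovers the input up to rounding error, which mirrors the statement that the original speech is retrieved when the residual is passed through the synthesis filter. The bit-rate reduction comes only once the residual is replaced by a quantized excitation.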
The fixed code book 214 of FIG. 2 contains a specific number of stored digital patterns, which are referred to as code vectors. The fixed code book 214 is normally searched in order to provide the best representative code vector to the residual signal in some perceptual fashion as known to those skilled in the art. The selected code vector is typically called the fixed excitation signal. After determining the best code vector that represents the residual signal, the fixed code book unit 214 also computes the gain factor of the fixed excitation signal. The next step is to pass the fixed excitation signal through the pitch synthesis filter. This is normally implemented using the adaptive code book search approach in order to determine the optimum pitch gain and lag in a “closed-loop” fashion as known to those skilled in the art. The “closed-loop” method, or analysis-by-synthesis, means that the signals to be matched are filtered. The optimum pitch gain and lag enable the generation of a so-called adaptive excitation signal. The determined gain factors for both the adaptive and fixed code book excitations are then quantized in a “closed-loop” fashion by the gain quantizer 216 using a look-up table with an index, which is a well known quantization scheme to those of ordinary skill in the art. The index of the best fixed excitation from the fixed code book 214 along with the indices of the quantized gains, pitch lag and LPC coefficients are then passed to the storage/transmitter unit 218.
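A closed-loop fixed code book search and a table-based gain quantizer can be sketched as follows. This is an illustrative sketch under stated assumptions: the filter is reduced to a short impulse response `h`, the perceptual weighting is omitted, the gain is quantized by a simple nearest-entry lookup rather than by minimizing the filtered error, and the function names, the code book contents, and the gain table are all hypothetical.

```python
def filter_vector(c, h):
    """Zero-state convolution of code vector c with impulse response h,
    truncated to len(c). Stands in for the synthesis filtering step."""
    return [
        sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
        for n in range(len(c))
    ]


def search_fixed_codebook(target, codebook, h):
    """Return (index, gain) of the code vector whose filtered, gain-scaled
    version is closest to target in the squared-error sense.

    For each filtered vector y the optimal gain is <t, y> / <y, y>, and the
    error is minimized by maximizing <t, y>^2 / <y, y>."""
    best_score, best_index, best_gain = None, None, None
    for index, c in enumerate(codebook):
        y = filter_vector(c, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero vector cannot represent anything
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy
        if best_score is None or score > best_score:
            best_score, best_index, best_gain = score, index, corr / energy
    return best_index, best_gain


def quantize_gain(gain, table):
    """Return the index of the table entry nearest to gain (hypothetical table)."""
    return min(range(len(table)), key=lambda i: abs(table[i] - gain))
```

Only the code vector index and the gain index would be passed on for storage or transmission; the decoder holds the same code book and table and looks the values up again.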
The storage/transmitter 218 (of FIG. 2) of the analysis unit 204 then transmits to the synthesis unit 222, via the communication network 220, the index values of the pitch lag, pitch gain, linear prediction coefficients, the fixed excitation code vector, and the fixed excitation code vector gain which all represent the received analog sound waves signal 100. The synthesis unit 222 decodes the different parameters that it receives from the storage/transmitter 218 to obtain a synthesized speech signal. To enable people to hear the synthesized speech signal, the synthesis unit 222 outputs the synthesized speech signal to a speaker 224.
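The decoding step in the synthesis unit can be sketched as follows, assuming the received indices have already been looked up into a pitch lag, a pitch gain, a fixed code vector, a fixed gain, and LPC coefficients. The function name, its argument layout, and the treatment of the pitch contribution as a one-tap long-term synthesis filter are assumptions made for illustration, not details taken from FIG. 2.

```python
def decode_subframe(past_excitation, lag, pitch_gain, code_vector, code_gain,
                    lpc, past_output):
    """Hypothetical decoder for one subframe.

    Excitation:  u[n] = code_gain * c[n] + pitch_gain * u[n - lag]
    Speech:      s[n] = u[n] + sum_k lpc[k-1] * s[n - k]
    Returns (new_excitation, new_speech), each len(code_vector) samples long."""
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored excitation history")
    exc_hist = list(past_excitation)
    out_hist = list(past_output)
    length = len(code_vector)
    for n in range(length):
        # Pitch (long-term) synthesis: reuse the excitation from `lag` samples ago.
        adaptive = exc_hist[len(exc_hist) - lag]
        excitation = code_gain * code_vector[n] + pitch_gain * adaptive
        exc_hist.append(excitation)
        # LPC (short-term) synthesis: add the prediction from past output samples.
        predicted = sum(
            a * out_hist[-k] for k, a in enumerate(lpc, start=1) if k <= len(out_hist)
        )
        out_hist.append(excitation + predicted)
    return exc_hist[-length:], out_hist[-length:]
```

The returned excitation would be appended to the excitation history for the next subframe, so that the encoder and decoder keep identical adaptive code book contents.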
The analysis-by-synthesis system 200 described above with reference to FIG. 2 has been successfully employed to realize high quality speech coders. As can be appreciated by those skilled in the art, natural speech can be coded at very low bit rates with high quality. The high quality coding at a low-bit rate can be achieved by using a fixed excitation code book 214 whose code vectors have high sparsity (i.e., with few non-zero elements). For example, there are only four non-zero pulses per 5 ms in the ITU Recommendation G.729. However, when the speech is noise-like such as unvoiced speech or is corrupted by ambient background noise, the perceived performance of these coding systems is degraded. This degradation can be remedied only if the fixed code book 214 contains high-density non-zero pseudo-random code vectors and if the waveform-matching criterion in CELP systems is relaxed.
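The contrast between a sparse and a high-density code vector can be made concrete with a small sketch. G.729 operates on speech sampled at 8 kHz, so a 5 ms subframe holds 40 samples and four pulses fill 10% of it. The pulse positions and signs below are arbitrary and do not follow the pulse-position structure of G.729, the Gaussian generator is only one possible source of a pseudo-random vector, and the function names are hypothetical.

```python
import random

SAMPLE_RATE_HZ = 8000   # G.729 input sampling rate
SUBFRAME_MS = 5
SUBFRAME_LEN = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000  # 40 samples


def sparse_code_vector(positions, signs, length=SUBFRAME_LEN):
    """Code vector that is zero except for signed unit pulses at `positions`."""
    vector = [0.0] * length
    for position, sign in zip(positions, signs):
        vector[position] = float(sign)
    return vector


def dense_code_vector(seed, length=SUBFRAME_LEN):
    """Pseudo-random code vector; the seed makes it reproducible at both ends."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def density(vector):
    """Fraction of samples that are non-zero."""
    return sum(1 for sample in vector if sample != 0.0) / len(vector)


# Illustrative positions only, not a G.729 code book entry.
sparse = sparse_code_vector([3, 11, 22, 34], [+1, -1, +1, -1])
dense = dense_code_vector(seed=1)
```

Here `density(sparse)` is 4/40, while the pseudo-random vector has a non-zero value at essentially every sample, which is the property the paragraph identifies as better suited to noise-like segments.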
Sophisticated solutions including multi-mode coding and the use of mixed excitations have been proposed to improve the speech quality of noise-like speech such as unvoiced speech or speech under background noise conditions. However, these solutions usually lead to undesirably high complexity or high sensitivity to transmission errors. The present invention provides a simple solution to combat this problem.
SUMMARY OF THE INVENTION
The present invention includes a system and method to improve the quality of coded speech when ambient background noise is present or the speech segment is noise-like such as occurs during unvoiced speech. For most analysis-by-synthesis speech coders, the pitch prediction contribution is meant to represent the periodicity of the speech during voiced segments. One embodiment of the pitch predictor is in the form of an adaptive code book, which is well known to those of ordinary skill in the art. For background noise segments or noise-like speech, such as unvoiced speech, there is a poor or even non-existent long-term correlation for the pitch prediction contribution to represent. However, the pitch prediction contribution is rich in sample content and therefore represents a good source for a desired pseudo-random sequence which is more suitable for background noise coding or noise-like speech coding.
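To illustrate why the pitch prediction contribution is rich in sample content, the following sketch shows one conventional way an adaptive code book vector is formed from past excitation. It is a sketch of the general adaptive code book idea only and is not a description of how the invention uses this signal; the function names are hypothetical and fractional lags are not handled.

```python
def adaptive_code_vector(past_excitation, lag, length):
    """Return v[n] = u[n - lag] for n = 0..length-1, where u is the past
    excitation. When lag < length, the most recent `lag` samples repeat."""
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored excitation history")
    buffer = list(past_excitation)
    vector = []
    for _ in range(length):
        sample = buffer[len(buffer) - lag]
        vector.append(sample)
        buffer.append(sample)  # lets short lags wrap around periodically
    return vector


def pitch_contribution(past_excitation, lag, pitch_gain, length):
    """Adaptive code vector scaled by the pitch gain."""
    return [pitch_gain * v for v in adaptive_code_vector(past_excitation, lag, length)]
```

Because the past excitation is itself a sum of earlier contributions, a vector drawn from it typically has a non-zero value at most sample positions, unlike a sparse fixed code vector with only a few pulses.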
Th
