Method for encoding speech wherein pitch periods are changed...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S207000, C704S220000, C704S208000

active

06427135

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a method for encoding speech at a low bit rate and, more particularly, to a method for encoding speech and a method for decoding speech wherein a speech signal including a background noise is encoded by compressing it efficiently in a state which is as close to the original speech as possible.
Further, the present invention relates to a method for encoding speech wherein a speech signal is compressed and encoded, and, more particularly, to speech encoding used for digital telephones and the like and a method for encoding speech for speech synthesis used for text read-out software and the like.
Conventional low-bit-rate speech coding is directed to efficient coding of a speech signal and is carried out according to speech coding methods which employ a model of a speech production process. Among such methods for speech coding, methods based on a CELP system have recently been spreading remarkably. When such a method for encoding speech on a CELP basis is used, a speech signal input in an environment having little background noise can be encoded efficiently because the signal matches the model for encoding, and this allows encoding with deterioration of speech quality at a relatively low level.
However, it is known that when a CELP-based speech encoding method is applied to a speech signal input under a high level of background noise, the background noise in the reproduced output signal sounds very different from the original, producing speech which is very unstable and uncomfortable. This tendency is especially significant at encoding bit rates of 8 kbps or less.
To mitigate this problem, a method has been proposed in which CELP encoding uses a noisier excitation signal for any time window determined to contain background noise, so as to reduce the deterioration of speech quality in that window. Although this method somewhat improves speech quality in background-noise windows, the improvement is insufficient: the tendency to produce a noise that sounds different from the background noise in the original speech remains, because the method still relies on a model of the speech production process in which speech is synthesized by passing the excitation signal through a synthesis filter.
As described above, the conventional method for encoding speech has a problem in that, when a speech signal input under a high level of background noise is encoded, the background noise in the reproduced output signal sounds very different from the original, producing speech which is very unstable and uncomfortable.
BRIEF SUMMARY OF THE INVENTION
It is an object of the present invention to provide a method for low-rate speech coding and decoding wherein speech including a background noise can be reproduced in a state as close to the original speech as possible.
It is another object of the invention to provide a method for low-rate speech coding and decoding wherein a background noise can be encoded with as few bits as possible to reproduce speech including a background noise in a state as close to the original speech as possible.
It is still another object of the invention to provide a method for encoding speech wherein encoding can be performed such that abrupt changes and fluctuations of pitch periods are reflected to obtain high quality decoded speech.
According to the present invention, there is provided a method for encoding speech comprising separating an input speech signal into a first component mainly constituted by speech and a second component mainly constituted by a background noise at each predetermined unit of time, selecting bit allocation for each of the first and second components from among a plurality of candidates for bit allocation based on the first and second components, encoding the first and second components under such bit allocation using predetermined different methods for encoding, and outputting data on the encoding of the first and second components and information on the bit allocation as encoded data to be transmitted.
With CELP encoding, as described above, when a speech signal input under a high level of background noise is encoded, the background noise included in the reproduced speech signal sounds very different from the original, producing speech which is very unstable and uncomfortable. This phenomenon is attributable to the fact that background noise follows a model completely different from that of the speech signals for which CELP works well, and it is therefore desirable to encode the background noise using a method appropriate for it.
According to the present invention, an input speech signal is separated into a first component mainly constituted by speech and a second component mainly constituted by a background noise at each predetermined unit of time, and encoding is performed using methods for encoding based on different models which are respectively adapted to the characteristics of the speech and background noise to improve the efficiency of the encoding as a whole.
The first and second components are encoded using bit allocation selected from among a plurality of candidates for bit allocation based on the first and second components such that each component can be more efficiently encoded. This makes it possible to encode the input speech signal efficiently with the overall bit rate kept low.
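The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the candidate table, the 78-bit payload, the power-ratio thresholds, and the function names are hypothetical and are not taken from the patent text, which does not specify the selection rule at this point.

```python
import math

# Hypothetical candidate table: (bits for speech component, bits for noise component).
# Every candidate sums to the same assumed payload of 78 bits per frame; the index
# of the chosen candidate would be transmitted alongside the payload.
CANDIDATES = [(78, 0), (68, 10), (54, 24), (8, 70)]


def power(samples):
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(x * x for x in samples) / len(samples) if samples else 0.0


def select_allocation(speech, noise, floor=1e-9):
    """Return (index, (speech_bits, noise_bits)) for one frame.

    Assumed heuristic: compare the power of the two separated components
    and give the noise coder more bits as its relative power grows.
    """
    p_s, p_n = power(speech), power(noise)
    if p_n < floor:
        idx = 0                      # noise negligible: all payload bits to speech
    elif p_s < floor:
        idx = 3                      # speech negligible: most bits to noise
    else:
        ratio_db = 10.0 * math.log10(p_s / p_n)
        idx = 1 if ratio_db > 15.0 else 2
    return idx, CANDIDATES[idx]
```

Because each candidate sums to the same payload, the choice only redistributes bits between the two coders and never changes the frame size.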
In the method for encoding according to the invention, the first component is preferably encoded in the time domain and the second component is preferably encoded in the frequency domain or transform domain. Specifically, since speech is information which changes quickly at relatively short intervals on the order of 10 to 15 ms, the first component mainly constituted by speech can be encoded with high quality by using a method such as CELP-type encoding, which suppresses distortion of a waveform in the time domain. On the other hand, since a background noise changes slowly at relatively long intervals in the range from several tens of ms to several hundred ms, the information of the second component mainly constituted by a background noise can be extracted more easily and with fewer bits by encoding the component after converting it into parameters in the frequency domain or transform domain.
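As a rough illustration of why a slowly varying noise component needs few bits in the frequency domain, the sketch below reduces a frame to a handful of band energies. The use of a plain DFT, the equal-width bands, and the function name are assumptions made for this example; the text above does not fix a particular transform or parameter set.

```python
import cmath
import math


def band_energies(frame, num_bands):
    """Summarize a frame as num_bands energies over the one-sided DFT spectrum."""
    n = len(frame)
    half = n // 2
    # Naive O(n^2) DFT over bins 0..half-1; adequate for a short sketch.
    energies = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(acc) ** 2)
    width = half // num_bands
    # Sum squared magnitudes inside each equal-width band.
    return [sum(energies[b * width:(b + 1) * width]) for b in range(num_bands)]


# A 64-sample frame is reduced to 4 numbers describing its spectral shape.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
shape = band_energies(frame, 4)
```

Here the test tone falls in DFT bin 5, so nearly all of the energy lands in the first band; a real noise frame would spread its energy across the bands according to its spectral shape.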
In the method for encoding speech according to the invention, the total number of bits for encoding that are allocated for the predetermined units of time is preferably fixed. Since this makes it possible to encode an input speech signal at a fixed bit rate, encoded data can be more easily processed.
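One practical consequence of a fixed total is that the decoder can split the stream into frames without any length fields. The sketch below assumes an 80-bit frame made of a 2-bit allocation index followed by the speech and noise payloads; these sizes, the split table, and the function names are hypothetical values chosen for the example.

```python
FRAME_BITS = 80   # assumed fixed number of bits per frame
MODE_BITS = 2     # assumed width of the bit-allocation index
# Hypothetical (speech_bits, noise_bits) splits of the remaining 78 bits.
SPLITS = [(78, 0), (68, 10), (54, 24), (8, 70)]


def pack_frame(mode, speech_bits, noise_bits):
    """Build one fixed-length frame from '0'/'1' strings."""
    s_len, n_len = SPLITS[mode]
    if len(speech_bits) != s_len or len(noise_bits) != n_len:
        raise ValueError("payload lengths do not match the selected allocation")
    return f"{mode:0{MODE_BITS}b}" + speech_bits + noise_bits


def unpack_stream(stream):
    """Split a concatenation of frames back into (mode, speech, noise) tuples."""
    if len(stream) % FRAME_BITS:
        raise ValueError("stream is not a whole number of frames")
    frames = []
    for start in range(0, len(stream), FRAME_BITS):
        frame = stream[start:start + FRAME_BITS]
        mode = int(frame[:MODE_BITS], 2)
        s_len, _ = SPLITS[mode]
        body = frame[MODE_BITS:]
        frames.append((mode, body[:s_len], body[s_len:]))
    return frames
```

The allocation index is the only side information the decoder needs to find the boundary between the two payloads inside each frame.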
Further, in the method for encoding speech according to the invention, it is preferable that a plurality of methods for encoding are provided for encoding the second component and that at least one of those methods encodes the spectral shape of the current background noise utilizing the spectral shape of a previous background noise which has already been encoded. Since this method for encoding allows the second component to be encoded with a very small number of bits, the resultant spare encoding bits can be allocated to the encoding of the first component to prevent deterioration of the quality of decoded speech.
When an input speech signal is encoded using methods based on models adapted respectively to the first component mainly constituted by speech and the second component mainly constituted by a background noise, the production of an uncomfortable sound can be avoided. However, if the background noise is superimposed on the speech signal, i.e., if both the first and second components separated from the input speech signal have power which cannot be ignored, the absolute number of bits available for encoding the first component runs short and, as a result, the quality of the decoded speech is significantly reduced.
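One way to picture a coder that reuses an already encoded spectral shape is sketched below. It is an assumption-laden illustration, not the method of the patent: band levels in dB, a 1 dB quantizer step, a 2 dB tolerance, and the two mode names "pred" and "full" are all hypothetical. The predictive mode sends a single gain offset relative to the previously decoded shape, and the full mode is used only when that prediction is too inaccurate.

```python
def encode_noise_shape(current_db, prev_decoded_db, tol_db=2.0, step_db=1.0):
    """Return ('pred', gain_index) or ('full', [level indices]) for one frame."""
    if prev_decoded_db is not None:
        diffs = [c - p for c, p in zip(current_db, prev_decoded_db)]
        gain_index = round(sum(diffs) / len(diffs) / step_db)
        # Worst-case band error if the previous shape is reused with this offset.
        err = max(abs(c - (p + gain_index * step_db))
                  for c, p in zip(current_db, prev_decoded_db))
        if err <= tol_db:
            return ("pred", gain_index)      # one number instead of one per band
    return ("full", [round(c / step_db) for c in current_db])


def decode_noise_shape(code, prev_decoded_db, step_db=1.0):
    """Rebuild band levels in dB from either mode."""
    kind, payload = code
    if kind == "pred":
        return [p + payload * step_db for p in prev_decoded_db]
    return [i * step_db for i in payload]
```

Note that the encoder predicts from the previously decoded shape rather than the previous input, so encoder and decoder stay in step.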
In such a case, with the above-described method for encoding the spectral shape of the current background noise utilizing th
