Speech coding based on determining a noise contribution from...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S201000

active

06453283

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to a method of coding an audio equivalent signal. The invention also relates to an apparatus for coding an audio equivalent signal. The invention further relates to a method of synthesising an audio equivalent signal from encoded signal fragments.
The invention also relates to a system for synthesising an audio equivalent signal from encoded audio equivalent input signal fragments. The invention further relates to a synthesiser.
The invention relates to a parametric production model for coding an audio equivalent signal. A widely used coding technique based on a parametric production model is the so-called Linear Predictive Coding (LPC) technique. This technique is used particularly for coding speech. The coded signal may, for instance, be transferred via a telecommunications network and decoded (resynthesised) at the receiving station, or it may be used in a speech synthesis system to synthesise speech output representing, for instance, textual input. According to the LPC model, the spectral energy envelope of an audio equivalent signal is described in terms of an optimum all-pole filter and a gain factor that matches the filter output to the input level. For speech, a binary voicing decision determines whether a periodic impulse train or white noise excites the LPC synthesis filter. For running speech, the model parameters (voicing, pitch period, gain and filter coefficients) are updated every frame, with a typical frame duration of 10 ms. This reduces the bit rate drastically. Although a classical LPC vocoder can produce intelligible speech, it often sounds rather buzzy. LPC is based on autocorrelation analysis and simply ignores the phase spectrum; the synthesis is minimum phase. A further limitation of classical LPC is the binary selection of either a periodic or a noise source. In natural speech both sources often act simultaneously, not only in voiced fricatives but also in many other voiced sounds. An improved LPC coding technique is known from “A mixed excitation LPC vocoder model for low bit rate speech coding”, McCree & Barnwell, IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 4, July 1995. According to this coding technique, a filter bank is used to split the input signal into a number of frequency bands, for instance five. For each band, the relative pulse and noise power is determined by an estimate of the voicing strength at that frequency in the input speech.
The voicing strength in each frequency band is chosen as the larger of the correlation of the band-pass filtered input speech and the correlation of the envelope of the band-pass filtered speech. The LPC synthesis filter is excited by a frequency-weighted sum of a pulse train and white noise.
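The band-wise voicing-strength rule of the cited mixed-excitation vocoder can be illustrated with the Python sketch below. It is not McCree & Barnwell's implementation: the band-pass filter bank is assumed to have been applied already, the pitch lag is assumed to be known, and the envelope detector (full-wave rectification followed by a moving average) and all function names are illustrative assumptions.

```python
import math
import random


def normalized_correlation(x, lag):
    """Normalized correlation between x[n] and x[n + lag]."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def envelope(x, smooth=8):
    """Hypothetical envelope detector: full-wave rectification, a moving
    average over `smooth` samples, then removal of the mean."""
    rect = [abs(v) for v in x]
    env, acc = [], 0.0
    for i, v in enumerate(rect):
        acc += v
        if i >= smooth:
            acc -= rect[i - smooth]
        env.append(acc / min(i + 1, smooth))
    mean = sum(env) / len(env)
    return [v - mean for v in env]


def voicing_strength(band_signal, pitch_lag):
    """Larger of the pitch-lag correlation of the band signal and of its envelope."""
    return max(normalized_correlation(band_signal, pitch_lag),
               normalized_correlation(envelope(band_signal), pitch_lag))


period = 80
voiced_band = [math.sin(2 * math.pi * n / period) for n in range(800)]
rng = random.Random(0)
noise_band = [rng.gauss(0.0, 1.0) for _ in range(800)]
print(voicing_strength(voiced_band, period))  # close to 1.0 for a periodic band
print(voicing_strength(noise_band, period))   # well below the periodic value
```

The envelope term is what lets a band count as voiced when its fine structure is noisy but its amplitude is still modulated at the pitch rate.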
In general the quality obtained by LPC is relatively low, and therefore LPC is mainly used for communication purposes at low bit rates (e.g. 2400/4800 bps). Even the improved LPC coding is not suitable for systems such as speech synthesis (text-to-speech), where high-quality output is desired. Speech produced with these LPC coding methods still lacks a great deal of naturalness. This has hampered large-scale application of synthetic speech in, e.g., telephone services or automatic traffic information systems in a car environment.
SUMMARY OF THE INVENTION
It is an object of the invention to provide a parametric coding/synthesis method and system which enables the production of more natural speech.
To meet the object of the invention, the method of coding an audio equivalent signal comprises:
determining successive pitch periods/frequencies in the signal;
forming a sequence of mutually overlapping or adjacent analysis segments by positioning a chain of time windows with respect to the signal and weighting the signal according to the associated window function of the respective time window;
for each of the analysis segments:
determining an amplitude value and a phase value for a plurality of frequency components of the analysis segment, including a plurality of harmonic frequencies of the pitch frequency corresponding to the analysis segment,
determining a noise value for each of the frequency components by comparing the phase value for the frequency component of the analysis segment to a corresponding phase value for at least one preceding or following analysis segment, the noise value for a frequency component representing the contributions of a periodic component and an aperiodic component to the analysis segment at that frequency; and
representing the analysis segment by the amplitude value and the noise value for each of the frequency components.
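The windowing step of the method can be sketched as follows, assuming the pitch marks (sample indices spaced one local pitch period apart) have already been determined. The window shape, a Hann-like window whose rising and falling halves each span one local pitch period, and the function names are illustrative assumptions; the source only requires a chain of overlapping or adjacent windows positioned with respect to the signal.

```python
import math


def asymmetric_hann(rise, fall):
    """Window rising over `rise` samples and falling over `fall` samples,
    with weight 1.0 at index `rise` (the pitch mark)."""
    up = [0.5 - 0.5 * math.cos(math.pi * i / rise) for i in range(rise)]
    down = [0.5 + 0.5 * math.cos(math.pi * i / fall) for i in range(fall)]
    return up + down


def analysis_segments(signal, pitch_marks):
    """One weighted segment per interior pitch mark.

    Each window starts at the previous mark and ends just before the next one,
    so it spans two local pitch periods and overlaps each neighbour by one
    period. Returns (start_index, pitch_mark, weighted_samples) tuples."""
    segments = []
    for prev, mark, nxt in zip(pitch_marks, pitch_marks[1:], pitch_marks[2:]):
        window = asymmetric_hann(mark - prev, nxt - mark)
        weighted = [w * s for w, s in zip(window, signal[prev:nxt])]
        segments.append((prev, mark, weighted))
    return segments


marks = [0, 50, 98, 150, 199, 251]          # hypothetical pitch marks
segs = analysis_segments([1.0] * 260, marks)
print([mark for _, mark, _ in segs])        # [50, 98, 150, 199]
```

With this choice, the falling half of one window and the rising half of the next add up to one, so between the first and last interior marks the weighted segments overlap-add back to the original signal.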
The inventor has found that an accurate estimate of the ratio between the noise and the periodic component is achieved by pitch-synchronously analysing the phase development of the signal, instead of (or in addition to) analysing the amplitude development. This improved detection of the noise contribution can be used to improve the prior art LPC encoding. Advantageously, the coding is used for speech synthesis systems.
In an embodiment according to the invention as described in dependent claim 2, the analysis window is very narrow. In this way, the relatively quick changes in ‘noisiness’ that can occur in speech can be accurately detected.
In an embodiment according to the invention as described in dependent claim 3, the pitch development is accurately determined using a two-step approach. After a rough estimate of the pitch has been obtained, the signal is filtered to extract the frequency components near the detected pitch frequency. The actual pitch is then detected in the pitch-filtered signal.
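The two-step pitch determination can be sketched in Python as below. Everything concrete here is an assumption for illustration: the rough estimate is an integer-lag autocorrelation peak searched over 80 to 400 Hz, the filter is a biquad band-pass in the RBJ audio-EQ-cookbook form (a stand-in; one embodiment of the source uses a sine/cosine convolution instead), and the actual pitch is read from the mean spacing of interpolated zero crossings of the filtered signal. A naive autocorrelation search like this one can make octave errors if the lag range is widened.

```python
import math


def rough_pitch_lag(x, fs, fmin=80.0, fmax=400.0):
    """Step 1: integer lag of the autocorrelation maximum within the pitch range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    m = len(x) - hi  # same number of products for every lag
    return max(range(lo, hi + 1),
               key=lambda lag: sum(x[n] * x[n + lag] for n in range(m)))


def bandpass(x, fs, f_center, q=4.0):
    """Step 2: biquad band-pass (RBJ cookbook form) around the rough pitch."""
    w0 = 2 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        out = (b0 * v + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(out)
        x2, x1 = x1, v
        y2, y1 = y1, out
    return y


def refined_pitch(x, fs, fmin=80.0, fmax=400.0, skip=400):
    """Step 3: pitch from the mean spacing of positive-going zero crossings
    (located by linear interpolation) of the filtered signal, ignoring the
    first `skip` samples while the filter settles."""
    lag = rough_pitch_lag(x, fs, fmin, fmax)
    y = bandpass(x, fs, fs / lag)
    crossings = [n - 1 + y[n - 1] / (y[n - 1] - y[n])
                 for n in range(skip + 1, len(y)) if y[n - 1] < 0.0 <= y[n]]
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return fs / period


fs, f0 = 8000.0, 123.4
x = [sum(a * math.sin(2 * math.pi * k * f0 * n / fs) for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
     for n in range(4000)]
print(rough_pitch_lag(x, fs))  # integer lag, i.e. only a coarse estimate
print(refined_pitch(x, fs))    # much closer to 123.4 Hz
```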
In an embodiment according to the invention as described in dependent claim 4, the filtering is based on convolution with a sine/cosine pair within a segment, which allows the pitch frequency component within the segment to be determined accurately.
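A minimal Python sketch of filtering by convolution with a sine/cosine pair follows. The source states only that the filtering is based on such a convolution within a segment; the Hann weighting, the kernel length of two rough pitch periods, reading the pitch from the mean phase advance of the complex output (cosine output as real part, sine output as imaginary part) and the function names are assumptions.

```python
import cmath
import math


def quadrature_filter(x, fs, f_center, n_periods=2):
    """Convolve x with a Hann-weighted cosine/sine pair tuned to f_center.

    Real part: output of the cosine kernel; imaginary part: output of the sine
    kernel. Only fully overlapped output samples are returned."""
    length = int(round(n_periods * fs / f_center))
    w0 = 2 * math.pi * f_center / fs
    kernel = [(0.5 - 0.5 * math.cos(2 * math.pi * m / length)) * cmath.exp(1j * w0 * m)
              for m in range(length)]
    return [sum(k * x[n - m] for m, k in enumerate(kernel))
            for n in range(length - 1, len(x))]


def frequency_from_phase(z, fs):
    """Mean phase advance per sample of the quadrature output, in Hz."""
    steps = [cmath.phase(b / a) for a, b in zip(z, z[1:])]
    return fs * (sum(steps) / len(steps)) / (2 * math.pi)


fs, f0 = 8000.0, 123.4
x = [sum(a * math.sin(2 * math.pi * k * f0 * n / fs) for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
     for n in range(4000)]
z = quadrature_filter(x, fs, fs / 65)  # tuned to a rough estimate (lag of 65 samples)
print(frequency_from_phase(z, fs))     # close to 123.4
```

Because the kernel pair passes mainly the component near the rough pitch, the phase of the complex output advances by almost exactly the true pitch frequency per sample, even though the kernel itself is tuned slightly off.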
In an embodiment according to the invention as described in dependent claim 5, interpolation is used to increase the resolution for sampled signals.
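The source does not say which sampled quantity is interpolated. As an assumed example only, parabolic interpolation around the maximum of a sampled autocorrelation gives a pitch period with sub-sample resolution; the function names and the test signal are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n + lag], with the same number of terms per lag."""
    m = len(x) - max_lag
    return [sum(x[n] * x[n + lag] for n in range(m)) for lag in range(max_lag + 1)]


def parabolic_peak(values, i):
    """Fractional position of the maximum near values[i], from the parabola
    through values[i - 1], values[i] and values[i + 1]."""
    y0, y1, y2 = values[i - 1], values[i], values[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return float(i)
    return i + 0.5 * (y0 - y2) / denom


period = 64.83  # true period in samples (not an integer)
x = [math.sin(2 * math.pi * n / period) for n in range(4000)]
r = autocorrelation(x, 100)
best = max(range(40, 100), key=lambda lag: r[lag])
print(best, parabolic_peak(r, best))  # integer lag, then a refined fractional lag
```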
In an embodiment according to the invention as described in dependent claim 6, the amplitude and/or phase values of the frequency components are determined by a transformation to the frequency domain that uses the accurately determined pitch frequency as its fundamental frequency. This allows an accurate description of the periodic part of the signal.
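The sketch below shows one way to obtain amplitude and phase values at exactly the harmonic frequencies k·f0, instead of at FFT bin frequencies fixed by the segment length. It is an illustration under assumptions: the segment is weighted by a Hann window spanning two pitch periods, phases are measured relative to the pitch mark at the segment centre, and the names are hypothetical. In the example the period is a whole number of samples, so the estimates are exact; for a non-integer period the same function applies but some leakage between harmonics remains.

```python
import cmath
import math


def harmonic_analysis(samples, window, fs, f0, n_harmonics, centre):
    """Amplitude and phase of the harmonics k * f0, k = 1..n_harmonics.

    The windowed segment is correlated with a complex exponential at exactly
    k * f0. Phases are relative to sample index `centre` of the segment."""
    gain = sum(window)
    result = []
    for k in range(1, n_harmonics + 1):
        omega = 2 * math.pi * k * f0 / fs
        acc = sum(w * s * cmath.exp(-1j * omega * (n - centre))
                  for n, (w, s) in enumerate(zip(window, samples)))
        result.append((2 * abs(acc) / gain, cmath.phase(acc)))
    return result


fs, f0, period = 8000.0, 200.0, 40   # 40-sample pitch period
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (2 * period)) for n in range(2 * period)]
true = [(1.0, 0.3), (0.5, -1.0), (0.25, 2.0)]  # (amplitude, phase) per harmonic
segment = [sum(a * math.cos(2 * math.pi * k * f0 * (n - period) / fs + ph)
               for k, (a, ph) in enumerate(true, start=1))
           for n in range(2 * period)]
for amp, ph in harmonic_analysis(segment, window, fs, f0, 3, centre=period):
    print(round(amp, 6), round(ph, 6))  # recovers the pairs listed in `true`
```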
In an embodiment according to the invention as described in dependent claim 7, the noise value is derived from the difference between the phase value for the frequency component of the analysis segment and the corresponding phase value of at least one preceding or following analysis segment. This is a simple way of obtaining a measure of how much noise is present in the signal at that frequency. If the signal is strongly dominated by the periodic component, with a very low contribution of noise, the phase will be substantially the same in successive segments. For a signal dominated by noise, on the other hand, the phase will change ‘randomly’. The comparison of the phases therefore indicates the relative contributions of the periodic and aperiodic components to the input signal. It will be appreciated that the measure may also be based on phase information from more than two segments (e.g. the phases of both neighbouring segments may be compared to the phase of the current segment).
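A possible mapping from phase differences to noise values is sketched below. The scaling to the range 0 to 1 (the absolute wrapped phase difference divided by π) and the averaging over both neighbours are assumptions; the source only requires that the noise value be derived from the phase difference. The phases are assumed to be measured pitch-synchronously, so that a purely periodic component keeps the same phase from segment to segment.

```python
import math
import random


def wrap(phase):
    """Map a phase (difference) into the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def noise_values(phases, prev_phases, next_phases=None):
    """Per-harmonic noise value in [0, 1] from pitch-synchronous phase changes.

    0 means the harmonic's phase is unchanged between segments (periodic);
    a phase that changes at random gives 0.5 on average. If next_phases is
    given, the values against both neighbours are averaged."""
    def against(other):
        return [abs(wrap(p - q)) / math.pi for p, q in zip(phases, other)]

    noise = against(prev_phases)
    if next_phases is not None:
        noise = [(a + b) / 2 for a, b in zip(noise, against(next_phases))]
    return noise


print(noise_values([0.3, 1.0], [0.3, 1.0]))     # [0.0, 0.0]: steady phases
rng = random.Random(1)
cur = [rng.uniform(-math.pi, math.pi) for _ in range(4000)]
prev = [rng.uniform(-math.pi, math.pi) for _ in range(4000)]
print(sum(noise_values(cur, prev)) / len(cur))  # about 0.5 for random phases
```

Wrapping matters: phases of 3.1 and -3.1 rad are almost identical on the circle, and the wrapped difference correctly yields a small noise value for them.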
In an embodiment according to the invention as described in dependent claim 8, the noise value is based on the difference between a derivative of the phase value for the frequency component of the analysis segment and a derivative of the corresponding phase value of at least one preceding or following analysis segment. This provides a more robust measure.
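The source does not state with respect to which variable the phase derivative is taken. The Python sketch below assumes a derivative along frequency, approximated by the wrapped phase difference between adjacent harmonics; the names and the scaling are hypothetical. One property that follows directly from this choice: a phase offset common to all harmonics of a segment cancels, whereas a plain phase comparison would report it as noise.

```python
import math
import random


def wrap(phase):
    """Map a phase (difference) into the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def phase_slope(phases):
    """Assumed discrete phase derivative: wrapped difference between the
    phases of adjacent harmonics (a derivative along frequency)."""
    return [wrap(b - a) for a, b in zip(phases, phases[1:])]


def noise_from_phase_derivative(phases, prev_phases):
    """Noise value in [0, 1] per adjacent-harmonic pair, from the change in
    phase slope between the current and the previous segment."""
    return [abs(wrap(d - e)) / math.pi
            for d, e in zip(phase_slope(phases), phase_slope(prev_phases))]


prev = [0.1, 0.5, 1.2, 2.0]
shifted = [p + 0.7 for p in prev]                   # same phases plus a common offset
print(noise_from_phase_derivative(shifted, prev))   # all (numerically) zero

rng = random.Random(2)
a = [rng.uniform(-math.pi, math.pi) for _ in range(4000)]
b = [rng.uniform(-math.pi, math.pi) for _ in range(4000)]
print(sum(noise_from_phase_derivative(a, b)) / 3999)  # about 0.5 for random phases
```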
To meet the object of the invention, the method of synthesising an audio equivalent signal from encoded audio equivalent input signal fragments, such as diphones, comprises:
retrieving selected ones of coded signal fragments, where the signal fragments have been coded according to the described coding method; and
for each of the retrieved coded signa
