Reexamination Certificate
1999-09-14
2003-03-18
Knepper, David D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S208000, C704S220000
active
06535847
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to audio signal processing. It has particular utility in relation to the separation of voiced speech and unvoiced speech in low bit-rate speech coders.
2. Related Art
Low bit-rate speech coders are becoming increasingly commercially important as they enable a more efficient utilisation of the portion of the radio spectrum available to mobile phones.
Speech can be classified into three parts: voiced speech, unvoiced speech and silence. Any one of these may be corrupted by the addition of background noise. On a timescale of milliseconds, voiced speech can be viewed as a succession of repeated waveforms. This fact is exploited in a class of speech coding methods known as Prototype Waveform Interpolation (PWI) methods. Essentially, these methods involve sending information describing repeated pitch period waveforms only once, thereby reducing the number of bits required to encode the speech signal. Initial PWI speech coding methods encoded only voiced speech; the other portions of the speech signal were coded using other methods (e.g. Code Excited Linear Prediction methods). One example of such a hybrid coding technique is described in “Encoding Speech Using Prototype Waveforms”, W. B. Kleijn, IEEE Transactions on Speech and Audio Processing, Vol. 1, pp. 386-399, October 1993.
Later PWI methods were generalised so as to enable unvoiced speech and noise to be encoded as well. An example of such a method is described in “A General Waveform-Interpolation Structure for Speech Coding”, W. B. Kleijn and J. Haagen, Signal Processing Theories and Applications, M. Holt, C. Cowan, P. Grant, W. Sandham (Eds.), pp. 1665-1668, 1994.
However, such coders have drawbacks in that the reconstituted speech sounds buzzy. The present inventors have established that the cause of this ‘buzziness’ is a poor separation of the voiced components of speech and the unvoiced/noisy components of speech.
BRIEF SUMMARY OF THE INVENTION
According to a first aspect of the present invention there is provided a method of extracting one of a concordant component and a discordant component of a predetermined segment of an audio signal, said method comprising the steps of:
forming an initial evolution surface from a series of combined magnitude and phase spectra representing segments of said signal around said predetermined segment;
modifying said initial evolution surface to obtain a modified evolution surface representing said one of the concordant component or the discordant component of said signal; and
extracting said one of the concordant component or the discordant component of said predetermined segment from said modified evolution surface;
wherein said modifying step involves:
a plurality of component filtering steps and, prior to at least one of those filtering steps, the substitution of phase information derived from said initial evolution surface or an earlier one of the component steps for the phase information derived from the most recent component step.
Here, concordant is intended to refer to signals whose phase changes slowly in comparison to discordant signals whose phase changes more rapidly.
The present inventors have found that the rate of evolution of the phase information is useful in distinguishing between voiced speech (the concordant component of speech) and unvoiced speech/noise (the discordant component of speech).
However, it is likely that the invention will find application in other areas of audio signal processing such as the enhancement of noise-corrupted speech or music signals.
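The distinction between concordant and discordant signals drawn above can be illustrated with a minimal sketch. It assumes that one frequency bin is tracked across successive segments as a list of complex values and that "rate of phase evolution" is measured as the mean absolute phase step between neighbouring segments. The function name `mean_phase_step`, the two example tracks and this particular measure are illustrative assumptions and are not taken from the patent text.

```python
import cmath

def mean_phase_step(track):
    """Average absolute phase change between consecutive segments, in radians."""
    # Multiplying by the conjugate gives the wrapped phase difference in (-pi, pi].
    steps = [abs(cmath.phase(b * a.conjugate())) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)

# Hypothetical tracks of one frequency bin across eight successive segments.
slow = [cmath.rect(1.0, 0.05 * t) for t in range(8)]  # phase drifts slowly
fast = [cmath.rect(1.0, 2.0 * t) for t in range(8)]   # phase jumps each segment

slow_rate = mean_phase_step(slow)
fast_rate = mean_phase_step(fast)
print(slow_rate < fast_rate)  # the slowly evolving track behaves as "concordant"
```

Under this assumed measure, a bin whose phase barely moves from segment to segment would be treated as concordant, and a bin whose phase changes rapidly as discordant.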
Conventional low-pass and high-pass Finite Impulse Response (FIR) digital filtering techniques do not reduce the magnitude of discordant and concordant signals respectively to zero. Therefore, they are limited in how well they can extract one of the concordant or discordant components of an audio signal.
A conventional FIR filter might be approximated by a series of shorter FIR filters. By decomposing a filtering process into a plurality of filtering stages and, in one or more of the intervals between those filtering stages, substituting phase information from an earlier stage for phase information from the most recent stage, a filtering process results which repeatedly uses the earlier phase information. Filtering a signal tends to smooth its phase and hence a filtered signal contains less information distinguishing its concordant and discordant parts. By reinstating the earlier phase information, the concordant or discordant component can be more thoroughly removed in the subsequent filtering stage(s). The result is an audio signal filtering process which is better able to extract a concordant or discordant component of an audio signal.
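The statement that a longer FIR filter can be built from a series of shorter ones follows from the fact that cascading FIR filters is equivalent to convolving their kernels. The sketch below checks this for a hypothetical 3-tap low-pass kernel. The kernel values and the test signal are assumptions chosen for illustration, and no phase substitution is performed here.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

short = [0.25, 0.5, 0.25]            # hypothetical 3-tap low-pass kernel
equivalent = convolve(short, short)  # single 5-tap kernel equal to two stages

signal = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0]
two_stage = convolve(convolve(signal, short), short)  # cascade of short filters
one_stage = convolve(signal, equivalent)              # one longer filter

print(equivalent)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

Without any intervention between stages the cascade and the single filter give the same output, so any benefit of the staged arrangement comes from what is done to the signal between the stages.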
As suggested above, a repeated application of a low-pass filter will leave a modified evolution surface representing the concordant component of said predetermined segment. Preferably, each low-pass filtering step involves the application of an identical low-pass filter. This minimises the complexity of the processing method.
In preferred embodiments, the phase information derived from the initial evolution surface is used in all of said component steps. This maximises the effectiveness of the extraction method.
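The preferred arrangement described above (an identical low-pass filter at every stage, with the phase of the initial evolution surface reinstated before each later stage) might be sketched as follows. This is an illustrative reading, not the patented implementation: the surface is assumed to be a list of complex spectra, one row per segment, the filter is assumed to run across segments for each frequency bin, and the function names, the 3-tap kernel and the edge handling are all hypothetical.

```python
import cmath

def smooth_along_evolution(surface, kernel):
    """Low-pass filter each frequency bin across segments (edge rows repeated)."""
    n, half = len(surface), len(kernel) // 2
    out = []
    for t in range(n):
        row = []
        for k in range(len(surface[0])):
            acc = 0j
            for j, w in enumerate(kernel):
                src = min(max(t + j - half, 0), n - 1)  # clamp at the edges
                acc += w * surface[src][k]
            row.append(acc)
        out.append(row)
    return out

def reinstate_phase(filtered, reference):
    """Keep the filtered magnitude but substitute the reference surface's phase."""
    return [[cmath.rect(abs(f), cmath.phase(r)) for f, r in zip(frow, rrow)]
            for frow, rrow in zip(filtered, reference)]

def extract_concordant(initial, kernel, stages):
    """Apply the same low-pass filter `stages` times, restoring the initial
    phase before every stage after the first."""
    surface = initial
    for stage in range(stages):
        if stage > 0:
            surface = reinstate_phase(surface, initial)
        surface = smooth_along_evolution(surface, kernel)
    return surface

# Hypothetical surface: 9 segments, 2 bins. Bin 0 has constant phase,
# bin 1 has a phase that advances by 2 radians per segment.
initial = [[1 + 0j, cmath.rect(1.0, 2.0 * t)] for t in range(9)]
kernel = [0.25, 0.5, 0.25]

one_stage = extract_concordant(initial, kernel, stages=1)
three_stage = extract_concordant(initial, kernel, stages=3)
print(abs(three_stage[4][0]), abs(three_stage[4][1]) < abs(one_stage[4][1]))
```

In this toy case the constant-phase bin passes through unchanged while the rapidly rotating bin is attenuated further at each stage. Real speech spectra have irregular phase, so the numbers here indicate only the structure of the loop.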
One way in which the discordant component can be calculated is to calculate the concordant component according to the first aspect of the present invention and subtract this from the original signal. Similarly, one way in which the concordant component can be calculated is to calculate the discordant component according to the first aspect of the present invention and subtract this from the original signal.
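The subtraction route described above can be sketched in a few lines. The sketch assumes that the original signal and the extracted component are held as equal-length sequences of samples or spectral values. The helper name `complementary_component` and the sample values are hypothetical.

```python
def complementary_component(original, extracted):
    """Subtract an extracted component from the original, element by element."""
    return [o - e for o, e in zip(original, extracted)]

original = [1.0, 2.5, -0.5, 0.25]
concordant = [0.75, 2.0, 0.0, 0.25]  # hypothetical output of the extraction
discordant = complementary_component(original, concordant)

# Adding the two components back together recovers the original values.
rebuilt = [c + d for c, d in zip(concordant, discordant)]
print(discordant)  # [0.25, 0.5, -0.5, 0.0]
```

The same helper serves in the other direction, recovering the concordant component when the discordant component is the one that was extracted.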
According to a second aspect of the present invention, there is provided an audio signal processor operable to extract one of a concordant component and a discordant component of a predetermined segment of an audio signal, said apparatus comprising:
means arranged in operation to form an initial evolution surface from a series of combined magnitude and phase spectra representing segments of said signal around said predetermined segment;
means arranged in operation to modify said initial evolution surface to obtain a modified evolution surface representing said one of the concordant component or the discordant component of said signal; and
means arranged in operation to extract said one of the concordant component or the discordant component of said predetermined segment from said modified evolution surface;
wherein said apparatus further comprises:
means arranged in operation to carry out a plurality of filtering steps and, prior to at least one of those filtering steps, to substitute phase information derived from said initial evolution surface or an earlier one of the component steps for the phase information derived from the most recent component step.
According to a third aspect of the present invention, there is provided a speech coding apparatus including:
a storage medium having recorded therein processor readable code processable to encode input speech data, said code including:
initial evolution surface generation code processable to generate initial evolution surface data comprising combined magnitude and phase data for segments of said input speech data;
separation code processable to derive separate phase data and magnitude data from said input speech data;
evolution surface modification code processable to generate a modified evolution surface representing one of a voiced component or an unvoiced/noise component of said input speech data; and
component extraction code processable to extract said one of the voiced component or the unvoiced/noise component from said input speech data;
wherein said evolution surface modification code comprises:
evolution surface filtering code processable to filter said initial evolution surface data a plurality of times;
evolution surface decomposition code processable to derive magnitude data and phase data subsequent to one or more of said filtering steps; and
earlier phase reinstatement code processable to replace the phase data obtained on p
Azad Abul K.
British Telecommunications public limited company
Knepper David D.
Nixon & Vanderhye P.C.