Reexamination Certificate
2000-01-07
2003-07-01
Banks-Harold, Marsha D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S221000, C704S227000
active
06587817
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to speech coding and in particular to the forming of speech coding frames.
BACKGROUND OF THE INVENTION
A delay is generally a period between one event and another event connected with it. In mobile communication systems, a delay occurs between the transmission of a signal and its reception, the delay resulting from the interaction of a number of different factors, for example, from speech coding, channel coding and the propagation delay of the signal. Long response times produce an unnatural feeling in conversation and, therefore, a delay caused by the system always makes communication more difficult. Thus, the aim is to minimise the delay in each part of the system.
One source of a delay is windowing used in signal processing. The purpose of windowing is to shape the signal into a form required in further processing. For example, noise reducers typically used in mobile communication systems mainly operate in the frequency domain and, therefore, a signal to be noise-reduced is usually transformed frame by frame from the time domain to the frequency domain using a Fast Fourier Transform (FFT). In order that the FFT functions in the desired way, samples divided into frames should be windowed prior to the FFT.
FIG. 1 illustrates the procedure by showing as an example the windowing of a frame F(n) into a trapezoidal form. In windowing, the set of samples contained in the frame F(n) is multiplied by a window function so that the resulting window W(n) 19 comprises a first slope 10 (hereinafter referred to as the front slope), containing the more recent samples of the frame, a second slope 11 (hereinafter referred to as the rear slope), containing the older samples of the frame, and a remaining window part 12 between them. In the windowing of the example, the samples of the window part 12 located between the first and second slopes are multiplied by 1, i.e. their value remains unchanged. The samples of the front slope 10 are multiplied by a descending function where the coefficient of the oldest samples of the front slope 10 approaches one and the coefficient of the newest samples approaches zero. Correspondingly, the samples of the rear slope 11 are multiplied by an ascending function where the coefficient of the oldest samples of the rear slope 11 approaches zero and the coefficient of the newest samples approaches one.
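The trapezoidal windowing described above can be sketched as follows. This is a minimal illustration only: the text requires just an ascending and a descending function, so the linear ramps, the function names and the parameter names used here are assumptions and not part of the original description.

```python
def trapezoid_window(frame_len, rear_len, front_len):
    """Return trapezoidal window coefficients, oldest sample first.

    Assumption: both slopes are linear ramps (the source only says
    'ascending' and 'descending' function).
    """
    if rear_len + front_len > frame_len:
        raise ValueError("slopes must fit inside the frame")
    window = []
    for i in range(frame_len):
        if i < rear_len:
            # Rear slope (older samples): rises from near 0 towards 1.
            window.append((i + 1) / (rear_len + 1))
        elif i >= frame_len - front_len:
            # Front slope (newer samples): falls from near 1 towards 0.
            k = i - (frame_len - front_len)
            window.append((front_len - k) / (front_len + 1))
        else:
            # Middle part: samples are multiplied by 1 (unchanged).
            window.append(1.0)
    return window


def apply_window(samples, window):
    """Multiply each sample by its window coefficient."""
    return [s * w for s, w in zip(samples, window)]
```

With equal slope lengths, the coefficient of front-slope position k and the coefficient of rear-slope position k add up to one, which is the property that overlapping successive windows relies on.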
For the noise reduction of speech encoders, the noise reduction frame F(n) (reference 18) is typically formed of an input frame 16, formed of new samples, and of a set of the oldest samples 15 of the preceding input frame. Thus, samples 17 are used in forming two successive input frames.
FIG. 1 also illustrates the overlap-add method often used in connection with windowing relating to FFTs. In the method, part of the noise-reduced samples of successive windowed noise reduction frames are summed with each other to improve adjustments between consecutive frames. In the example shown in FIG. 1, the noise-reduced samples of slopes 10 and 13 of successive frames F(n) and F(n+1) are summed so that the data of the front slope 10, calculated from the newer samples of the frame F(n), is summed sample by sample with the slope 13, calculated from the older samples of the frame F(n+1), so that the sum of the coefficients of overlapping slopes is 1. Due to the overlap-add method, the section represented by the front slope 10 cannot, however, be transmitted further from noise reduction before noise reduction is performed for the entire following frame F(n+1) and neither can noise reduction of the next frame F(n+1) be started before the entire next frame is received. Thus, the use of the overlap-add method in the processing of a signal causes an additional delay D1, which is equal to the length of slope 10.
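The overlap-add procedure and the resulting delay can be illustrated with the following sketch. It is a simplified model under stated assumptions: the slopes are linear ramps of equal length, the noise reduction itself is replaced by an identity operation, and all function and parameter names are hypothetical.

```python
def ramp_window(frame_len, slope_len):
    """Trapezoidal window with linear slopes of equal length (assumption)."""
    up = [(i + 1) / (slope_len + 1) for i in range(slope_len)]
    down = up[::-1]  # up[k] + down[k] == 1 for every k
    return up + [1.0] * (frame_len - 2 * slope_len) + down


def overlap_add_stream(signal, frame_len, slope_len):
    """Window successive overlapping frames and overlap-add them.

    Successive frames share slope_len samples. The front slope of each
    frame is stored and can only be output once the next frame has
    been processed.
    """
    hop = frame_len - slope_len
    window = ramp_window(frame_len, slope_len)
    stored_front = [0.0] * slope_len  # nothing stored before the first frame
    output = []
    start = 0
    while start + frame_len <= len(signal):
        frame = signal[start:start + frame_len]
        windowed = [s * w for s, w in zip(frame, window)]
        # Placeholder: real noise reduction (FFT, processing, inverse FFT)
        # would operate on `windowed` here.
        rear = windowed[:slope_len]
        summed = [a + b for a, b in zip(stored_front, rear)]
        output.extend(summed + windowed[slope_len:hop])
        stored_front = windowed[hop:]  # waits for the next frame
        start += hop
    return output
```

In this model, the output always lags the consumed input by slope_len samples, corresponding to the additional delay D1. The first slope_len output samples are attenuated because no earlier frame exists to be summed with them.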
The simplified block diagram in FIG. 2 illustrates the phases of processing for a signal being formed of samples divided into frames, according to prior art. Block 21 represents the windowing of a frame, as presented above, and block 22 represents the performance of noise reduction algorithms for windowed frames, comprising at least an FFT being performed on the windowed data and its reverse transformation. Block 23 represents the operations performed according to an overlap-add windowing wherein noise-reduced data is stored for the first slopes 10, 14 of the window, to wait for the processing of the next frame and wherein the stored data is summed with the data of the second slopes 13 of the next frame. Block 24 represents speech-coding related signal pre-processing, which typically comprises high-pass filtering and signal scaling for speech coding. From block 24, the data is transferred to a block 25 for speech coding.
Speech codecs (e.g. CELP, ACELP), used in current mobile phone systems, are based on linear prediction (CELP=Code Excited Linear Prediction). In linear prediction, a signal is encoded frame by frame. The data contained in the frames is windowed and on the basis of the windowed data, a set of auto-correlation coefficients is calculated, which are to be used to determine the coefficients of a linear prediction function to be used as coding parameters.
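The step from windowed data to auto-correlation coefficients and then to linear prediction coefficients can be sketched as below. The text does not name the solution method; the Levinson-Durbin recursion used here is a commonly used choice and is an assumption, as are the function names.

```python
def autocorrelation(x, order):
    """Auto-correlation coefficients r[0..order] of the windowed samples x."""
    return [
        sum(x[n] * x[n - k] for n in range(k, len(x)))
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Convention: x[n] is predicted as sum(a[k] * x[n - k]).
    Returns (coefficients, prediction_error_energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # e.g. an all-zero frame: keep remaining coefficients at 0
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

For example, the auto-correlation sequence 1, 0.5, 0.25 corresponds to a first-order process, so a second-order analysis yields the coefficients 0.5 and 0.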
Lookahead is a known procedure used in data transmission, wherein typically newer data that does not belong to the frame to be processed is utilised, e.g. in a procedure applied to a speech frame. In some speech coding algorithms, such as algorithms according to the IS-641 standard specified by the Electronic Industries Alliance/Telecommunications Industry Association (EIA/TIA), linear prediction (LP) parameters for speech coding are calculated from a window that contains, in addition to the frame to be analysed, samples that belong to the preceding and following frame. The samples that belong to the following frame are called lookahead samples. A corresponding arrangement has also been proposed for use, e.g. in connection with Adaptive Multi Rate (AMR) codecs.
FIG. 3 illustrates lookahead as used in linear prediction according to the IS-641 standard. Each 20-ms long speech frame 30 is windowed into an asymmetric window 31 that also contains samples belonging to the preceding and following frame. The part of window 31 formed of newer samples is called the lookahead part 32. An LP analysis is made once for each window. As can be seen in FIG. 3, windowing relating to lookahead causes an algorithmic delay D2 in the signal corresponding to the length of the lookahead part 32. Since the arrival of the signal for speech coding is already delayed by a period D1 as a result of noise reduction windowing, the delay D2 is summed with the previously described noise reduction additional delay D1.
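The summing of the two delays can be made concrete with a small calculation. The sampling rate and the slope and lookahead lengths in the usage note below are illustrative assumptions only; the text above does not state them.

```python
def samples_to_ms(n_samples, sample_rate_hz):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


def combined_algorithmic_delay_ms(slope_len, lookahead_len, sample_rate_hz):
    """D1 + D2 for the prior-art chain described above.

    D1: length of the noise reduction front slope (overlap-add delay).
    D2: length of the speech coder lookahead part.
    """
    d1 = samples_to_ms(slope_len, sample_rate_hz)
    d2 = samples_to_ms(lookahead_len, sample_rate_hz)
    return d1 + d2
```

With an assumed sampling rate of 8000 Hz, an assumed front slope of 24 samples and an assumed lookahead of 40 samples, combined_algorithmic_delay_ms(24, 40, 8000) gives 3 ms + 5 ms = 8 ms.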
SUMMARY OF THE INVENTION
According to the invention, there is provided a method for generating speech coding frames, the method comprising the steps of:
forming a series of partly overlapping first frames containing speech samples;
processing a first frame of the series of first frames by a first window function for producing a second, windowed, frame having a first slope;
performing noise reduction on the second frame for producing a third frame comprising noise reduced speech samples; and
forming a speech coding frame comprising noise-reduced samples of two successive third frames, at least partly summed with one another,
characterised in that the method further comprises the steps of:
forming the speech coding frame so that it has a lookahead part that is formed at least partly of noise reduced speech samples of the first slope, these noise reduced speech samples of the first slope being not summed with any other noise reduced speech samples of the speech coding frame to be formed.
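One possible reading of the frame-forming steps above is sketched below. It is an illustration under assumptions rather than the claimed implementation: the slopes of successive third frames are taken to have equal length, the lookahead part is taken to consist entirely of the front slope, and the function and parameter names are hypothetical.

```python
def form_speech_coding_frame(prev_third_frame, curr_third_frame, slope_len):
    """Form a speech coding frame from two successive noise-reduced frames.

    Returns (frame_part, lookahead_part):
    - frame_part: the stored front slope of the previous frame summed with
      the rear slope of the current frame, followed by the current middle.
    - lookahead_part: the current front slope, still windowed and NOT
      summed with any other samples.
    """
    n = len(curr_third_frame)
    prev_front = prev_third_frame[len(prev_third_frame) - slope_len:]
    curr_rear = curr_third_frame[:slope_len]
    summed = [a + b for a, b in zip(prev_front, curr_rear)]
    middle = curr_third_frame[slope_len:n - slope_len]
    lookahead = curr_third_frame[n - slope_len:]
    return summed + middle, lookahead
```

In this sketch the speech coder does not have to wait for the next noise reduction frame before it receives its lookahead samples, since the unsummed front slope is handed over as soon as the current frame has been noise-reduced.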
Advantageously, the above-described joint effect of algorithmic delays can be reduced by the invented method and an apparatus implementing the method.
Advantageously, by utilising windowing already performed in noise reduction in speech coding windowing, the algorithmic delays caused by processing phases are not sum
Paajanen Erkki
Vähätalo Antti
Banks-Harold Marsha D.
Nokia Mobile Phones Ltd.
Perman & Green LLP
Storm Donald L.