Electrical audio signal processing systems and devices – One-way audio signal program distribution – Public address system
Patent
1988-05-03
1989-09-05
Kemeny, Emanuel S.
381 49, 381 51, G10L 500
active
048646210
DESCRIPTION:
BRIEF SUMMARY
This invention is concerned with speech coding, and more particularly with systems in which a speech signal is generated by feeding the output of an excitation source through a synthesis filter. The coding problem then becomes one of deriving, from the input speech, the necessary excitation and filter parameters. LPC (linear predictive coding) parameters for the filter can be derived using well-established techniques; the present invention is concerned with the excitation source.
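The source-filter arrangement described above can be illustrated with a minimal sketch of an all-pole synthesis filter. This is an illustration only, not code from the patent: the function name `lpc_synthesize`, the sign convention (each output sample is the excitation plus a weighted sum of past outputs) and the single example coefficient are all assumptions.

```python
def lpc_synthesize(excitation, lpc_coeffs):
    """All-pole synthesis filter (hypothetical helper, assumed convention):
    s[i] = e[i] + sum_j lpc_coeffs[j] * s[i - 1 - j], with zero initial state.
    """
    out = []
    for i, e in enumerate(excitation):
        acc = e
        for j, a in enumerate(lpc_coeffs):
            if i - 1 - j >= 0:  # samples before the frame start are taken as zero
                acc += a * out[i - 1 - j]
        out.append(acc)
    return out


# A single unit pulse fed through a first-order filter gives a decaying response.
impulse = [1.0, 0.0, 0.0, 0.0]
response = lpc_synthesize(impulse, [0.5])
print(response)
```

A practical coder would use a higher filter order and carry the filter state across frames; both are omitted here for brevity.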
Systems that switch between a noise source and a repetitive pulse source, according to a voiced/unvoiced decision made on the input speech, tend to give the speech output an unnatural quality. It has therefore been proposed to employ a single "multipulse" excitation source in which a sequence of pulses is generated, no prior assumptions being made as to the nature of the sequence. With this method, only a few pulses (say 8 in a 10 ms frame) are found to be sufficient for obtaining reasonable results. See B S Atal and J R Remde: "A New Model of LPC Excitation for producing Natural-sounding Speech at Low Bit Rates", Proc. IEEE ICASSP, Paris, pp. 614, 1982.
Coding methods of this type offer considerable potential for low bit rate transmission, e.g. 9.6 to 4.8 kbit/s.
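As a rough feel for these figures, the sketch below computes how many bits per pulse would be available if the whole channel rate were spent on the excitation pulses alone. It is back-of-envelope arithmetic, not a bit allocation from the patent: the function name is hypothetical, the 8 pulses per 10 ms frame are borrowed from the example quoted earlier, and the bits needed for the LPC coefficients are ignored, so the results are ceilings only.

```python
def bits_per_pulse_ceiling(bit_rate_bps, frame_ms, pulses_per_frame):
    """Upper bound on bits per pulse, assuming every bit goes to the pulses."""
    frames_per_second = 1000.0 / frame_ms
    bits_per_frame = bit_rate_bps / frames_per_second
    return bits_per_frame / pulses_per_frame


for rate in (9600, 4800):
    # Assumed example values: 10 ms frames, 8 pulses per frame.
    print(rate, bits_per_pulse_ceiling(rate, 10, 8))
```

Since each pulse needs both a position and an amplitude, and the filter coefficients also have to be sent, the usable budget per pulse is smaller than these ceilings.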
The coder proposed by Atal and Remde operates in a "trial and error feedback loop" mode in an attempt to define an optimum excitation sequence which, when used as an input to an LPC synthesis filter, minimises a weighted error function over a frame of speech. However, the unsolved problem of selecting an optimum excitation sequence is at present the main reason for the enormous complexity of the coder, which limits its real-time operation.
The excitation signal in multipulse LPC is approximated by a sequence of pulses located at non-uniformly spaced time instants. It is the task of the analysis-by-synthesis process to define the optimum locations and amplitudes of the excitation pulses.
In operation, the input speech signal is divided into frames of samples, and a conventional analysis is performed to define the filter coefficients for each frame. It is then necessary to derive a suitable multipulse excitation sequence for each frame. The algorithm proposed by Atal and Remde forms a multipulse sequence which, when used to excite the LPC synthesis filter, minimises (that is, within the constraints imposed by the algorithm) a mean-squared weighted error derived from the difference between the synthesised and original speech. This is illustrated schematically in FIG. 1. Input speech is supplied to a unit DE which derives LPC filter coefficients. These are fed to determine the response of a local filter or synthesiser LF whose input is supplied with the output of a multipulse excitation generator EG. Synthetic speech at the output of the filter is supplied to a subtractor S to form the difference between the synthetic and input speech. The difference or error signal is fed via a perceptual weighting filter WF to error minimisation stage EM which controls the excitation generator EG. The positions and amplitudes of the excitation pulses are encoded and transmitted together with the digitized values of the LPC filter coefficients. At the receiver, given the decoded values of the multipulse excitation and the prediction coefficients, the speech signal is recovered at the output of the LPC synthesis filter.
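The loop of FIG. 1 can be sketched as a simple greedy search that places one pulse at a time. This is not the Atal and Remde algorithm as published, nor the method claimed in this patent. It is a minimal sequential search under stated assumptions: the perceptual weighting filter WF is omitted (plain squared error), the filter starts each frame from zero state, and all function and variable names are hypothetical. `synthesize` plays the role of LF, the subtraction that updates `residual` plays the role of S, and the inner search plays the role of EM driving EG.

```python
def synthesize(excitation, lpc_coeffs):
    """All-pole filter, assumed convention: s[i] = e[i] + sum_j c[j] * s[i-1-j]."""
    out = []
    for i, e in enumerate(excitation):
        acc = e
        for j, c in enumerate(lpc_coeffs):
            if i - 1 - j >= 0:
                acc += c * out[i - 1 - j]
        out.append(acc)
    return out


def multipulse_search(target, lpc_coeffs, num_pulses):
    """Greedy pulse placement minimising unweighted squared error in one frame."""
    n = len(target)
    # Impulse response of the synthesis filter, truncated to the frame length.
    h = synthesize([1.0] + [0.0] * (n - 1), lpc_coeffs)
    residual = list(target)      # part of the target not yet matched
    excitation = [0.0] * n
    for _ in range(num_pulses):
        best_pos, best_amp, best_gain = None, 0.0, 0.0
        for pos in range(n):
            if excitation[pos] != 0.0:
                continue         # each position holds at most one pulse
            corr = sum(residual[pos + t] * h[t] for t in range(n - pos))
            energy = sum(h[t] * h[t] for t in range(n - pos))
            gain = corr * corr / energy   # error reduction for this position
            if gain > best_gain:
                best_pos, best_amp, best_gain = pos, corr / energy, gain
        if best_pos is None:
            break                # no position reduces the error any further
        excitation[best_pos] = best_amp
        for t in range(n - best_pos):
            residual[best_pos + t] -= best_amp * h[t]
    return excitation, residual


# Target built from one known pulse; the search should recover that pulse.
target = synthesize([0.0, 2.0, 0.0, 0.0], [0.5])
excitation, residual = multipulse_search(target, [0.5], num_pulses=1)
print(excitation, residual)
```

Because each pulse is chosen with the earlier ones held fixed, a search of this kind is generally suboptimal compared with a joint optimisation over all pulse positions and amplitudes.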
In FIG. 1 it is assumed that a frame consists of n speech samples, the input speech samples being s_0 ... s_{n-1} and the synthesised samples s'_0 ... s'_{n-1}, which can be regarded as vectors s, s'. The excitation consists of pulses of amplitude a_m which are, it is assumed, permitted to occur at any of the n possible time instants within the frame, but there are only a limited number of them (say k). Thus the excitation can be expressed as an n-dimensional vector a with components a_0 ... a_{n-1}, but only k of them are non-zero. The objective is to find the 2k unknowns (k amplitudes, k pulse positions) which minimise the error: error signal such
REFERENCES:
patent: 4709390 (1987-11-01), Atal et al.
British Telecommunications public limited company
Kemeny Emanuel S.