Speech analysis using multiple noise compensation

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S233000

active

06377918

ABSTRACT:

This invention relates to a speech analysis system for processing speech which is subject to different forms of distortion. It is particularly (although not exclusively) relevant to recognition of words, languages or speakers in two-way telephone conversations.
The problem to which the invention is addressed may be illustrated in one aspect by automatic speech recognition technology as used in telephone systems. Here the system's performance is often severely degraded by changes in a speech signal due to the position of the telephone handset or by the characteristics of the handset, telephone line and exchange. Attempts may be made to compensate for the problem by using some form of automatic gain control (AGC). Unfortunately this may be difficult to implement. For example, in two-way telephone conversations in which the apparatus is connected using a two-wire configuration, there are often substantial differences between the intensity levels of the speech signals of the persons speaking to one another. Using more sophisticated technology, it is possible to intercept a call at a local exchange and to obtain separate signals from each telephone instrument. While this offers some improvement, it does not address the difficult problem of reverse channel echo, which arises from contamination of the speech of one party to the conversation with that of the other.
The problem is not limited to differences in speech level. Many speech recognition systems attempt to adapt in some manner to the characteristics of the individual speaker or microphone. If speaker characteristics change frequently, compensation becomes very difficult.
Various methods are known for improving recognition performance by compensating for distortion or speaker characteristics. Current speech recognition systems convert the input signal from a waveform in the time domain into successive vectors in the frequency domain during a process sometimes known as “filterbank analysis”. These vectors are then matched to models of the speech signal. In some systems the vectors undergo a transformation prior to matching to speech models. It is possible to counteract signal distortion and speaker effects by applying some form of compensation to the vectors before transformation and matching. There are a number of known methods for determining the appropriate compensation. One such method is disclosed by Sadaoki Furui, “Cepstral Analysis Technique for Automatic Speaker Verification”, IEEE Trans. Acoustics, Speech, and Signal Processing, 29(2):254-272, April 1981. It involves averaging data obtained by filterbank analysis over an entire conversation to obtain the long term spectral characteristics of a signal, and then applying a compensation for distortions during a second pass over the data. The compensated data is then passed to a speech recognition device for matching to speech models.
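The two-pass, long-term-average compensation described above can be sketched as follows. This is a minimal illustration, not the cited method itself: it assumes the filterbank vectors are in a logarithmic domain, so that a fixed channel distortion appears as a constant additive offset, and the function names are hypothetical.

```python
def long_term_mean(vectors):
    """First pass: average every filterbank channel over the whole conversation."""
    count = len(vectors)
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / count for d in range(dims)]


def compensate_two_pass(vectors):
    """Second pass: subtract the long-term average from every vector.

    Assumption: vectors hold log energies, so a fixed channel
    distortion is an additive offset that the subtraction removes.
    """
    mean = long_term_mean(vectors)
    return [[x - m for x, m in zip(v, mean)] for v in vectors]
```

Under the stated assumption, a conversation and a copy of it with a constant offset added to each channel give the same compensated vectors. The sketch also shows why the whole conversation is needed before any vector can be compensated.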
There are two main problems with this approach when applied to multi-speaker speech signals or single speaker speech signals where the form of distortion changes. First, since a single correction is applied for the entire conversation, it is poorly suited to conversations in which the speaker characteristics change frequently. This may happen during telephone conversations or other dialogues. Second, it is necessary to process the entire conversation to obtain the appropriate correction before recognition commences, which makes it unsuitable for real-time applications.
A preferable approach is to use a technique sometimes known as spectral shape adaptation (SSA). A recognition system using this technique provides information on the expected spectral characteristics of the signal to be recognised at each time instant, and this is compared to the equivalent actually present in that signal to provide a difference term. The difference term is then averaged over a number of successive signals (time averaging) to provide a correction term.
A system of this kind has been described by Yunxin Zhao, “Iterative Self-Learning Speaker and Channel Adaptation under Various Initial Conditions”, Proc IEEE ICASSP [11] pages 712-715. Here data is processed on a sentence by sentence basis. An input signal undergoes filterbank analysis to create successive vectors each indicating the variation in signal energy over a number of frequency bands. The vectors are processed by matching to speech model states. The parameters of the model state to which a vector has been matched are used to predict a value for that vector which would be expected according to the model. The difference between the vector and the predicted value is computed and time averaged with difference values obtained for earlier vectors from the sentence to determine the average distortion suffered by each sentence. The SSA parameters determined for one sentence are then used to process the next sentence.
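The sentence-by-sentence adaptation described above can be sketched roughly as below. Several details are assumptions made only for illustration: matching is reduced to picking the nearest model-state mean, the predicted value is taken to be that mean, and the difference is computed from the uncompensated vector so that the average estimates the total distortion. The names `nearest_state` and `process_sentence` are hypothetical.

```python
def nearest_state(vector, state_means):
    """Stand-in for model matching: index of the closest state mean."""
    def squared_distance(i):
        return sum((x - m) ** 2 for x, m in zip(vector, state_means[i]))
    return min(range(len(state_means)), key=squared_distance)


def process_sentence(vectors, state_means, correction):
    """Compensate one sentence and derive the correction for the next one.

    correction: SSA parameters obtained from the previous sentence.
    Returns (compensated vectors, new correction).
    """
    dims = len(correction)
    compensated = [[x - c for x, c in zip(v, correction)] for v in vectors]
    totals = [0.0] * dims
    for raw, comp in zip(vectors, compensated):
        predicted = state_means[nearest_state(comp, state_means)]
        for d in range(dims):
            # Difference term: observed value minus the model's expected value.
            totals[d] += raw[d] - predicted[d]
    new_correction = [t / len(vectors) for t in totals]  # time average
    return compensated, new_correction
```

Because a single `correction` is carried from one sentence to the next, the sketch also makes the limitation visible: if the next sentence comes from a different speaker or channel, the carried-over correction no longer fits.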
Zhao's approach is unfortunately not appropriate where there are two or more speakers or forms of distortion because it can result in SSA parameters derived from speech of one speaker or subject to a particular form of distortion being applied in connection with a different speaker or form of distortion.
It is an object of the invention to provide a speech analysis system arranged to counteract multiple forms of distortion.
The present invention provides a speech analysis system for processing speech which has undergone distortion, and including compensating means for modifying data vectors obtained from speech to compensate for distortion, matching means for matching modified data vectors to models, and deriving means for deriving distortion compensation from data vectors for use by the compensating means; characterised in that:
a) the compensating means is arranged to compensate for a plurality of forms of distortion by modifying each data vector with a plurality of compensations to provide a respective set of modified data vectors compensated for respective forms of distortion,
b) the matching means is arranged to indicate the modified data vector in each set exhibiting the greatest matching probability and the form of distortion for which it has been compensated, and
c) the deriving means is arranged to derive compensation on the basis of the modified data vector in each set exhibiting greatest matching probability for use by the compensating means in compensating for the form of distortion for which that modified data vector was compensated.
The invention provides the advantage that compensation differentiates between forms of distortion so that the likelihood of correct speech analysis is improved.
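Features a) to c) can be illustrated with a small sketch. It is written under stated assumptions and is not the patented implementation: each compensation is a simple offset vector, matching probability is approximated by a unit-variance Gaussian log score with constants dropped, and the winning compensation is updated by a small step of size `rate`. All names and the update rule are hypothetical.

```python
def log_score(vector, mean):
    """Unit-variance Gaussian log likelihood, constant terms dropped."""
    return -0.5 * sum((x - m) ** 2 for x, m in zip(vector, mean))


def analyse_vector(vector, compensations, state_means, rate=0.1):
    """Apply every compensation, keep the best match, update only the winner.

    Returns (index of winning compensation, index of matched model state).
    """
    best = None
    for k, comp in enumerate(compensations):
        # a) one modified vector per form of distortion
        modified = [x - c for x, c in zip(vector, comp)]
        for s, mean in enumerate(state_means):
            score = log_score(modified, mean)
            if best is None or score > best[0]:
                best = (score, k, s)  # b) greatest matching probability
    _, k, s = best
    # c) derive the new compensation only for the winning form of distortion
    compensations[k] = [
        (1.0 - rate) * c + rate * (x - m)
        for c, x, m in zip(compensations[k], vector, state_means[s])
    ]
    return k, s
```

The other compensations are left untouched, so a correction learned from one form of distortion is not applied to a different one.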
The invention may be arranged to analyse speech from a plurality of speech sources each associated with a respective form of distortion, and wherein:
a) the compensating means is arranged to provide modified data vectors in each set compensated for distortion associated with respective speech sources,
b) the matching means is arranged to implement models divided into classes associated with speech and non-speech, and to indicate the model class associated with the modified data vector in each set exhibiting the greatest matching probability, and
c) the deriving means is arranged to derive a compensation from modified data vectors associated with speech class models.
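The speech/non-speech distinction in items a) to c) might look like this in code. The sketch assumes each model carries a class label and a mean vector, and that compensations are stored per speech source. The dictionary layout, the source names used in the usage note and the `rate` parameter are all hypothetical.

```python
SPEECH = "speech"
NON_SPEECH = "non-speech"


def update_source_compensation(compensations, source, vector, model, rate=0.1):
    """Update the compensation for `source` only when a speech model matched.

    compensations: dict mapping a source name to its offset vector
    model: dict with keys "cls" (SPEECH or NON_SPEECH) and "mean"
    Returns True if the compensation was updated.
    """
    if model["cls"] != SPEECH:
        # Matches to noise or silence carry no information about
        # the distortion applied to this source's speech.
        return False
    old = compensations[source]
    compensations[source] = [
        (1.0 - rate) * c + rate * (x - m)
        for c, x, m in zip(old, vector, model["mean"])
    ]
    return True
```

For example, with `compensations = {"near": [0.0], "far": [2.0]}`, a vector matched to a non-speech model leaves both entries unchanged, while a speech match for `"far"` moves only that entry.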
The system of the invention may be arranged to update non-speech models within the matching means. The matching means may be arranged to identify the modified data vector in each set exhibiting the greatest matching probability taking into account earlier matching and speech recognition constraints, in order to assess matching probability over a sequence of data vectors.
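Assessing matching probability over a sequence, rather than vector by vector, is commonly done with a Viterbi search. The sketch below is a generic Viterbi decoder offered as an illustration only, since the text does not say which search is used. It assumes per-frame log scores are already available and that each "state" could stand for a pairing of a compensation with a model state.

```python
def viterbi(frame_scores, log_trans, log_init):
    """Most probable state sequence given per-frame log scores.

    frame_scores[t][s]: log score of frame t under state s
    log_trans[p][s]:    log probability of moving from state p to state s
    log_init[s]:        log probability of starting in state s
    Returns (best path as a list of state indices, its log score).
    """
    n_states = len(log_init)
    best = [log_init[s] + frame_scores[0][s] for s in range(n_states)]
    back_pointers = []
    for scores in frame_scores[1:]:
        prev = best
        best = []
        pointers = []
        for s in range(n_states):
            # Best predecessor for state s, taking earlier matching into account.
            p = max(range(n_states), key=lambda q: prev[q] + log_trans[q][s])
            pointers.append(p)
            best.append(prev[p] + log_trans[p][s] + scores[s])
        back_pointers.append(pointers)
    state = max(range(n_states), key=lambda s: best[s])
    final_score = best[state]
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, final_score
```

With transition probabilities that favour staying in the same state, a single frame that marginally prefers another state does not flip the decision. This is the practical benefit of scoring over a sequence.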
The deriving means may be arranged to derive a compensation by averaging over a contribution from the modified data vector in each set exhibiting the greatest matching probability and the model with which it is matched and preceding contributions of like kind. Averaging may be carried out by infinite impulse response filtering means.
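Averaging a new contribution with preceding contributions by infinite impulse response filtering can be sketched with a first-order recursive filter. The filter order and the smoothing constant `alpha` are assumptions chosen for illustration. The text specifies only that the filtering is infinite impulse response.

```python
def iir_average(previous, contribution, alpha=0.9):
    """One step of a first-order IIR (exponential) average.

    previous:     running average from preceding contributions
    contribution: new difference term from the best-matching vector
    alpha:        weight given to the past (0 <= alpha < 1), an assumed value
    """
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, contribution)]


def run_average(contributions, initial, alpha=0.9):
    """Feed a sequence of contributions through the filter, one at a time."""
    average = list(initial)
    for contribution in contributions:
        average = iir_average(average, contribution, alpha)
    return average
```

Unlike a two-pass average over a whole conversation, this form needs only the previous average and the current contribution, so it can be updated as each vector arrives.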
The matching means may be arranged to implement hidden Markov model matching based on speech models with states having mat
