Method of speech recognition

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

active

06246980

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention concerns the field of automatic speech recognition.
A system of speech recognition includes two main functional units: a parametrization unit and a recognition unit. To these there is often added a learning unit serving to construct the dictionary of references used by the recognition unit.
The parametrization unit calculates relevant parameters on the basis of speech signals picked up by a microphone. These calculations are carried out according to a parametric representation chosen in order to differentiate vocal forms in the best possible way, separating the semantic information contained in the speech from the aesthetic information peculiar to diction. Cepstral representations constitute an important class of such representations (see EP-A-0 621 582).
The recognition unit makes the association between an observed segment of speech, represented by the parameters calculated by the parametrization unit, and a reference for which another set of parameters is stored in a dictionary of references. The sets of parameters stored in the dictionary in association with the different references can define deterministic models (they are for example composed directly of vectors coming from the parametrization unit). But most often, in order to take into account the variability of speech production and of the acoustic environment, sets of parameters which characterise stochastic models are rather used. Hidden Markov models (HMM) constitute an important class of such models. These stochastic models make it possible, by searching out the maximum likelihood, to identify the model which takes into account in the best way the observed sequence of parameters, and to select the reference associated with this model (see L. R. RABINER: “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”; Proceedings of the IEEE, Vol. 77, No. 2, February 1989, pages 257-285).
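As an illustration of the maximum-likelihood selection described above, the following minimal sketch scores an observed sequence against two toy discrete HMMs with the forward algorithm and picks the reference whose model gives the highest likelihood. The model values, the labels "yes" and "no", and the function names are hypothetical and chosen only for the example; a real recogniser would work on cepstral vectors with continuous emission densities.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM.

    Uses the forward algorithm with per-step rescaling to avoid underflow.
    Assumes the sequence has non-zero probability under the model.
    """
    n_states = len(start)
    # alpha[s]: (scaled) probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return log_lik + math.log(sum(alpha))

def select_reference(obs, dictionary):
    """Return (label, log-likelihood) of the model that best explains obs."""
    scores = {
        label: forward_log_likelihood(obs, *model)
        for label, model in dictionary.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical dictionary of references: label -> (start, transitions, emissions)
TRANS = [[0.7, 0.3], [0.0, 1.0]]
DICTIONARY = {
    "yes": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.8, 0.2]]),
}

best_label, best_score = select_reference([0, 0, 1, 1], DICTIONARY)
```

For the observation sequence `[0, 0, 1, 1]` the "yes" model is selected, since its emission probabilities match the symbols far better than those of the "no" model.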
In general, the recognition of a word or a speech segment is not limited to searching for the maximum likelihood. One or more further likelihood criteria are examined to determine whether the optimum model, the one presenting the maximum likelihood, should in fact be selected. One such criterion is, for example, that the maximised likelihood exceeds a certain threshold. If the criterion is satisfied, the optimum model is selected and the recognition unit provides a result.
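The threshold criterion mentioned above could look like the following sketch. The function name, the score dictionary and the threshold value are assumptions made for illustration; the text does not fix how scores are scaled or how the threshold is chosen.

```python
def select_if_likely(scores, threshold):
    """Return the best-scoring reference if its score exceeds the threshold.

    scores maps each reference label to a likelihood score (higher is better).
    Returns None when the likelihood criterion is not satisfied.
    """
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best
    return None

# Hypothetical log-likelihood scores and threshold
result_ok = select_if_likely({"call": -10.0, "stop": -14.0}, threshold=-12.0)
result_rejected = select_if_likely({"call": -13.0, "stop": -14.0}, threshold=-12.0)
```

In the first call the best score exceeds the threshold and "call" is returned; in the second the criterion fails and the function returns `None`, which is the ambiguous case the rest of the text addresses.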
Otherwise, several solutions can be used: a first solution is to ask the speaker to provide confirmation that the speech segment uttered corresponds to the reference associated with the optimum model, or to one of the references associated with the n models for which the likelihoods are greatest (see EP-A-0 651 372). The user then has to carry out special manipulations in order to validate his choice, which is not ergonomic, especially for applications in hands-free mode.
Another solution is to ask the speaker to repeat what he has just said. If the criterion of likelihood is satisfied by the optimum model proposed as a result of the recognition test carried out on this repetition, the recognition terminates. Otherwise, another repetition is requested, and so on. This second solution is not well suited to noisy environments or to environments disrupted by multiple speakers: the noise that disturbed the first pronunciation and caused the likelihood criterion to fail will often disturb the repetition as well, causing the criterion to fail again, so that the user is forced to repeat the same word several times without success. If an attempt is made to overcome this disadvantage by adopting a less severe criterion of likelihood, the system tends to make numerous false starts in noisy environments.
EP-A-0 573 301 describes a method in which, in order to establish the ranking on which the recognition is based, the a priori probability of pronunciation of a word associated with a reference is replaced, after the repetition by the speaker, by the conditional probability of pronunciation of this word given that the same word has been said twice. The conditional probability is calculated with the aid of an expansion in accordance with Bayes' theorem. This method thus seeks to refine the absolute values of the recognition scores of the different entries in the dictionary of references.
An object of the present invention is to propose an effective solution for recognising speech in ambiguous cases.
SUMMARY OF THE INVENTION
The invention proposes a method of speech recognition which implements tests of recognition, wherein each test of recognition matches a segment of speech provided to the system with at least one set of parameters memorised in a dictionary of references. The method comprises the steps of:
applying the recognition test to a segment of speech uttered by a speaker;
examining whether a first optimum set of parameters, with which the recognition test has matched the spoken segment of speech, satisfies a criterion of likelihood;
if the criterion of likelihood is satisfied by the first optimum set of parameters, selecting the first optimum set of parameters;
if the criterion of likelihood is not satisfied by the first optimum set of parameters, requesting the speaker to repeat the speech segment;
applying the recognition test to the segment of speech repeated by the speaker;
examining whether a second optimum set of parameters, with which the recognition test has matched the repeated segment of speech, satisfies the criterion of likelihood;
if the criterion of likelihood is satisfied by the second optimum set of parameters, selecting the second optimum set of parameters;
if the criterion of likelihood is not satisfied by the second optimum set of parameters and if a combination of the results of the two recognition tests satisfies at least one criterion of combined selection, selecting one of the sets of parameters which the two recognition tests have matched with the spoken segment of speech and with the repeated segment of speech.
Although the criteria of likelihood, applied separately to two observations of the same word or the same speech segment, may be insufficient for the recognition to be conclusive, it will often be possible to make an adequate decision by combining these two observations and by examining other criteria in relation to the combination. The invention takes advantage of this to improve the rates of recognition for a given probability of a false start, or to reduce the probability of a false start for a given rate of recognition.
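The sequence of steps listed above can be sketched as a single control flow. This is only an outline under stated assumptions: the four callbacks (`get_segment`, `recognise`, `is_likely`, `combined_select`) and the result format are hypothetical placeholders, not names taken from the patent.

```python
from typing import Callable, List, Optional, Tuple

# Recognition result: (reference label, likelihood score) pairs, best first.
# This format is an assumption made for the example.
NBest = List[Tuple[str, float]]

def recognise_with_repetition(
    get_segment: Callable[[bool], object],
    recognise: Callable[[object], NBest],
    is_likely: Callable[[Tuple[str, float]], bool],
    combined_select: Callable[[NBest, NBest], Optional[str]],
) -> Optional[str]:
    """Run the first test, then at most one repetition, then combine.

    get_segment(repeat) acquires a speech segment; repeat=True means the
    speaker is asked to say the segment again.
    Returns the selected reference label, or None if nothing is selected.
    """
    # First recognition test on the spoken segment
    first = recognise(get_segment(False))
    if is_likely(first[0]):
        return first[0][0]

    # Criterion not satisfied: request a repetition and test again
    second = recognise(get_segment(True))
    if is_likely(second[0]):
        return second[0][0]

    # Neither test is conclusive alone: try the combined selection criteria
    return combined_select(first, second)
```

Passing the criteria in as callbacks keeps the flow independent of how likelihoods are computed and of which combined criteria are applied.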
In a typical embodiment, each recognition test provides a list of n≧1 sets of parameters of the dictionary, which present the greatest likelihoods taking into account the observation of the speech segment submitted to the test, this list being arranged in order according to decreasing likelihood. Each of said first and second optimum sets of parameters is then a head of list.
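A list of this kind can be built from per-reference scores as in the sketch below. The helper name `n_best` and the score values are hypothetical; only the ordering by decreasing likelihood comes from the text.

```python
import heapq

def n_best(scores, n):
    """Return the n highest-scoring (label, score) pairs, best first.

    scores maps each reference label to its likelihood score.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])

# Hypothetical log-likelihood scores for three dictionary entries
ranked = n_best({"call": -12.0, "home": -15.5, "stop": -13.1}, 2)
optimum = ranked[0][0]  # the head of the list is the optimum set of parameters
```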
The criteria of combined selection which may be used can include:
the identity of the first and second optimum sets of parameters,
when the dictionary contains at least one set of rejection parameters, with n≧2, the fact that a same set of parameters, other than a set of rejection parameters, appears on the one hand at the top of the list provided by one of the two recognition tests, and on the other hand in the second position, after a set of rejection parameters, in the list provided by the other of the two recognition tests.
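The two criteria listed above can be expressed over ranked lists of labels, as in the following sketch. The function name, the `<rejection>` label and the example words are hypothetical. What happens when both lists share a rejection entry as head is not specified in the text, so the sketch simply returns that shared head and leaves the decision to the caller.

```python
def combined_selection(first, second, rejection_labels):
    """Apply the two combined selection criteria to two ranked label lists.

    first and second are lists of reference labels in order of decreasing
    likelihood, one per recognition test. rejection_labels is the set of
    labels standing for sets of rejection parameters.
    Returns the selected label, or None if neither criterion is satisfied.
    """
    # Criterion 1: both tests put the same entry at the head of the list.
    if first[0] == second[0]:
        return first[0]

    # Criterion 2: a non-rejection entry heads one list and appears second in
    # the other list, directly after a rejection entry (requires n >= 2).
    for a, b in ((first, second), (second, first)):
        head = a[0]
        if (
            head not in rejection_labels
            and len(b) >= 2
            and b[0] in rejection_labels
            and b[1] == head
        ):
            return head
    return None

REJECT = {"<rejection>"}
same_head = combined_selection(["call", "stop"], ["call", "home"], REJECT)
after_rejection = combined_selection(["call", "stop"], ["<rejection>", "call"], REJECT)
no_match = combined_selection(["call", "stop"], ["home", "call"], REJECT)
```

Both criteria use only the positions of entries in the lists; the score values never enter the decision.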
In applying criteria of this sort, advantage is taken of the fact that the rankings provided by the individual recognition tests are relatively reliable. The above criteria are based on these rankings rather than on the absolute values of the scores. Refining those values with the aid of Bayes' theorem or some other weighting formula does not necessarily provide more frequent recognition, particularly in noisy environments.


REFERENCES:
patent: 4718094 (1988-01-01), Bahl et al.
patent: 4783803 (1988-11-01), Baker et al.
patent: 4783804 (1988-11-01), Juang et al.
patent: 4881266 (1989-11-01), Nitta
