Method of estimating probabilities of occurrence of speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

US classification: C704S255000

Status: active

Patent number: 06314400

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to a method of estimating probabilities of occurrence of speech vocabulary elements in a speech recognition system.
In speech recognition systems based on statistical models, both acoustic speech modeling and linguistic speech modeling are used. The invention relates to the field of linguistic speech modeling.
It is known to determine probabilities of occurrence of elements of a speech vocabulary by a linear combination of different M-gram probabilities of these elements. It is known from R. Kneser, V. Steinbiss, “On the dynamic adaptation of stochastic language models”, Proc. ICASSP, pp. 586-589, 1993 that probabilities of occurrence of bigram vocabulary elements, each determined on a different training vocabulary corpus, can be combined linearly to form the probabilities of occurrence of these elements.
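The linear combination described above can be sketched as follows. This is an illustrative reconstruction, not code from the cited work; the toy probabilities and the interpolation weight are made-up values, not trained ones.

```python
def linear_interpolation(p_corpus_a, p_corpus_b, weight_a=0.7):
    """Linearly combine bigram probabilities estimated on two corpora.
    Both arguments map (history, word) -> probability."""
    combined = {}
    for bigram in set(p_corpus_a) | set(p_corpus_b):
        pa = p_corpus_a.get(bigram, 0.0)
        pb = p_corpus_b.get(bigram, 0.0)
        combined[bigram] = weight_a * pa + (1.0 - weight_a) * pb
    return combined

# Toy bigram probabilities from two different training corpora.
p_a = {("the", "court"): 0.02, ("the", "market"): 0.05}
p_b = {("the", "court"): 0.10, ("the", "market"): 0.01}
p = linear_interpolation(p_a, p_b, weight_a=0.7)
```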
It is also known from R. Kneser, J. Peters and D. Klakow, “Language Model Adaptation using Dynamic Marginals” (see formulas (8) and (9)), EUROSPEECH, pp. 1971-1974, 1997 that, in the estimation of the probability of occurrence of a speech vocabulary element, an M-gram probability with M&gt;1, estimated by means of a first training vocabulary corpus for this element, is multiplied by a quotient raised to a power given by an optimized parameter value, this parameter value being determined by means of the GIS algorithm (Generalized Iterative Scaling). A unigram probability of the element estimated by means of a second training vocabulary corpus serves as the dividend of the quotient, and a unigram probability of the element estimated by means of the first training vocabulary corpus serves as the divisor. The error rate and the perplexity of a speech recognition system can be reduced with this formulation.
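The prior-art adaptation formula described above can be sketched as follows. This is an illustrative reconstruction, not the patent's or the cited paper's code; the toy probabilities and the exponent `lam` are made-up values (in the cited work the exponent is determined with the GIS algorithm).

```python
def adapt(p_bg, p_uni_adapt, p_uni_bg, lam, history, vocab):
    """Unigram-rescaling adaptation: p(w|h) is proportional to
    p_bg(w|h) * (p_uni_adapt(w) / p_uni_bg(w))**lam,
    renormalized over the vocabulary for each history."""
    scores = {w: p_bg[(history, w)] * (p_uni_adapt[w] / p_uni_bg[w]) ** lam
              for w in vocab}
    z = sum(scores.values())  # history-dependent normalizer
    return {w: s / z for w, s in scores.items()}

# Toy example with a two-word vocabulary.
p_bg = {("h", "a"): 0.5, ("h", "b"): 0.5}   # M-gram model, first corpus
p_uni_adapt = {"a": 0.8, "b": 0.2}          # unigram model, second corpus
p_uni_bg = {"a": 0.5, "b": 0.5}             # unigram model, first corpus
p = adapt(p_bg, p_uni_adapt, p_uni_bg, 1.0, "h", ["a", "b"])
```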
SUMMARY OF THE INVENTION
It is an object of the invention to provide further alternatives, obtained by modifying the linguistic speech modeling, with which the error rate and the perplexity of a speech recognition system can be reduced.
This object is achieved in that, in the estimation of a probability of occurrence of a speech vocabulary element, several M-gram probabilities of this element are each raised to a power given by an M-gram-specific optimized parameter value, and the powers thus obtained are multiplied by each other.
Suitable as M-grams are, for example, unigrams, bigrams, gap bigrams or trigrams comprising the relevant speech vocabulary element. The solution according to the invention is based on minimizing the Kullback-Leibler distance between the different M-gram probabilities and the probability of occurrence resulting from the described combination. The invention provides an effective combination of known linguistic speech models, each determined by the probabilities of occurrence of the corresponding vocabulary elements. This leads to probabilities of the speech vocabulary elements that are better adapted to the selected field of application, and hence to an improved linguistic speech model for the speech recognition system.
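The claimed combination of M-gram probabilities can be sketched as follows. This is a hedged illustration with made-up toy models and exponents; in the invention the exponents λ_i are the M-gram-specific optimized parameter values.

```python
def log_linear_combine(models, lambdas, history, vocab):
    """Raise each model's probability for (history, word) to its
    model-specific power lambda_i, multiply the powers, and
    renormalize over the vocabulary for each history."""
    scores = {}
    for w in vocab:
        s = 1.0
        for p_i, lam_i in zip(models, lambdas):
            s *= p_i[(history, w)] ** lam_i
        scores[w] = s
    z = sum(scores.values())  # corresponds to the 1/Z scaling factor
    return {w: s / z for w, s in scores.items()}

m1 = {("h", "a"): 0.6, ("h", "b"): 0.4}  # toy bigram model
m2 = {("h", "a"): 0.5, ("h", "b"): 0.5}  # toy second M-gram model
combined = log_linear_combine([m1, m2], (1.0, 1.0), "h", ["a", "b"])
```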
The following case does not fall within the protective scope of the invention (“disclaimer”): in the estimation of the probability of occurrence of a speech vocabulary element, an M-gram probability with M&gt;1, estimated by means of a first training vocabulary corpus for this element, is multiplied by a quotient raised to a power given by an optimized parameter value determined by means of the GIS algorithm, with a unigram probability of the element estimated by means of a second training vocabulary corpus serving as the dividend of the quotient and a unigram probability of the element estimated by means of the first training vocabulary corpus serving as the divisor.
This case is already known from the article by R. Kneser, J. Peters and D. Klakow, “Language Model Adaptation using Dynamic Marginals”, EUROSPEECH, pp. 1971-1974, 1997. That formulation is based on the use of the known GIS algorithm and leads only to this one special solution, but not to the other cases within the protective scope of the invention.
In one embodiment of the invention, a first training vocabulary corpus is used for estimating a first part of the M-gram probabilities, a first part of a second training vocabulary corpus is used for estimating a second part of the M-gram probabilities, and a second part of the second training vocabulary corpus is used for determining the optimized parameter values assigned to the M-gram probabilities. In this way, training corpora of different sizes, adapted to special applications to different degrees, can be integrated into the model formation. The first training vocabulary corpus is preferably an application-unspecific corpus such as the NAB corpus (North American Business News). The second training vocabulary corpus preferably consists of vocabulary elements from one or more example texts about given special fields of application, for example the judicial field. When the second training vocabulary corpus is chosen to be considerably smaller than the first training vocabulary corpus, the linguistic speech model may be adapted to special applications with little effort. The parameter values used for model adaptation are likewise determined by means of the second training vocabulary corpus, so that the processing effort is minimized.
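The corpus roles in this embodiment can be sketched as follows. This is a minimal illustration; the held-out fraction is an assumed value, not one given in the patent.

```python
def split_domain_corpus(sentences, heldout_fraction=0.2):
    """Split the smaller, domain-specific second corpus: the first part
    is used to estimate M-gram probabilities, the held-out second part
    to determine the optimized parameter values."""
    cut = int(len(sentences) * (1.0 - heldout_fraction))
    return sentences[:cut], sentences[cut:]

# Toy stand-in for a list of domain-specific training sentences.
estimation_part, heldout_part = split_domain_corpus(list(range(10)))
```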
For determining the optimized parameter values, the optimizing function

F({λ_i}) = Σ_{hw} f(hw) · log( (1/Z_λ(h)) · Π_i p_i(w|h)^{λ_i} )

is maximized, wherein
λ_i represents the parameter values to be optimized,
hw represents an M-gram for a vocabulary element w with a history h of previous vocabulary elements,
f(hw) represents the quotient whose dividend is the number of occurrences of the M-gram hw counted in the second part of the second training vocabulary corpus, and whose divisor is the number of vocabulary elements of that second part,
1/Z_λ(h) represents a scaling factor that normalizes the combination over the vocabulary for each history h, and
p_i(w|h) represents the i-th estimated probability of occurrence of the vocabulary element w, given the history h.

This optimizing function, representing a likelihood function with the parameters λ_i as variables, is convex and has a single maximum for a given set of parameter values λ_i, which can be determined by means of conventional methods of approximation. An explicit determination of Kullback-Leibler distances is avoided in this way.
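The optimizing function can be evaluated numerically as in the following sketch. This is illustrative only, with toy models and held-out frequencies; a real system would feed this function to a conventional optimizer (for example, the approximation methods in the cited “Numerical Recipes” chapter) rather than evaluate it at hand-picked points.

```python
import math

def objective(lambdas, models, f_counts, vocab):
    """Evaluate F({lambda_i}) = sum_{hw} f(hw) * log((1/Z_lambda(h)) *
    prod_i p_i(w|h)**lambda_i). f_counts maps (h, w) to the relative
    frequency f(hw) observed on the held-out part of the second corpus."""
    histories = {h for (h, _) in f_counts}
    # Normalizer Z_lambda(h) for each history.
    z = {h: sum(math.prod(p[(h, w)] ** lam
                          for p, lam in zip(models, lambdas))
                for w in vocab)
         for h in histories}
    total = 0.0
    for (h, w), f in f_counts.items():
        num = math.prod(p_i[(h, w)] ** lam
                        for p_i, lam in zip(models, lambdas))
        total += f * math.log(num / z[h])
    return total

m1 = {("h", "a"): 0.6, ("h", "b"): 0.4}  # toy M-gram model 1
m2 = {("h", "a"): 0.5, ("h", "b"): 0.5}  # toy M-gram model 2
f = {("h", "a"): 0.5, ("h", "b"): 0.5}   # toy held-out frequencies
```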
When only M-gram probabilities with M&lt;3 are used in forming an improved linguistic speech model according to the invention, the memory required on the computers performing the speech recognition remains small. In this case, unigrams, bigrams and particularly gap bigrams are used for model formation.
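A gap bigram conditions a word on the word two positions back, skipping the intervening word. A minimal counting sketch (the example sentence is invented):

```python
from collections import Counter

def gap_bigram_counts(tokens):
    """Count skip-1 bigrams: pairs (w_{i-2}, w_i) over the token stream."""
    return Counter((tokens[i - 2], tokens[i])
                   for i in range(2, len(tokens)))

counts = gap_bigram_counts(
    ["the", "court", "ruled", "the", "case", "closed"])
```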
The invention also relates to a speech recognition system using a speech vocabulary with vocabulary elements to which probabilities of occurrence are assigned, which are estimated by means of a method as described hereinbefore.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiment(s) described hereinafter.


REFERENCES:
patent: 4831550 (1989-05-01), Katz
patent: 5293584 (1994-03-01), Brown et al.
patent: 5467425 (1995-11-01), Lau et al.
patent: 5640487 (1997-06-01), Lau et al.
R. Kneser, V. Steinbiss, “On the Dynamic Adaptation of Stochastic Language Models”, Proc. ICASSP, pp. 586-589, 1993.
R. Kneser, J. Peters and D. Klakow, “Language Model Adaptation Using Dynamic Marginals”, EUROSPEECH, pp. 1971-1974, 1997.
William H. Press et al., “Numerical Recipes”, Cambridge University Press, 1989, Chapter 10.4.

