Method and apparatus for fast machine training
Patent type: Reexamination Certificate
Filed: 2000-01-21
Issued: 2004-02-24
Examiner: Samuel Broda (Department: 2123)
Class: Data processing: structural design, modeling, simulation, and emulation
Subclass: Modeling by mathematical expression
Cross-reference classes: C704S002000, C704S003000, C706S019000, C706S020000, C706S055000
Status: active
Patent number: 6,697,769
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to machine learning. In particular, the present invention relates to reducing training time for machine learning.
Machine learning is a general term that describes automatically setting the parameters of a system so that the system operates better. One common use for machine learning is the training of parameters for a system that predicts the behavior of objects or the relationship between objects. An example of such a system is a language model used to predict the likelihood of a sequence of words in a language.
One problem with current machine learning is that it can require a great deal of time to train a single system. For example, it can take up to three weeks to train some language models. This problem is especially acute in systems that have a large number of parameters that need to be trained and a large number of outputs, and whose training time is proportional to the number of parameters times the number of outputs of the system. In particular, systems that utilize Maximum Entropy techniques to describe the probability of some event tend to have long training times.
Thus, a method is needed that reduces the training time of systems that have a large number of outputs.
SUMMARY OF THE INVENTION
A method and apparatus are provided that reduce the training time associated with machine learning systems whose training time is proportional to the number of outputs being trained. Under embodiments of the invention, the number of outputs to be trained is reduced by dividing the objects to be modeled into classes. This produces at least two sets of model parameters. At least one set describes some aspect of the classes given some context, and at least one other set of parameters describes some aspect of the objects given a class and the context. Thus, instead of training a system with a very large number of outputs, the present invention trains at least two smaller systems, at least one for the classes and one for the objects within a class. Since the number of outputs of each of the new systems may be as small as the square-root of the number of outputs of the original system, the resulting training times may be considerably reduced.
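To make the class decomposition concrete, the factorization can be written as follows (the notation here is ours, not the patent's). For a context h and an object w belonging to class c(w):

P(w | h) = P(c(w) | h) × P(w | c(w), h)

As a worked illustration with numbers not taken from the patent, consider a vocabulary of 10,000 words partitioned into 100 classes of 100 words each. A training procedure whose cost grows with the number of outputs then touches 100 + 100 = 200 outputs across the two sub-models instead of 10,000, roughly a fifty-fold reduction.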
Under many embodiments of the invention, maximum entropy models are trained by training a first set of maximum entropy weighting values and a second set of maximum entropy weighting values. The first set of weighting values is used to determine the probability of a class given a preceding sequence of words. The second set of weighting values is used to determine the probability of a next word given a class and a preceding sequence of words.
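The following sketch illustrates this two-model arrangement in Python. It is a minimal illustration, not the patent's implementation: the one-hot bigram features, the fixed word-to-class assignment, and plain stochastic gradient ascent (standing in for the generalized iterative scaling of Darroch et al., cited below) are all assumptions made for brevity.

import numpy as np

# Minimal sketch of the two-model factorization: P(w | h) =
# P(class(w) | h) * P(w | class(w), h). Feature design, class
# assignment, and the training loop are illustrative assumptions.

rng = np.random.default_rng(0)

V = 9                                  # toy vocabulary size
C = 3                                  # number of classes (about sqrt(V))
word_class = np.arange(V) // (V // C)  # hypothetical assignment: 3 words per class

def softmax(z):
    z = z - z.max()                    # stabilized softmax
    e = np.exp(z)
    return e / e.sum()

def features(prev_word):
    # Maximum-entropy feature vector for the context: here just a
    # one-hot indicator of the previous word (a bigram history).
    f = np.zeros(V)
    f[prev_word] = 1.0
    return f

W_class = np.zeros((C, V))             # weights for P(class | context)
W_word = np.zeros((C, V, V))           # weights for P(word | class, context)

def p_class(prev):
    # Log-linear model over the C classes.
    return softmax(W_class @ features(prev))

def p_word_in_class(prev, c):
    # Log-linear model normalized only over the words in class c.
    members = np.flatnonzero(word_class == c)
    return members, softmax(W_word[c][members] @ features(prev))

def p_next_word(prev, w):
    c = word_class[w]
    members, pw = p_word_in_class(prev, c)
    return p_class(prev)[c] * pw[np.flatnonzero(members == w)[0]]

# Toy training data: (previous word, next word) pairs.
data = [(int(rng.integers(V)), int(rng.integers(V))) for _ in range(200)]

lr = 0.1
for _ in range(50):
    for prev, w in data:
        f = features(prev)
        c = word_class[w]
        # Class model: observed-minus-expected feature update.
        pc = p_class(prev)
        W_class[c] += lr * f
        W_class -= lr * pc[:, None] * f[None, :]
        # Word model: same update, restricted to the words in class c.
        members, pw = p_word_in_class(prev, c)
        W_word[c][w] += lr * f
        W_word[c][members] -= lr * pw[:, None] * f[None, :]

print("P(w=4 | prev=2) =", p_next_word(2, 4))

The point matching the summary above is that each softmax is normalized over at most C classes or the words of a single class, never over the full vocabulary, which is where the training-time saving comes from.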
REFERENCES:
patent: 5267345 (1993-11-01), Brown et al.
patent: 5568591 (1996-10-01), Minot et al.
patent: 5822741 (1998-10-01), Fischthal
patent: 5839105 (1998-11-01), Ostendorf et al.
patent: 5917498 (1999-06-01), Korenshtein
patent: 5982934 (1999-11-01), Villalba
patent: 6018728 (2000-01-01), Spence et al.
patent: 6128613 (2000-10-01), Wong et al.
patent: 6192360 (2001-02-01), Dumais et al.
patent: 6269334 (2001-07-01), Basu et al.
patent: 6304841 (2001-10-01), Berger et al.
“Distribution Clustering of Words for Text Classification”, Baker and McCallum, ACM, pp. 96-103 (1998).
“A Maximum Entropy Approach to Natural Language Processing”, A. Berger, et al., Computational Linguistics, vol. 22, No. 1, pp. 1-36 (1996).
“Generalized Iterative Scaling for Log-Linear Models”, J. Darroch, et al., The Annals of Mathematical Statistics, vol. 43, No. 5, pp. 1470-1480 (1972).
“Adaptive Statistical Language Modeling: A Maximum Entropy Approach”, R. Rosenfeld, CMU-CS-94-138, pp. 1-104 (Apr. 19, 1994).
“Evaluation of a Language Model Using a Clustered Model Backoff”, J. Miller, et al., Proceedings ICSLP 96, pp. 390-393 (1996).
“On Structuring Probabilistic Dependences in Stochastic Language Modeling”, H. Ney, et al., Computer Speech and Language, vol. 8, pp. 1-38 (1994).
“Class-Based n-gram Models of Natural Language”, P. Brown, et al., Computational Linguistics, vol. 18, No. 4, pp. 467-479 (1992).
“Inducing Features of Random Fields”, S. Della Pietra, et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, No. 4, pp. 1-12 (Apr. 1997).
“Using Maximum Entropy for Text Classification”, K. Nigam, et al., IJCAI '99 Workshop on Machine Learning for Information Filtering (1999).
Inventors: Joshua Goodman, Robert Moore
Examiner: Thai Phan
Attorneys: Theodore M. Magee, Westman Champlin & Kelly P.A.
Assignee: Microsoft Corporation