Cluster and pruning-based language model compression

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language



Details

U.S. Classification: 704/257; 704/1
Type: Reexamination Certificate
Status: active
Patent number: 6782357

ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to statistical language models, and more particularly to compression of such models (e.g., n-gram language models).
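As context for the discussion that follows, an n-gram language model assigns each word a probability conditioned on the preceding n-1 words, estimated from counts over a training corpus. The sketch below is not taken from the patent; it is a minimal illustration that builds a maximum-likelihood bigram model in Python. Practical models of the kind the invention compresses also incorporate smoothing and backoff.

```python
from collections import defaultdict

# Minimal bigram language model with maximum-likelihood estimates.
# Illustrative only: real n-gram models add smoothing and backoff
# so that unseen word pairs do not receive zero probability.
class BigramLM:
    def __init__(self, sentences):
        self.unigram = defaultdict(int)   # count of each context word
        self.bigram = defaultdict(int)    # count of each (prev, cur) pair
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigram[prev] += 1
                self.bigram[(prev, cur)] += 1

    def prob(self, prev, cur):
        # P(cur | prev) = count(prev, cur) / count(prev)
        if self.unigram[prev] == 0:
            return 0.0
        return self.bigram[(prev, cur)] / self.unigram[prev]

lm = BigramLM(["the cat sat", "the dog sat"])
print(lm.prob("the", "cat"))  # 0.5: "the" is followed by "cat" in half its occurrences
```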
BACKGROUND OF THE INVENTION
A common application today is the entering, editing and manipulation of text. Application programs that perform such text operations include word processors, text editors, and even spreadsheets and presentation programs. For example, a word processor allows a user to enter text to prepare documents such as letters, reports, memos, etc. While the keyboard has historically been the standard device by which text is input into these types of application programs, it is currently being augmented and/or replaced by other types of input devices. For example, touch-sensitive pads can be “written” on with a stylus, such that a handwriting recognition program can be used to input the resulting characters into a program. As another example, voice-recognition programs, which work in conjunction with microphones attached to computers, are also becoming more popular. Especially for non-English language users, and particularly for Asian language users, these non-keyboard devices are popular for initially inputting text into programs, where it can then be edited with the same device or with other devices such as the keyboard. Speech and handwriting recognition have applications beyond text entry as well.
A primary part of the use of handwriting or speech recognition is the selection of a language model, which is used to determine the text into which what a user writes or speaks should be translated. In general, the more sophisticated a language model is, the more storage space it requires. This is especially unfortunate in situations where storage space is at a premium, such as in handheld and palm-sized computing devices. Therefore, compression of such models is typically necessary. The performance, or accuracy, of a language model is typically measured by what is known in the art as the perplexity of the model: roughly, the average number of words the model treats as equally likely at each position, so that lower perplexity indicates a more accurate model. Prior art language model compression techniques, while reducing the size of the resulting compressed model, also disadvantageously increase its perplexity, and hence reduce its accuracy. Such compression techniques, which result in reduced-size but increased-perplexity language models, include pruning the language model alone, and using what is referred to as "classical" clustering, which by virtue of the clustering itself reduces the size of the model but increases its perplexity.
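For readers unfamiliar with the perplexity measure referred to above, the sketch below shows how it is commonly computed: two raised to the average negative log-probability, in bits, that the model assigns to each word of a test text. The `prob` argument stands in for any bigram-style model; the uniform toy model and the floor on zero probabilities are assumptions used only for illustration.

```python
import math

def perplexity(prob, test_tokens):
    """Perplexity = 2 ** (average negative log2-probability per predicted token)."""
    log2_sum = 0.0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        p = prob(prev, cur)
        log2_sum += math.log2(p) if p > 0 else math.log2(1e-10)  # crude floor for unseen events
    n = len(test_tokens) - 1  # number of predicted tokens
    return 2 ** (-log2_sum / n)

# A model that spreads probability uniformly over a 10-word vocabulary
# has perplexity 10: on average it is "choosing among 10 words" at each step.
uniform = lambda prev, cur: 1.0 / 10
print(perplexity(uniform, ["word"] * 101))  # ~10.0
```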
There is therefore a need within the prior art for language model compression techniques that yield smaller models with as limited an increase in perplexity as possible. For this and other reasons, there is a need for the present invention.
SUMMARY OF THE INVENTION
The invention relates to the cluster- and pruning-based compression of language models. In one embodiment, words are first clustered, such that the resulting language model after clustering has a larger size than it did before clustering. Clustering techniques amenable to the invention include but are not limited to predictive clustering and conditional clustering. The language model, as clustered, is then pruned. Pruning techniques amenable to the invention include but are not limited to entropy-based techniques, such as Stolcke pruning, as well as count-cutoff techniques and Rosenfeld pruning. In one particular embodiment, a word language model is first predictively clustered, using a novel predictive clustering technique, and then is pruned utilizing Stolcke pruning.
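The patent text itself contains no implementation; the Python sketch below is only a structural illustration of the two-stage approach the summary describes, under assumptions of my own: a bigram model, a word-to-cluster mapping `cluster_of` assumed to be given, and a simple count cutoff standing in for the entropy-based (Stolcke-style) pruning step. None of these names or simplifications come from the patent.

```python
from collections import defaultdict

def predictively_clustered_model(counts, cluster_of):
    """Decompose P(w | prev) as P(cluster(w) | prev) * P(w | cluster(w), prev).

    Storing the two factor tables instead of one bigram table is what makes
    the clustered model larger than the original, as the summary notes."""
    cluster_counts = defaultdict(int)    # (prev, cluster) -> count
    word_in_cluster = defaultdict(int)   # (prev, cluster, word) -> count
    prev_totals = defaultdict(int)       # prev -> count
    for (prev, word), c in counts.items():
        cl = cluster_of[word]
        cluster_counts[(prev, cl)] += c
        word_in_cluster[(prev, cl, word)] += c
        prev_totals[prev] += c

    def prob(prev, word):
        cl = cluster_of[word]
        if prev_totals[prev] == 0 or cluster_counts[(prev, cl)] == 0:
            return 0.0
        p_cluster = cluster_counts[(prev, cl)] / prev_totals[prev]
        p_word_given_cluster = word_in_cluster[(prev, cl, word)] / cluster_counts[(prev, cl)]
        return p_cluster * p_word_given_cluster

    return prob, word_in_cluster

def prune(word_in_cluster, threshold=1):
    """Stand-in for the pruning stage: drop rarely observed parameters.

    True Stolcke pruning would instead score each entry by the relative-entropy
    increase its removal causes and drop those scoring below a threshold; a
    count cutoff is shown here only to keep the sketch self-contained."""
    return {k: c for k, c in word_in_cluster.items() if c > threshold}
```

A full implementation would prune both factor tables by the entropy criterion rather than by raw counts; the sketch keeps only the structure needed to see the order of operations the summary claims is advantageous, namely clustering first (growing the model) and pruning second.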
Embodiments of the invention provide for advantages not found within the prior art. Unintuitively and nonobviously, embodiments of the invention initially cluster a language model such that it has a larger size than it did before being clustered. The subsequent pruning of the model then results in a compressed language model that has a smaller size for a given perplexity level as compared to prior art language model compression techniques. Embodiments of the invention likewise result in a compressed language model that has a lower perplexity for a given model size as compared to prior art language model compression techniques.
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.


REFERENCES:
patent: 5835893 (1998-11-01), Ushioda
patent: 6314339 (2001-11-01), Rastegar et al.
patent: 6317707 (2001-11-01), Bangalore et al.
patent: 6415248 (2002-07-01), Bangalore et al.
H. Yamamoto, Y. Sagisaka, Multi-class Composite N-gram based on Connection Direction, in Proceedings of the IEEE Int'l Conf. on Acoustics, Speech and Signal Processing, May 1999, Phoenix, AZ.
K. Seymore, R. Rosenfeld, Scalable backoff language models, in Proc. ICSLP, vol. 1, pp. 232-235, Philadelphia, 1996.
C. Samuelsson, W. Reichl, A Class-based Language Model for Large-vocabulary Speech Recognition Extracted from Part-of-Speech Statistics, vol. 1, paper No. 1781, ICASSP, 1999.
R. Kneser, Statistical language modeling using a variable context length, Proc. ICSLP '96, Philadelphia, PA, Oct. 1996, vol. 1, pp. 494-497.
K. Ries et al., Class phrase models for language modeling, Proc. ICSLP '96, Philadelphia, PA, Oct. 1996, vol. 1.
I. Guyon, F. Pereira, Design of a linguistic postprocessor using variable memory length Markov models, in International Conference on Document Analysis and Recognition, pp. 454-457, Montreal, Canada, IEEE Computer Society Press, 1995.
M. Kearns, Y. Mansour, A. Ng, An information-theoretic analysis of hard and soft assignment methods for clustering, Proceedings of the 13th Conf. on Uncertainty in AI, 1997.
B. Suhm, A. Waibel, Towards better language models for spontaneous speech, Proceedings of ICSLP, 1994.
S. Bai et al., Building class-based language models with contextual statistics, Proceedings of ICASSP, 1998.
Meila, Heckerman, An experimental comparison of several clustering and initialization methods, Proceedings of the 14th Conf. on Uncertainty in AI, 1998.
J. Bellegarda et al., A novel word clustering algorithm based on latent semantic analysis, Proceedings of ICASSP, 1996, vol. 1.
Niesler et al., Comparison of part-of-speech and automatically derived category-based language models for speech recognition, Proceedings of ICASSP, 1998.
Miller, Alleva, Evaluation of a language model using a clustered model backoff, Proceedings of ICSLP, 1996, vol. 1.
Bahl, Brown, et al., A tree-based statistical language model for natural language speech recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, No. 7, 1989.
Willems et al., Reflections on "The Context-tree Weighting Method: Basic Properties," IEEE Transactions on Information Theory, vol. IT-41, No. 3, May 1995.
Jardino, Multilingual Stochastic N-gram class language models, Proceedings of ICASSP, 1996.
Ward, Issar, A class based language model for speech recognition, Proceedings of ICASSP, vol. 1, 1996.
Blasig, Combination of words and word categories in varigram histories, Proceedings of ICASSP, 1999.
L. Lee, Measures of distributional similarity, Proceedings of the 37th Annual Meeting of the Assn. for Computational Linguistics, Jun. 20-26, 1999.
Ueberla, More efficient clustering of n-grams for statistical language modeling, Eurospeech, 1996, pp. 1257-1260.
Chen, Goodman, An empirical study of smoothing techniques for language modeling, TR-10-98, Computer Science Group, Harvard University, 1998.
Ney et al., On structuring probabilistic dependencies in stochastic language modeling, Computer Speech and Language, 1994 (8), 1-38.
Stolcke, Entropy-based pruning of backoff language models, in Proc. ICSLP, vol. 1, pp. 232-235, Philadelphia, 1996.
Brown, Della Pietra, deSouza, et al., Class-based n-gram models of natural language, Computational Linguistics, 1990 (18), 467-479.
R. Kneser, H. Ney, Im
