Representing n-gram language models for compact storage and...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S009000, C704S240000, C704S244000

active

07877258

ABSTRACT:
Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, which includes receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
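As a rough illustration of the idea in the abstract, the sketch below stores a small collection of n-grams with their probabilities in a trie and then uses it to score a string of words. It is not the patented encoding: the class name `NGramTrie`, the function `string_logprob`, the treatment of each stored value as the probability of the last word given the preceding words, and the simple fall-back to shorter contexts (with no backoff weights) are all assumptions made for this example.

```python
import math


class NGramTrie:
    """Toy trie keyed by word sequences (hypothetical, for illustration only)."""

    def __init__(self):
        self.children = {}   # word -> child NGramTrie node
        self.logprob = None  # log-probability if an n-gram ends at this node

    def insert(self, ngram, prob):
        """Store `prob` (the "first probability") for the word sequence `ngram`."""
        node = self
        for word in ngram:
            node = node.children.setdefault(word, NGramTrie())
        node.logprob = math.log(prob)

    def lookup(self, ngram):
        """Return the stored log-probability, or None if the n-gram is absent."""
        node = self
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.logprob


def string_logprob(trie, words, order):
    """Score a word string (the "second probability") as a sum of log-probabilities.

    Assumption: for each word, use the longest stored n-gram ending at that
    word (up to `order` words), falling back to shorter ones. Real backoff
    models also apply backoff weights, which this sketch omits.
    """
    total = 0.0
    for i in range(len(words)):
        start = max(0, i - order + 1)
        for j in range(start, i + 1):  # longest context first
            lp = trie.lookup(words[j:i + 1])
            if lp is not None:
                total += lp
                break
        else:
            return float("-inf")  # word not covered by any stored n-gram
    return total


trie = NGramTrie()
trie.insert(("the",), 0.5)
trie.insert(("cat",), 0.25)
trie.insert(("the", "cat"), 0.5)
score = string_logprob(trie, ["the", "cat"], order=2)
```

Shared prefixes such as "the" are stored once, which is the basic reason a trie can be more compact than a flat table of n-grams; any further compression of node identifiers or probabilities is outside what this sketch shows.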

REFERENCES:
patent: 6363348 (2002-03-01), Besling et al.
patent: 7231349 (2007-06-01), Li et al.
patent: 7275029 (2007-09-01), Gao et al.
patent: 7363225 (2008-04-01), Church et al.
patent: 2007/0078653 (2007-04-01), Olsen
Raj, B.; Whittaker, E.W.D.; “Lossless compression of language model structure and word identifiers,” Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on, vol. 1, 2003, pp. I-388-I-391.
Genqing Wu and Fang Zheng, “Reducing Language model size by Importance-based pruning and rank-Based Quantization,” Oriental-COCOSDA, pp. 156-159, Oct. 1-3, Sentosa, Singapore 2003.
SRI International, “SRILM—The SRI Language Modeling Toolkit”, 2006, Menlo Park, CA (2 pages).
Clarkson, et al. “Statistical Language Modeling Toolkit”, Jun. 1999, Cambridge (5 pages).
Ghemawat, et al., “The Google File System”, Oct. 2003, New York (15 pages).
Dean, et al., “MapReduce: Simplified Data Processing on Large Clusters”, OSDI'04: Sixth Symposium on Operating System Design and Implementation, 2004, San Francisco, CA. (13 pages).
P. Clarkson and R. Rosenfeld, “Statistical language modeling using the CMU-Cambridge toolkit,” in Fifth European Conference on Speech Communication and Technology. ISCA, 1997, 4 pages.
B. Hsu and J. Glass, “Iterative language model estimation: efficient data structure & algorithms,” in Proc. Interspeech, Brisbane, Australia: ISCA, Sep. 22-26, 2008, pp. 841-844.
R. Rosenfeld, “The CMU statistical language modeling toolkit and its use in the 1994 ARPA CSR evaluation,” in Proceedings of the Spoken Language Systems Technology Workshop, Jan. 22-25, 1995, pp. 47-50.
K. Seymore and R. Rosenfeld, “Scalable backoff language models,” in Proceedings ICSLP 96, vol. 1, Philadelphia, PA, Oct. 3-6, 1996, pp. 232-235.
A. Stolcke, “Entropy-based pruning of backoff language models,” in Proceedings of News Transcription and Understanding Workshop, Lansdowne, VA: DARPA, Feb. 8-11, 1998, pp. 270-274.
A. Stolcke, “SRILM - an extensible language modeling toolkit,” in Proceedings of the International Conference on Spoken Language Processing, Denver, CO, Sep. 2002, pp. 901-904.
D. Talbot and M. Osborne, “Smoothed bloom filter language models: Tera-scale LMs on the cheap,” in Proceedings of the 2007 Joint Conference on EMNLP and CoNLL, Prague, Czech Republic, Jun. 2007, pp. 468-476.
D. Talbot and T. Brants, “Randomized language models via perfect hash functions,” in Proceedings of ACL-08: HLT, Columbus, OH: Association for Computational Linguistics, Jun. 2008, pp. 505-513.
E. Whittaker and B. Raj, “Quantization-based language model compression,” Mitsubishi Electric Research Laboratories, Tech. Rep. TR-2001-41, Dec. 2001, 6 pages.
