Patent
1997-05-20
1999-10-05
Trammell, James P.
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
704/2, 704/4, 704/8, 704/257, 707/532, G06F 17/27, G06F 17/28
active
059638940
ABSTRACT:
A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine possible parts of speech of words from an input string and a set of rules to combine words from the input string into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser is operating in the statistics compilation mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each rule and lexicon entry when applied in the statistics compilation mode. After a sufficient body of likelihood-of-success statistics has been compiled, the facility operates the parser in an efficient parsing mode, in which the facility uses the compiled statistics to optimize the operation of the parser. In order to parse an input string in the efficient parsing mode, the facility causes the parser to apply applicable rules and lexicon entries in descending order of their likelihood of success, as indicated by the statistics compiled in the statistics compilation mode.
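The two modes described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (SuccessStats, compile_statistics, order_by_likelihood, first_successful_rule, and the three toy rules) is hypothetical, and it makes one loud simplification: a "rule" is just a predicate on the whole input string, and it "succeeds" when the predicate returns True. A real parser would apply rules and lexicon entries to partial syntactic structures and would need its own definition of success.

```python
from collections import defaultdict


class SuccessStats:
    """Attempt and success counts per rule (hypothetical helper)."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, name, succeeded):
        self.attempts[name] += 1
        if succeeded:
            self.successes[name] += 1

    def likelihood(self, name):
        # Fraction of attempts that succeeded; 0.0 for a rule never tried.
        tried = self.attempts.get(name, 0)
        return self.successes.get(name, 0) / tried if tried else 0.0


def compile_statistics(rules, samples, stats):
    """Statistics compilation mode: try every rule on every sample."""
    for sample in samples:
        for name, rule in rules.items():
            stats.record(name, rule(sample))


def order_by_likelihood(rules, stats):
    """Rule names in descending order of compiled likelihood of success."""
    return sorted(rules, key=stats.likelihood, reverse=True)


def first_successful_rule(rules, stats, text):
    """Efficient mode: try the most promising rules first, stop at a hit."""
    for name in order_by_likelihood(rules, stats):
        if rules[name](text):
            return name
    return None


# Toy rules (assumptions for illustration only), listed least useful first.
RULES = {
    "contains_comma": lambda s: "," in s,
    "starts_with_the": lambda s: s.lower().startswith("the "),
    "ends_with_period": lambda s: s.endswith("."),
}

SAMPLES = [
    "The dog barks.",
    "Dogs bark.",
    "Well, the dog barks.",
    "Stop",
    "The cat sleeps",
]

stats = SuccessStats()
compile_statistics(RULES, SAMPLES, stats)
ordered = order_by_likelihood(RULES, stats)
```

After compilation, `ordered` lists the rules from most to least frequently successful on the samples, so `first_successful_rule` reaches a likely match early instead of following the arbitrary order in which the rules were declared.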
REFERENCES:
patent: 4829423 (1989-05-01), Tennant et al.
patent: 4887212 (1989-12-01), Zamora et al.
patent: 5146406 (1992-09-01), Jensen
patent: 5297040 (1994-03-01), Hu
patent: 5418717 (1995-05-01), Su et al.
patent: 5419413 (1995-05-01), Kutsumi et al.
patent: 5495413 (1996-02-01), Kutsumi et al.
patent: 5555169 (1996-09-01), Namba et al.
patent: 5752052 (1998-05-01), Richardson et al.
Weischedel et al. "Coping with Ambiguity and Unknown Words through Probabilistic Models", Computational Linguistics, vol. 19, No. 2, pp. 359-382, Jun. 1993.
Chang et al., "Why Corpus-Based Statistics-Oriented Machine Translation", Proc. of the 4th Inter. Conf. on Theoretical & Methodological Issues in Machine Translation, pp. 249-262, Jun. 25, 1992.
Black et al., "Towards History-Based Grammars: Using Richer Models for Probabilistic Parsing," in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 1993, pp. 31-37.
Briscoe, T. and J. Carroll, "Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars," Computational Linguistics, 19(1):25-59, 1993.
Briscoe, T. and N. Waegner, "Robust Stochastic Parsing Using the Inside-Outside Algorithm," in Proceedings of AAAI92 Workshop on Probabilistically-Based Natural Language Processing Techniques, San Jose, CA, 1992, pp. 39-53.
Lari, K. and S. J. Young, "Applications of Stochastic Context-Free Grammars Using the Inside-Outside Algorithm," Computer Speech and Language, 5:237-257, 1991.
Lari, K. and S. J. Young, "The Estimation of Stochastic Context-Free Grammars Using the Inside-Outside Algorithm," Computer Speech and Language, 4:35-56, 1990.
Pereira, F. and Y. Schabes, "Inside-Outside Reestimation from Partially Bracketed Corpora," Association for Computational Linguistics, 1992, pp. 128-135.
Schabes et al., "Parsing the Wall Street Journal with the Inside-Outside Algorithm," EAC, 1993, pp. 341-347.
Su, Keh-Yih and Jing Shin Chang, "Why Corpus-Based Statistics-Oriented Machine Translation," Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation: Empiricist vs. Rationalist Methods in MT. Jun. 1992, pp. 249-262.
Weischedel, Ralph et al., "Coping with Ambiguity and Unknown Words through Probabilistic Models," Computational Linguistics, vol. 19, No. 2, Jun. 1993, pp. 359-382.
Heidorn George E.
Richardson Stephen Darrow
Microsoft Corporation
Nguyen Cuong H.
Trammell James P.