Patent
1994-06-08
1997-01-14
McElheny, Jr., Donald E.
395/611, 395/793, G06F 17/30
active
055946410
ABSTRACT:
The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. The techniques described apply generally across the languages of the world and are not limited to simple suffixing languages like English. Although the resulting transducers can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel finite-state transducer as a database and a processor for responding to user queries, searching the database, and outputting proper responses if they exist, as well as the novel database used in such a system and methods for constructing that database.
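The two-way mapping between words and stems that the abstract describes can be illustrated with a minimal sketch. This is not the patented construction: it assumes the word/stem pairs are simply enumerated by hand rather than compiled from a lexicon and morphological rules, it performs no compression, and the names `Transducer`, `add_pair`, `stems_of`, and `words_of` are hypothetical. Each arc carries a (stem symbol, surface symbol) pair, and the same machine is traversed in either direction.

```python
from collections import defaultdict
from itertools import zip_longest

EPS = ""  # epsilon: nothing is consumed or emitted on that side of the arc


class Transducer:
    """Toy finite-state transducer; every arc pairs a stem symbol with a surface symbol."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(stem_sym, surface_sym, next_state)]
        self.finals = set()
        self._next_state = 1           # state 0 is the start state

    def add_pair(self, stem, word):
        """Add a path relating `stem` to `word`, reusing arcs that already match."""
        state = 0
        # Assumption: naive left-to-right alignment, padding the shorter side with epsilon.
        for s, w in zip_longest(stem, word, fillvalue=EPS):
            for a, b, nxt in self.arcs[state]:
                if (a, b) == (s, w):
                    state = nxt
                    break
            else:
                nxt = self._next_state
                self._next_state += 1
                self.arcs[state].append((s, w, nxt))
                state = nxt
        self.finals.add(state)

    def _apply(self, text, side):
        """Return every string on the opposite side that the machine pairs with `text`."""
        results = set()
        stack = [(0, 0, "")]           # (state, position in text, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for arc in self.arcs[state]:
                inp, outp, nxt = arc[side], arc[1 - side], arc[2]
                if inp == EPS:
                    stack.append((nxt, pos, out + outp))
                elif pos < len(text) and text[pos] == inp:
                    stack.append((nxt, pos + 1, out + outp))
        return results

    def stems_of(self, word):
        """Surface word -> set of stems (the indexing direction)."""
        return self._apply(word, side=1)

    def words_of(self, stem):
        """Stem -> set of surface words (the query-expansion direction)."""
        return self._apply(stem, side=0)


fst = Transducer()
for stem, word in [("swim", "swim"), ("swim", "swims"), ("swim", "swimming"),
                   ("swim", "swam"), ("leave", "left"), ("left", "left")]:
    fst.add_pair(stem, word)

print(sorted(fst.stems_of("swam")))   # ['swim']
print(sorted(fst.stems_of("left")))   # ['leave', 'left']
print(sorted(fst.words_of("swim")))   # ['swam', 'swim', 'swimming', 'swims']
```

The irregular form "swam" and the ambiguous form "left" are handled by the same lookup as the regular forms, which is the property the abstract attributes to encoding both rules and exceptions in one transducer. A production system would additionally minimize and compress the machine.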
REFERENCES:
patent: 5051886 (1991-09-01), Kawaguchi et al.
patent: 5323316 (1994-06-01), Kadashevich et al.
Lauri Karttunen, "Finite-State Constraints" International Conference on Current Issues in Comp. Linguistics Jun. 10-14, 1991.
"A Compiler for Two Level Phonological Rules" Karttunen et al. Xerox Palo Alto Research Center, Jun. 1987.
"Nonconcatenative Finite-State Morphology" by Martin Kay, Xerox Palo Alto Research Center.
"Performance and Architectural Issues for String Matching" Isenman, M., et al., IEEE Transactions on Computers, vol. 39, No. 2, Feb. 1990, New York, U.S.A.
"State Machines Find the Pattern", Kimbrell R. E., Computer Design, vol. 24, No. 5, May 1985, Littleton, Massachusetts, U.S.
Introduction to Automata Theory, Languages and Computation, Hopcroft and Ullman, 1979, Addison-Wesley Publishing Co., pp. 64-76.
"Development of a Stemming Algorithm", J. B. Lovins, Mechanical Translation And Computational Linguistics, 11, pp. 22-31, Mar. 1968.
"Finite-state Constraints" by Lauri Karttunen, International Conference on Current Issues in Computational Linguistics. Jun. 10-14, 1991. Universiti Sains Malaysia, Penang, Malaysia. To appear in The Last Phonological Rule: Reflections on Constraints and Derivations, ed. by John Goldsmith, University of Chicago Press.
Kaplan, R. M. and M. Kay. Phonological rules and finite-state transducers [Abstract]. Linguistic Society of America Meeting Handbook. Fifty-sixth Annual Meeting, Dec. 27-30, 1981. New York.
Koskenniemi, K. Two-Level Morphology. A General Computational Model for Word-Form Recognition and Production. Department of General Linguistics. University of Helsinki. 1983.
Karttunen, L., K. Koskenniemi, and R. M. Kaplan. A Compiler for Two-level Phonological Rules. In Dalrymple, M. et al. Tools for Morphological Analysis. Center for the Study of Language and Information. Stanford University. Palo Alto. 1987.
Kay, Martin. Nonconcatenative Finite State Morphology. Proceedings of the 3rd Conference of the European Chapter of the Association for Computational Linguistics. Copenhagen 1987.
Ashdown "Minimizing Finite State Machines", Embedded Systems Programming, Premier 1988, pp. 57-66.
"An Algorithm For Suffix Stripping", M. F. Porter; Prog. 14, No. 3, pp. 130-137, Jul. 1980.
"Theory of Machines and Computations", Z. Kohavi, Ed., pp. 189-196, Academic Press, NY 1971.
Aho and Ullman "Principles of Compiler Design", Addison-Wesley, 1977, pp. 99-103, 114-117.
Tzoukermann, E. and M. Y. Liberman. A Finite-State Morphological Processor for Spanish. Proceedings of the 13th International Conference on Computational Linguistics. vol. 3. 277-282. University of Helsinki. Helsinki. 1990.
Cutting, D., J. Kupiec, J. Pedersen, P. Sibun. A Practical Part-of-Speech Tagger. Proceedings of the Third Conference on Applied Natural Language Processing. Trento, Italy, Apr. 1992.
Kaplan Ronald M.
Karttunen Lauri
McElheny Jr. Donald E.
Verdun Hayward A.
Xerox Corporation
Finite-state transduction of related word forms for text indexin