Patent
1994-09-16
1997-08-19
Hayes, Gail O.
395/761, G06F 17/28, G06F 17/30
Patent
active
056597665
ABSTRACT:
An iterative method of determining the topical content of a document using a computer. The processing unit of the computer determines the topical content of documents presented to it in machine-readable form using information stored in computer memory. That information includes word-clusters, a lexicon, and association strength values. The processing unit begins by generating an observed feature vector for the document being characterized, which indicates which of the words of the lexicon appear in the document. Afterward, the processing unit makes an initial prediction of the topical content of the document in the form of a topic belief vector. The processing unit uses the topic belief vector and the association strength values to predict which words of the lexicon should appear in the document. This prediction is represented via a predicted feature vector. The predicted feature vector is then compared to the observed feature vector to measure how well the topic belief vector models the topical content of the document. If the topic belief vector adequately models the topical content of the document, then the processing unit's task is complete. On the other hand, if the topic belief vector does not adequately model the topical content of the document, then the processing unit determines how the topic belief vector should be modified to improve its modeling of the topical content.
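The loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented method: the abstract does not say how the predicted feature vector is computed, how the two vectors are compared, or how the topic belief vector is updated. The sketch assumes a noisy-OR combination of association strengths, a log-likelihood score, and a clamped gradient step with step halving. All function names, parameters, and constants (`observe_features`, `predict_features`, `infer_topics`, `step`, `tol`) are hypothetical.

```python
import math

EPS = 1e-9  # assumption: guards logarithms and divisions against zero


def observe_features(text, lexicon):
    """Observed feature vector: 1 if the lexicon word occurs in the text, else 0."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in lexicon]


def predict_features(belief, strengths):
    """Predicted feature vector (assumed noisy-OR): probability each word appears.

    strengths[k][j] is the association strength between topic k and word j.
    """
    predicted = []
    for j in range(len(strengths[0])):
        p_absent = 1.0
        for k, m in enumerate(belief):
            p_absent *= 1.0 - m * strengths[k][j]
        predicted.append(1.0 - p_absent)
    return predicted


def log_likelihood(observed, predicted):
    """Assumed comparison measure: higher means the prediction fits better."""
    total = 0.0
    for d, r in zip(observed, predicted):
        r = min(max(r, EPS), 1.0 - EPS)
        total += math.log(r) if d else math.log(1.0 - r)
    return total


def belief_gradient(belief, observed, predicted, strengths):
    """Gradient of the log-likelihood with respect to each topic belief."""
    grad = [0.0] * len(belief)
    for k, m in enumerate(belief):
        for j, d in enumerate(observed):
            c = strengths[k][j]
            denom = 1.0 - m * c
            if denom <= EPS:
                continue  # derivative undefined here; skip this term
            dr = (1.0 - predicted[j]) * c / denom  # d predicted[j] / d belief[k]
            r = min(max(predicted[j], EPS), 1.0 - EPS)
            grad[k] += dr / r if d else -dr / (1.0 - r)
    return grad


def infer_topics(observed, strengths, step=0.1, tol=1e-6, max_iter=500):
    """Iteratively refine the topic belief vector until the fit stops improving."""
    belief = [0.5] * len(strengths)  # assumed initial prediction
    score = log_likelihood(observed, predict_features(belief, strengths))
    for _ in range(max_iter):
        predicted = predict_features(belief, strengths)
        grad = belief_gradient(belief, observed, predicted, strengths)
        candidate = [min(1.0, max(0.0, m + step * g)) for m, g in zip(belief, grad)]
        cand_score = log_likelihood(observed, predict_features(candidate, strengths))
        if cand_score <= score:
            step *= 0.5  # no improvement: try a smaller modification
            if step < 1e-6:
                break
            continue
        gain = cand_score - score
        belief, score = candidate, cand_score
        if gain < tol:
            break  # assumed adequacy test: further changes barely help
    return belief
```

With two topics and a four-word lexicon, calling `infer_topics(observe_features(text, lexicon), strengths)` returns one belief value per topic in the range 0 to 1. The stopping rule here (negligible gain in log-likelihood) stands in for whatever adequacy test the patent itself uses.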
REFERENCES:
patent: 4839853 (1989-06-01), Deerwester et al.
patent: 5301109 (1994-04-01), Landauer et al.
patent: 5317507 (1994-05-01), Gallant
patent: 5371807 (1994-12-01), Register et al.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science. Sep. 1990; pp. 391-407.
DeJong, G.F., Skimming Newspaper Stories by Computer. Research Report #104, May 1977; pp. 1-31.
Dumais, S.T., Nielsen, J. Automating the Assignment of Submitted Manuscripts to Reviewers. Belkin, Ingwersen, and Pejtersen, Editors. Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; Jun. 1992; Copenhagen, Denmark. pp. 233-244.
Hearst, Marti A., Plaunt, Christian. Subtopic Structuring for Full-Length Document Access. Korfhage, Rasmussen, Willett, Editors. Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; Jun. 27-Jul. 1, 1993; Pittsburgh, PA; pp. 59-68.
Jacobs, P.S., Rau, L.F. SCISOR: Extracting Information from On-line News. Sibley, E.H., Editor, Communications of the ACM, Nov. 1990, vol. 33, No. 11; pp. 88-97.
Masand, B., Linoff, G., Waltz, D. Classifying News Stories using Memory Based Reasoning. Belkin, Ingwersen, and Pejtersen, Editors. Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; Jun. 1992; Copenhagen, Denmark. pp. 59-65.
McCune, P., Tong, R.M., Dean, J.S., Shapiro, D.G. RUBRIC: A System for Rule-Based Information Retrieval. IEEE Transactions on Software Engineering, vol. SE-11, No. 9, Sep. 1985; pp. 939-945.
Parabolic Interpolation and Brent's Method in One-Dimension. Numerical Recipes in C, The Art of Scientific Computing. Cambridge University Press; pp. 299-302, 315-322.
Hayes, Phillip J. Intelligent High-Volume Text Processing Using Shallow, Domain-Specific Techniques. Jacobs, Paul S., Editor, Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Lawrence Erlbaum Associates, 1992; pp. 227-242.
Riloff, E., Lehnert, W. Classifying Texts Using Relevancy Signatures. Natural Language: Parsing; pp. 329-334.
Saund, Eric. Unsupervised Learning of Mixtures of Multiple Causes in Binary Data. Spatz, Bruce M., ed., Neural Information Processing Systems 6. San Francisco, CA: Morgan Kaufmann Publishers, Inc.; 1994; pp. 27-34; 1365-1383.
Stanfill, C., Waltz, D. Toward Memory-Based Reasoning. Communications of the ACM, Dec. 1986, vol. 29, No. 12; pp. 1213-1228.
Willett, P. Recent Trends in Hierarchic Document Clustering: A Critical Review. Information Processing & Management, vol. 24, No. 5, 1988; pp. 577-597.
Hearst Marti A.
Saund Eric
Hayes Gail O.
Hurt Tracy L.
Xerox Corporation
Yount Steven R.