Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique
Reexamination Certificate
2006-10-31
2006-10-31
Hirl, Joseph P. (Department: 2129)
C706S046000, C706S045000
active
07130837
ABSTRACT:
Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
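The boundary-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each text block has already been mapped to a vector of topic probabilities by a trained PLSA model; the vectors below are made up for illustration. The cosine measure, the local-minimum rule, the `threshold` value, and all function names are assumptions of this sketch, not details taken from the patent.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def adjacent_similarities(block_topics):
    """One similarity value per gap between adjacent blocks."""
    return [cosine(block_topics[i], block_topics[i + 1])
            for i in range(len(block_topics) - 1)]


def select_boundaries(sims, threshold=0.5):
    """Return gap indices i (boundary between block i and block i + 1)
    where the similarity is a local minimum and falls below `threshold`.
    This selection rule is an assumption made for the sketch."""
    boundaries = []
    for i, s in enumerate(sims):
        left_ok = i == 0 or s <= sims[i - 1]
        right_ok = i == len(sims) - 1 or s <= sims[i + 1]
        if s < threshold and left_ok and right_ok:
            boundaries.append(i)
    return boundaries


# Made-up P(topic | block) vectors: four blocks, two latent topics.
block_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
sims = adjacent_similarities(block_topics)
boundaries = select_boundaries(sims)
print(boundaries)  # [1]: a boundary between the second and third blocks
```

In this toy input the first two blocks lean toward one topic and the last two toward the other, so the only similarity dip is at the middle gap. A different similarity measure or a depth-based criterion could be substituted without changing the overall shape of the procedure.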
REFERENCES:
patent: 5606643 (1997-02-01), Balasubramanian et al.
patent: 5659766 (1997-08-01), Saund et al.
patent: 5675819 (1997-10-01), Schuetze
patent: 5687364 (1997-11-01), Saund et al.
patent: 5943669 (1999-08-01), Numata
patent: 6128634 (2000-10-01), Golovchinsky et al.
patent: 6239801 (2001-05-01), Chiu et al.
Thomas Hofmann, Probabilistic Latent Semantic Indexing, Aug. 1999, ACM, 1-58113-096-1/99/0007, 50-57.
Mike Dowman, Content Augmentation for Mixed-Mode News Broadcasts, University of Sheffield.
Freddy Y. Y. Choi, Latent Semantic Analysis for Text Segmentation, Jun. 2001, University of Manchester.
A. P. Dempster, Maximum Likelihood from Incomplete Data via the EM Algorithm, 1976, Harvard University, 1-38.
Thorsten Brants, Segmentation and Identification of Document Topics for Creating Document Structure, 2002, PARC, 1-8.
Brants et al., “Segmentation and Identification of Document Topics for Creating Document Structure”, PARC, pp. 1-8, 2002.
Lee, “Measures of Distributional Similarity”, Proceedings of the 37th ACL, pp. 1-8, 1999.
Blei et al., “Latent Dirichlet Allocation”, University of California, pp. 1-8.
Dempster et al., “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Royal Statistical Society, pp. 1-38, 1976.
Choi, “Advances in domain independent linear text segmentation”, University of Manchester, pp. 1-8.
Hofmann, “Probabilistic Latent Semantic Indexing”, EECS Department, pp. 50-57, 1999.
Hearst, “TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages”, Computational Linguistics, vol. 23, No. 1, pp. 33-64, 1997.
Li et al., “Topic Analysis Using a Finite Mixture Model”, C&C Media Res. Labs., pp. 1-21, 2000.
Li et al., “Topic Analysis Using a Finite Mixture Model”, NEC Corporation, pp. 1-10.
Beeferman et al., “Statistical Models for Text Segmentation”, School of Computer Science, Carnegie Mellon University, pp. 1-37.
Li et al., “Topic Analysis Using a Finite Mixture Model,” NEC Corporation, Oct. 2000, pp. 35-44.
Franz et al., “Segmentation and Detection at IBM: Hybrid Statistical Models and Two-tiered Clustering,” IBM T.J. Watson Research Center, Feb. 2000, pp. 1-5.
Dharanipragada et al., “Story Segmentation and Topic Detection in the Broadcast News Domain,” IBM T.J. Watson Research Center, Feb. 1999, pp. 65-68.
Brants Thorsten H.
Chen Francine R.
Tsochantaridis Ioannis