Reexamination Certificate
1999-12-15
2004-07-20
Edouard, Patrick N. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
C715S252000
active
06766287
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to the field of document management, and more particularly, to a system for summarizing documents that uses information about a document's genre, or document type, for selecting summary sentences for an automatically generated summary.
BACKGROUND
A user faced with a large document or a collection of documents typically wants a summary in order to save time or to answer a specific question. The task of summarizing a document involves finding a small number of sentences that provide a concise characterization of the document. Existing approaches for summarizing documents apply only one summarization strategy, thus ignoring variations in the structure and wording of different genres of documents. Examples of document genres include newspaper articles, editorials, reference manuals, scientific works and tutorials. One problem with existing approaches is that they can be slow and inaccurate when applied to heterogeneous document collections. A heterogeneous document collection includes documents of different genres, or document types, such as fiction and scientific or other non-fiction works.
SUMMARY
The present invention provides a system for genre-specific summarization of documents. The system of the present invention overcomes the problem of summarizing heterogeneous document collections by taking the genre, or type, of document into account when selecting summary sentences. We have discovered that one problem with applying known document summarization techniques to heterogeneous collections is that the assumptions made by such techniques may not apply across the population of the collection. Such assumptions include where in a document the sentences that contain summary information might be located, which keywords may indicate summary information, etc. By taking genre into account, the system of the present invention takes advantage of the structure and wording of various document genres to provide faster and more accurate summaries. For example, document genres such as newspaper articles tend to have good summary sentences at the beginning, and document genres such as research papers tend to have good summary sentences in the conclusion. The system of the present invention takes this information into account when selecting summary sentences.
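The positional heuristic in the example above can be illustrated with a minimal Python sketch. It is not the patented method: it assumes the genre is already known, uses a naive regular-expression sentence splitter, and picks sentences by position only. The names GENRE_POSITION, split_sentences and select_summary_sentences, the genre labels, and the fallback to the beginning for unknown genres are all hypothetical choices made for illustration.

```python
import re

# Hypothetical mapping from genre to the region assumed to hold good summary
# sentences. The two entries mirror the examples in the text; the labels
# themselves are illustrative, not taken from the patent.
GENRE_POSITION = {
    "newspaper_article": "beginning",
    "research_paper": "end",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_summary_sentences(text, genre, n=2):
    """Return up to n sentences from the region the genre is assumed to favor."""
    if n <= 0:
        return []
    sentences = split_sentences(text)
    # Assumption: unknown genres fall back to the beginning of the document.
    position = GENRE_POSITION.get(genre, "beginning")
    if position == "end":
        return sentences[-n:]
    return sentences[:n]
```

Calling select_summary_sentences(text, "research_paper") on a four-sentence text returns its last two sentences, while the same call with "newspaper_article" returns the first two. A fuller system would presumably combine position with other genre-dependent cues such as the keywords mentioned above.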
REFERENCES:
patent: 5689716 (1997-11-01), Chen
patent: 5745602 (1998-04-01), Chen et al.
patent: 5778397 (1998-07-01), Kupiec et al.
patent: 5838323 (1998-11-01), Rose et al.
patent: 5848191 (1998-12-01), Chen et al.
patent: 5918240 (1999-06-01), Kupiec et al.
patent: 6349316 (2002-02-01), Fein et al.
Goldstein et al., Summarizing Text Documents: Sentence Selection and Evaluation Metrics, pp. 1-8, 1999.*
Kessler et al., Automatic Detection of Text Genre, Proceedings of ACL 35 and EACL 8, Morgan Kaufmann Publishers, San Francisco, California, 1997, pp. 32-38.
Kupiec et al., A Trainable Document Summarizer, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, Jul. 9-13, 1995, pp. 68-73.
Kupiec Julian M.
Schuetze Hinrich
Edouard Patrick N.
Xerox Corporation
System for genre-specific summarization of documents