Document summarizing apparatus, document summarizing method...

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

C707S793000

active

06493663

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a document summarizing apparatus, a document summarizing method and a recording medium storing a document summarizing program, more specifically to a document summarizing apparatus, a document summarizing method and a recording medium storing a document summarizing program for creating a summary holding the overview of a group of a plurality of documents.
2. Description of the Related Art
A variety of document summarizing technologies have been studied, and some working technologies have been developed for practical use. However, almost all document summarizing technologies of the related art target a single document. In practice, there is a need to summarize a plurality of documents in order to grasp their overview. Methods developed for summarizing only one document are not applicable to a collection of documents and result in an inappropriate summary.
Examples of popular methods in the related art include a method of picking up important sentences and a method of abstracting. In the related art, a score is given to each sentence of the document based on the frequency of appearance of words, the location in the document or in a paragraph, the usage of proper nouns, and so on. Sentences with higher scores are picked up until the number of sentences or the total length of the summary reaches a preselected value, and the selected sentences are enumerated to create a summary. If such a method is applied to a plurality of documents, sentences selected from only one of the documents in the group may end up representing the whole group and may not be appropriate as a summary thereof.
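The sentence-scoring approach of the related art can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the method of any cited work: only word frequency is used as the scoring criterion (location and proper-noun cues are omitted), and the names `split_sentences`, `tokenize`, `summarize` and `max_sentences` are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())

def summarize(text, max_sentences=2):
    sentences = split_sentences(text)
    # Word frequency over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top sentences and restore their original order.
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Here the preselected value is a sentence count; a length limit on the whole summary could be substituted in the same loop.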
There is a need to summarize a plurality of documents. Summarizing technologies for a plurality of documents may include:
(1) Enumeration of Keywords
The keyword enumeration method enumerates the words that appear most frequently in a document cluster. One example is the classification technology documented in the paper of Cutting, et al., “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections”, SIGIR-92 (1992). Some inventions based on this method include the Japanese Published Unexamined Patent Application No. Hei 5-225256, and the U.S. patent application Ser. No. 5,442,778. A preselected number of keywords that appear frequently in the group of documents are enumerated.
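As a rough illustration of keyword enumeration, the following Python sketch counts words across a group of documents and returns a preselected number of the most frequent ones. The stop-word list, the function name `enumerate_keywords` and the parameter `k` are assumptions made for this example and do not come from the cited works.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "to", "is"}

def enumerate_keywords(documents, k=3):
    counts = Counter()
    for doc in documents:
        # Count every non-stop-word token across the whole document group.
        counts.update(w for w in re.findall(r"[a-z0-9]+", doc.lower())
                      if w not in STOPWORDS)
    # Return the k most frequent words, without their counts.
    return [word for word, _ in counts.most_common(k)]
```

The output is a bare list of words, which makes the limitation discussed later visible: nothing in it records how the words relate to one another.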
(2) Generation of Sentences Based on the Extracted Meanings
A method of synthesizing sentences based on extracted meanings is described in the paper of McKeown and Radev, “Generating Summaries of Multiple News Articles”, SIGIR-95 (1995); one example thereof is SUMMONS (SUMMarizing Online NewS articles). This technology uses a given template whose slots are filled with information extracted from a plurality of documents. The information embedded in the template is used as the conceptual structure for generating a summary from a pattern matched with the syntax.
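The idea of filling template slots from several documents can be sketched as follows. This is only a toy illustration and not the SUMMONS implementation: the slot names, the `TEMPLATE` string, and the assumption that per-document extraction has already produced dictionaries of slot values are all hypothetical.

```python
# Hypothetical template for one kind of affair; slot names are assumptions.
TEMPLATE = "{perpetrator} attacked {target} in {location} on {date}."

class _Unknown(dict):
    # Supplies a placeholder for any slot that no document filled.
    def __missing__(self, key):
        return "an unknown " + key

def merge_slots(extractions):
    # Combine per-document extractions; the first value seen for a slot wins.
    slots = {}
    for extraction in extractions:
        for name, value in extraction.items():
            slots.setdefault(name, value)
    return slots

def generate(template, slots):
    # Render the summary sentence from the filled template.
    return template.format_map(_Unknown(slots))

extractions = [
    {"perpetrator": "A group", "location": "the capital"},
    {"target": "an embassy", "date": "Monday", "location": "downtown"},
]
summary = generate(TEMPLATE, merge_slots(extractions))
```

Because one template must be written by hand for each kind of affair, the sketch also shows why this approach does not carry over to an arbitrary collection of documents.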
(3) Synthesis of Following-up Articles
The technology described in the paper by Funasaka, Yamamoto and Masuyama, “Summarizing relational news articles by reducing redundancy”, Natural Language Processing, 114-7 (1996) generates a summary of a plurality of documents by reducing redundancy among a plurality of following-up news articles and synthesizing them. Following-up news articles, in general, may contain some paragraphs describing the course of an event as background. The description of the background is redundant if there is already an article on that background. Accordingly, reducing the redundancy between articles and synthesizing them may generate a summary without redundancy.
(4) Synthesis of a Plurality of Sentences
In this method, a summary is synthesized by identifying the sentences that share the same meaning among articles on the same event (for example, news articles from a plurality of news companies describing the same affair).
The document summarizing apparatus disclosed in the Japanese Published Unexamined Patent Application No. Hei 10-134066 gathers paragraphs (of online news of other news companies) that are similar to a specified paragraph (of online news). The gathered paragraphs are then broken down into sentences, and similar sentences are regrouped. Here, similar sentences may be defined as those whose number of pattern-matched words is greater than a threshold value. For example, “Typhoon #5, landing in Kyushu” and “a large typhoon #5 lands in Kyushu”.
A representative sentence is generated for each of these groups. The manner of generating a representative sentence may comprise, for example, selecting one sentence from the group, generating a common set of blocks, or generating a union set. The common set of the example above may be “Typhoon #5, landing in Kyushu” and the union set may be “a large typhoon #5 lands in Kyushu”.
A method disclosed in the paper by Shibata, et al., “Merging a Plurality of Documents”, Association of Natural Language Processing 120-2 (1997) also identifies common sentences sharing similar meanings in news articles from a plurality of news companies describing the same affair, and synthesizes a set therefrom. The manners of synthesis comprise an “AND” set (common set of elements) and an “OR” set (union set of elements).
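The grouping of similar sentences and the common-set and union-set synthesis described under item (4) can be sketched in Python as below. This is a simplified illustration under stated assumptions: similarity is plain word overlap without stemming (so "landing" and "lands" do not match), the result is an unordered set of words rather than a regenerated sentence, and the function names and the `threshold` default are hypothetical.

```python
import re

def words(sentence):
    # Keep '#' so tokens such as "#5" survive; lowercase for matching.
    return re.findall(r"[#\w]+", sentence.lower())

def shared_count(a, b):
    # Number of distinct words that appear in both sentences.
    return len(set(words(a)) & set(words(b)))

def group_similar(sentences, threshold=3):
    groups = []
    for sentence in sentences:
        for group in groups:
            # Compare against the first sentence of each existing group.
            if shared_count(group[0], sentence) > threshold:
                group.append(sentence)
                break
        else:
            # No sufficiently similar group found: start a new one.
            groups.append([sentence])
    return groups

def common_set(group):
    # "AND" set: words present in every sentence of the group.
    return set.intersection(*(set(words(s)) for s in group))

def union_set(group):
    # "OR" set: words present in at least one sentence of the group.
    return set().union(*(set(words(s)) for s in group))

sentences = [
    "Typhoon #5, landing in Kyushu",
    "a large typhoon #5 lands in Kyushu",
    "Stock prices fell sharply",
]
groups = group_similar(sentences)
```

With these inputs the two typhoon sentences share four words and fall into one group, while the unrelated sentence forms a group of its own.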
However, the technologies of the prior art suffer from the following problems:
(1) The enumeration of keywords cannot indicate the relational dependencies between words, since the words appear independently. The reader has to guess the meaning behind them from the sequential order of the keywords and from background knowledge. In order to guess what the collection of documents says, the reader is required to have some knowledge of the subject field or of the event described in the collected documents.
(2) The generation of sentences from extracted meanings is strictly limited to a narrow class of documents. This method assumes a definite type of content, such as articles on an act of terrorism (“who attacked what, where, when and how; the victims and demolished buildings are . . . ”). A meaning template must be predefined for each kind of affair. This method may be used only for articles on the same affair; it may not be applicable to a collection of documents gathered as the result of a search or of clustering.
(3) The synthesis of following-up articles deals with the parent article and following articles of the same affair. Therefore this method is not applicable to a group of documents gathered as the result of search or of clustering.
(4) The synthesis of a plurality of sentences is applicable only to the articles on the same affair. Therefore this method is not applicable to a group of documents gathered as the result of search or of clustering.
SUMMARY OF THE INVENTION
The present invention has been made in light of these problems. The present invention provides a document summarizing apparatus that generates a comprehensive summary when processing a group of documents of relatively diverse contents.
Also, the present invention provides a document summarizing method, which is applicable to a group of documents of relatively diverse contents for generating a comprehensive summary therefrom.
In addition, the present invention provides a computer-readable recording medium carrying a document summarizing program, which may be used with a computer to generate a comprehensive summary about a group of documents of relatively diverse contents.
In order to solve the problems as described above, a document summarizing apparatus according to the present invention for generating a summary of a set of documents, comprises: a sentence analyzing unit that analyzes the syntax (structure) of sentences contained in the documents specified to be processed to generate an analysis graph describing the relational dependencies between words; an analysis graph scoring unit that scores the analysis graph generated by the sentence analyzing unit based on importance; an analysis graph score accumulating unit that stores th
