Data processing: speech signal processing – linguistics – language – Linguistics – Natural language
Reexamination Certificate
1999-03-18
2002-04-16
Edouard, Patrick N. (Department: 2644)
C707S793000
active
06374209
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a text structure analyzing apparatus analyzing structure of a text described in natural language and an abstracting apparatus generating an abstract by selecting important elements from the text.
In recent years, with the rapid spread of electronic text, techniques for processing text, namely analyzing the structure of a text and selecting important sentences from it, have become increasingly necessary. In order to generate an abstract by selecting important sentences from the text, it is indispensable to analyze the structure of the text and evaluate the importance degree of each sentence constituting the text.
Conventionally, Japanese Laid-Open Patent Publication No. 2-112069 discloses an automatic abstracting method that evaluates the importance degree of each sentence by analyzing the structure of a text and generates an abstract from the evaluation result. The automatic abstracting method is as follows.
Among the precedent sentences S that include a key word whose character string coincides with the character string of a key word included in a sentence S_j constituting a text, the sentence closest to the sentence S_j is set as the parent sentence of S_j. This operation allows the structure of the text to be expressed as a tree structure.
In the tree structure obtained by the operation, the sentences included in the path between the head sentence of the text (the base node of the tree structure) and the last sentence of the text are regarded as important sentences. The chain of the important sentences is set as the abstract.
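The conventional method described above can be illustrated with a minimal sketch. It assumes each sentence has already been reduced to a set of key words, and the function names `build_keyword_tree` and `path_to_head` are hypothetical. The fallback of attaching a sentence with no shared key word to the head sentence is also an assumption, since the description above does not say how that case is handled.

```python
def build_keyword_tree(keyword_sets):
    """Return parents[j] for each sentence j; the head sentence has parent None."""
    parents = [None] * len(keyword_sets)
    for j in range(1, len(keyword_sets)):
        # Scan backwards so the first match is the closest precedent sentence.
        for i in range(j - 1, -1, -1):
            if keyword_sets[i] & keyword_sets[j]:
                parents[j] = i
                break
        # Assumption: with no shared key word, attach to the head sentence.
        if parents[j] is None:
            parents[j] = 0
    return parents


def path_to_head(parents, last):
    """Collect the sentences on the path from the head sentence to `last`."""
    path = []
    node = last
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


# Toy input: one key word set per sentence, in order of appearance.
keywords = [
    {"text", "structure"},
    {"structure", "tree"},
    {"abstract"},
    {"tree", "abstract"},
]
parents = build_keyword_tree(keywords)
summary_path = path_to_head(parents, len(keywords) - 1)
```

In this toy input the last sentence shares "abstract" with its nearest predecessor, so the closer match takes priority over the earlier sentence that shares "tree". No comparison between candidates takes place.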
However, the automatic abstracting method has the following problems:
(1) Mere coincidence between the character strings of key words is not enough to fully capture the connection between two sentences. This tendency is particularly conspicuous when a text is constituted of a plurality of sub-topics. That is, for example, when the topic switches from one to another, key words different from those that have appeared in the preceding sentences appear many times.
(2) In determining the parent sentence of a sentence S, the candidate sentences are not sufficiently compared with one another to decide which of them is best as the parent sentence. Thus, the conventional method is incapable of analyzing the structure of the text with high accuracy.
(3) The path between the head sentence of the text and the last sentence thereof may be comparatively long. Accordingly, when the sentences included in the path are selected, it is impossible to generate a sufficiently concise abstract.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a text structure analyzing apparatus that analyzes the structure of a text with high accuracy and an abstracting apparatus capable of obtaining a highly accurate and concise abstract.
In order to achieve the object, the present invention provides a text structure analyzing apparatus analyzing a connection between respective elements constituting a text and based on an analyzed result, indicating a structure of the text by means of a tree structure which represents the respective elements as nodes, comprising:
an element appearance position storing section dividing an inputted text into the elements and storing an appearance position relationship among the elements on the inputted text;
a relation degree computing section determining a precedent element of an attention element with reference to the appearance position relationship and computing a relation degree representing strength of a connection between the attention element and each precedent element;
an importance degree computing section computing an importance degree of the attention element, based on a relation degree between the attention element and each precedent element and an importance degree of a head element of the inputted text;
a structure determining section determining a tree structure of the inputted text by determining the precedent element having an optimum value as an importance degree of the attention element as a parent element of the attention element; and
an output section outputting the determined tree structure of the inputted text.
According to the construction, the parent element of each element in the tree structure of the inputted text is determined in consideration of the relation degree representing the strength of connection between the attention element and each precedent element and the importance degree of each element based on the relation degree. Thus, candidates for the parent element are compared with each other with full consideration of the connection between the two elements. Accordingly, it is possible to analyze the structure of the inputted text with high accuracy by setting only the element having a high degree of relation with the attention element as the parent element.
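The interplay of the relation degree computing section, the importance degree computing section and the structure determining section can be sketched as follows. This is only an illustration under stated assumptions, not the claimed computation: the text above does not give the formula, so the sketch assumes that the head element has importance 1.0, that a candidate importance is the product of the precedent element's importance and the relation degree, and that the "optimum value" is the maximum. The name `determine_structure` and the matrix layout `relation[i][j]` are hypothetical.

```python
def determine_structure(relation, head_importance=1.0):
    """relation[i][j] is the relation degree between precedent element i and
    attention element j (i < j). Returns (parents, importance)."""
    n = len(relation)
    importance = [0.0] * n
    parents = [None] * n
    importance[0] = head_importance  # assumption: fixed importance of the head
    for j in range(1, n):
        best_i, best_value = 0, -1.0
        for i in range(j):
            # Assumption: candidate importance = parent importance x relation degree.
            candidate = importance[i] * relation[i][j]
            if candidate > best_value:
                best_i, best_value = i, candidate
        # The precedent element giving the optimum (here: maximum) value
        # becomes the parent of the attention element.
        parents[j] = best_i
        importance[j] = best_value
    return parents, importance


# Toy relation degrees for a three-element text.
relation = [
    [0.0, 0.5, 0.25],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
parents, importance = determine_structure(relation)
```

Unlike the conventional method, every precedent element is compared as a candidate parent, so the third element attaches to the second (0.5 x 1.0) rather than to the head (1.0 x 0.25).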
In an embodiment, the element is a sentence.
According to the construction, candidates for the parent sentence can be compared with full consideration of the connection between two sentences. Thus, it is possible to analyze the structure of an inputted text with high accuracy by setting only a sentence having a high degree of relation with the attention element as the parent element.
An embodiment further comprises an important word recognizing section recognizing important words from words constituting the respective elements;
and an important word weighting section weighting each of the recognized important words,
wherein the relation degree computing section has an important word comparing part for comparing a character string of an original form of each of the important words in the attention element with a character string of an original form of each of the important words in the precedent element to compute a relation degree between the attention element and the precedent element, based on a total value of weights of all the important words common to the attention element and to the precedent element and a number of all the important words in the attention element or a number of all the important words in the precedent element.
According to the construction, when important words common to the attention element and the precedent element are present, a relation degree corresponding to the total value of the weights of all the important words common to the attention element and the precedent element is given. In this manner, an optimum relation degree can be obtained according to the degree of connection between the attention element and the precedent element.
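A relation degree of this kind might be computed as in the sketch below. The text states only that the degree is based on the total weight of the common important words and on the number of important words in one of the two elements; dividing the former by the latter is an assumption made here for illustration. The function name `relation_degree`, the `normalize_by` parameter and the default weight of 1.0 are likewise hypothetical, and the words are assumed to be already reduced to their original (base) forms.

```python
def relation_degree(attention_words, precedent_words, weights, normalize_by="attention"):
    """Total weight of the shared important words, normalized by the number of
    important words in the chosen element (assumed to be a simple ratio)."""
    attention = set(attention_words)
    precedent = set(precedent_words)
    # Important words whose original-form character strings coincide.
    common = attention & precedent
    total_weight = sum(weights.get(word, 1.0) for word in common)
    count = len(attention) if normalize_by == "attention" else len(precedent)
    return total_weight / count if count else 0.0


# Toy weights, as an important word weighting section might assign them.
weights = {"structure": 2.0, "tree": 1.0, "node": 1.0, "text": 1.0}
attention = ["tree", "structure", "node"]
precedent = ["structure", "text"]

by_precedent = relation_degree(attention, precedent, weights, normalize_by="precedent")
by_attention = relation_degree(attention, precedent, weights, normalize_by="attention")
```

When the two elements share no important word, the sketch returns 0.0, so such a precedent element is a weak candidate for the parent.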
An embodiment further comprises an important word information storing section in which parts of speech to be recognized as the important words are stored,
wherein the important word recognizing section has a part of speech recognizing section for recognizing parts of speech in the respective elements; and a part of speech comparing section for comparing the recognized parts of speech and parts of speech to be recognized as the important words with each other to recognize words corresponding to parts of speech to be recognized as the important words from among words in the respective elements.
According to the construction, the important words are recognized based on a part of speech set in advance and stored. Thus, the important words can be easily recognized by consulting a dictionary.
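Recognition of important words by part of speech can be sketched in a few lines. The stored parts of speech and the toy dictionary below are hypothetical examples; the text above does not state which parts of speech are stored or how the dictionary is organized.

```python
# Hypothetical contents of the important word information storing section.
IMPORTANT_POS = {"noun", "verb"}

# Hypothetical toy dictionary mapping a word to its part of speech.
TOY_DICTIONARY = {
    "the": "article",
    "apparatus": "noun",
    "quickly": "adverb",
    "analyzes": "verb",
    "text": "noun",
}


def recognize_important_words(words, dictionary=TOY_DICTIONARY, important_pos=IMPORTANT_POS):
    """Keep the words whose part of speech is registered as important."""
    # Part of speech recognition: look each word up in the dictionary.
    tagged = [(word, dictionary.get(word.lower(), "unknown")) for word in words]
    # Part of speech comparison: keep only the registered parts of speech.
    return [word for word, pos in tagged if pos in important_pos]


important = recognize_important_words(["The", "apparatus", "quickly", "analyzes", "text"])
```

Words missing from the dictionary are tagged "unknown" and dropped, which is one possible design choice among several.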
An embodiment further comprises an important word recognizing section recognizing important words from words constituting the elements;
a meaning recognizing section recognizing meaning of each of the recognized important words; and
a concept system storing section storing a concept system for recognizing rank relationship between meanings of two of the recognized important words, an analogous relationship therebetween, and a part-to-whole relationship therebetween;
wherein the relation degree computing section has a determining section which regards that with reference to the concept system, one of the recognized important words in the attention element and one of the recognized important words in
Okunishi Toshiyuki
Yamaji Takahiro
Yoshimi Takehiko
Conlin David G.
Dike Bronstein, Roberts & Cushman IP Group of Edwards & Angell,
Edouard Patrick N.
Sharp Kabushiki Kaisha