Text structure analysis method and text structure analysis...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06263336

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of Invention
The present invention relates to a text structure analysis method and text structure analysis apparatus applied to analytical processing of text. More specifically, the present invention relates to an apparatus and method for taking differing parts of a plurality of texts and extracting a portion of the content of a given text.
2. Description of Related Art
In known methods of text analysis, when performing processing on two given documents by determining and extracting the differing portions of the given documents, it is common to process them by treating one line or one sentence as a unit and performing structural analysis based on their connections/relationships. For example, there is known a method of processing text by examining the connections/relationships of sentences and creating a tree or graph of the text based on these connections/relationships. Another known method of performing text analysis creates a paragraph having joined sentences from the connections/relationships of the sentences.
Japanese Laid-Open Patent Applications No. 4-23765 (JP 4-23765), No. 6-35960 (JP 6-35960), No. 7-200589 (JP 7-200589), and No. 8-6945 (JP 8-6945) disclose examples of the tree/graph method of text analysis. Japanese Laid-Open Patent Applications No. 4-306768 (JP 4-306768) and No. 5-324708 (JP 5-324708) disclose examples of the method employing paragraphs of joined sentences.
The method according to JP 4-23765 performs syntactic analysis regarding each of two texts and tries to detect differing parts in these texts using syntax trees.
In the apparatus according to JP 6-35960, a document structure detection unit that uses surface-level vocabulary and a document structure detection unit that uses grammatical subjects are utilized to perform text analysis. The apparatus performs detailed structural analysis of documents, not only by using vocabulary information appearing on the surface level of the sentences, but also by using grammatical subjects detected from each sentence, including subjects not clearly indicated in the sentences.
The method of JP 7-200589 extracts text having qualifying relationships between sentences in the form of a tree structure. The method arranges and displays a portion of text using the extracted tree structure.
The method of JP 8-6945 generates nodes based on rules governing the assembly of attributes of neighboring lines, connects the nodes with links, and applies costs to the nodes and links. The method interprets the logical structure of the sentences by traversing text graphs.
The method of JP 4-306768 joins and performs structural analysis of sentences based on connections/relationships between the sentences in the documents.
The method of JP 5-324708 restores paragraph information according to connections/relationships and segmentation rules for each sentence, and performs structural analysis by considering that paragraph information.
All of these known methods and apparatus examine the connections/relationships of texts with a sentence or line as the smallest unit treated. As a result, the computational volume is large and processing requires a large amount of computational time.
Additionally, all of these known methods and apparatus merely perform processing according to predefined rules (rules regarding connections/relationships). The user cannot change the method of analysis in accordance with a particular sentence being processed. Furthermore, when the results of some processing that used the structural analysis are output, various problems arise, such as having to perform analysis again using the results of the structural analysis and having to reconstruct the text for output.
Also, when taking the differing part between two sentences or lines of text, processing in the known methods is performed so as to indicate the fact that a changed portion exists in that line by outputting only the line having the change or by outputting the entire text and assigning a mark to the start of the line having the change.
For example, consider a portion of text representing a three-day weather forecast, such as shown in FIG. 2, having a change in a part of its content. For this example, the weather forecast is changed from that of FIG. 2 to the forecast shown in FIG. 4. Comparing the contents of FIG. 2 and FIG. 4, the probability of precipitation on the 2nd of the month was changed from 40% to 20% and the lowest temperature on the 3rd of the month was changed from 6° C. to 8° C.
In the known methods, when attempting to output the content of the differing parts of two portions of text or line units, only the changed part is displayed, as shown in FIG. 8A, or the entire text is displayed with a mark (for example, an asterisk) assigned to the line having the change, as shown in FIG. 8B.
In the example shown in FIG. 8A, “9<” indicates the content of the 9th line before change, that is, the 9th line in FIG. 2 (Probability of Precipitation 40%) and “9>” indicates the content of the 9th line after change, that is, the 9th line in FIG. 4 (Probability of Precipitation 20%). In the same manner, “15<” indicates the content of the 15th line before change, that is, the 15th line in FIG. 2 (Lowest Temperature 6° C.) and “15>” indicates the content of the 15th line after change, that is, the 15th line in FIG. 4 (Lowest Temperature 8° C.).
In FIG. 8A, because only the line having the change is displayed, the surrounding context cannot be grasped. Similarly, in FIG. 8B, all of the text is displayed, but the context cannot be grasped because too much text is displayed.
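The two line-unit output styles described above can be illustrated with a minimal Python sketch. The sample forecast text is hypothetical (the contents of FIG. 2 and FIG. 4 are not reproduced here), so the changed lines fall on lines 5 and 9 rather than lines 9 and 15, and the function name line_diff is an assumption made for illustration only.

```python
def line_diff(old_text, new_text):
    """Compare two texts line by line.

    Returns (changed_only, marked_full):
      changed_only -- only the changed lines, in the "N<" / "N>" style
      marked_full  -- the whole new text, changed lines marked with "*"
    Assumption: both texts have the same number of lines (in-place edits).
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    changed_only = []
    marked_full = []
    for number, (old, new) in enumerate(zip(old_lines, new_lines), start=1):
        if old != new:
            changed_only.append(f"{number}< {old}")
            changed_only.append(f"{number}> {new}")
            marked_full.append(f"* {new}")
        else:
            marked_full.append(f"  {new}")
    return changed_only, marked_full


# Hypothetical three-day forecast, not the actual text of FIG. 2 / FIG. 4.
OLD = "\n".join([
    "1st", "Probability of Precipitation 10%", "Lowest Temperature 5 C",
    "2nd", "Probability of Precipitation 40%", "Lowest Temperature 7 C",
    "3rd", "Probability of Precipitation 30%", "Lowest Temperature 6 C",
])
NEW = OLD.replace("40%", "20%").replace("6 C", "8 C")

changed, marked = line_diff(OLD, NEW)
print("\n".join(changed))
# 5< Probability of Precipitation 40%
# 5> Probability of Precipitation 20%
# 9< Lowest Temperature 6 C
# 9> Lowest Temperature 8 C
```

Neither result says which day a changed line belongs to: the first list omits the day headings entirely, and the second buries the marked lines in the full text.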
SUMMARY OF THE INVENTION
Thus, the aim of the present invention is to provide a text structure analysis method and text structure analysis device capable of processing units or blocks of textual content. The processing is performed by extracting content for each collection of content of the text, for example, when taking the differing part of two texts and extracting a part of the textual content.
In a first aspect of the present invention, the text structure analysis method comprises detecting a content boundary pattern for each collection of textual content from an input text indicating a boundary of that collection, establishing a content boundary for the input text based on the result of that detection, and treating the input text in units or blocks of textual content for each collection of content based on the established content boundary. In another aspect of the present invention, the content boundary pattern is a lexical unit or code occurring repeatedly within the text. In a further aspect of the present invention, the content boundary pattern is a control code occurring repeatedly within the text when the input text is coded with a text coding language.
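As a rough illustration of the first aspect, the following Python sketch splits a text into blocks wherever a repeatedly occurring boundary pattern is found. It is only a sketch under stated assumptions: the function name split_into_blocks, the sample forecast, and the day-heading regular expression are all hypothetical, and the patent text does not prescribe regular expressions as the detection mechanism.

```python
import re


def split_into_blocks(text, boundary_pattern):
    """Split text into blocks, starting a new block at each boundary line.

    boundary_pattern is a regular expression (an assumption of this sketch)
    describing the lexical unit or code that recurs at each content boundary.
    """
    pattern = re.compile(boundary_pattern)
    blocks = []
    current = []
    for line in text.splitlines():
        # A boundary line closes the block collected so far.
        if pattern.search(line) and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    return blocks


# Hypothetical input: a day heading such as "1st" recurs before each forecast.
FORECAST = "\n".join([
    "1st", "Probability of Precipitation 10%", "Lowest Temperature 5 C",
    "2nd", "Probability of Precipitation 40%", "Lowest Temperature 7 C",
    "3rd", "Probability of Precipitation 30%", "Lowest Temperature 6 C",
])
DAY_HEADING = r"^\d+(st|nd|rd|th)$"

for block in split_into_blocks(FORECAST, DAY_HEADING):
    print(block)
```

For text written in a markup language, the same function could be given a pattern that matches a recurring tag or control code instead of a day heading.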
In another aspect of the invention, the content boundary pattern is a pattern respectively representing a starting point and an ending point of a given collection of content, and the content contained between these patterns is treated as one unit of textual content.
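The start/end variant can be sketched in the same way. In this minimal example the markers "<day>" and "</day>" are hypothetical delimiters chosen for illustration, as is the function name extract_units; the source does not name specific start or end patterns.

```python
import re


def extract_units(text, start_marker, end_marker):
    """Return the content found between each start/end marker pair.

    Each returned string is treated as one unit of textual content.
    The markers are matched literally (re.escape), and the non-greedy
    group keeps adjacent units from being merged into one.
    """
    regex = re.compile(
        re.escape(start_marker) + r"(.*?)" + re.escape(end_marker),
        re.DOTALL,  # let a unit span several lines
    )
    return [unit.strip() for unit in regex.findall(text)]


# Hypothetical marked-up input.
SAMPLE = (
    "<day>1st\nProbability of Precipitation 10%</day>\n"
    "<day>2nd\nProbability of Precipitation 40%</day>"
)

units = extract_units(SAMPLE, "<day>", "</day>")
print(units)
```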
In a further aspect of the present invention, a text structure analysis device comprises a content boundary pattern storage means for storing a content boundary pattern for each collection of textual content from an input text indicating a boundary of that collection and a text analysis means for detecting a boundary section present in an input text based on the content stored in this content boundary pattern storage means. The text analysis means establishes a content boundary for the detected boundary section.
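One way to picture the device aspect is as two cooperating objects: a store holding the boundary patterns and an analyzer that consults it. The Python sketch below is an assumption-laden illustration, not the patented apparatus: the class names BoundaryPatternStore and TextAnalyzer, the use of regular expressions, the sample text, and the changed_blocks helper (which shows how established boundaries allow differences to be reported per block) are all hypothetical.

```python
import re


class BoundaryPatternStore:
    """Hypothetical stand-in for the content boundary pattern storage means."""

    def __init__(self):
        self.patterns = []

    def add(self, regex):
        self.patterns.append(re.compile(regex))


class TextAnalyzer:
    """Hypothetical stand-in for the text analysis means."""

    def __init__(self, store):
        self.store = store

    def find_boundaries(self, text):
        """Return 0-based indices of lines matching any stored pattern."""
        return [
            index
            for index, line in enumerate(text.splitlines())
            if any(p.search(line) for p in self.store.patterns)
        ]

    def blocks(self, text):
        """Cut the text at the established boundaries."""
        lines = text.splitlines()
        if not lines:
            return []
        starts = self.find_boundaries(text)
        if not starts or starts[0] != 0:
            starts = [0] + starts  # text before the first boundary is a block
        ends = starts[1:] + [len(lines)]
        return ["\n".join(lines[s:e]) for s, e in zip(starts, ends)]

    def changed_blocks(self, old_text, new_text):
        """Return whole blocks of new_text that differ from old_text.

        Assumption: both texts contain the same number of blocks.
        """
        pairs = zip(self.blocks(old_text), self.blocks(new_text))
        return [new for old, new in pairs if old != new]


# Hypothetical forecast text and boundary pattern.
OLD_TEXT = "\n".join([
    "1st", "Probability of Precipitation 10%", "Lowest Temperature 5 C",
    "2nd", "Probability of Precipitation 40%", "Lowest Temperature 7 C",
    "3rd", "Probability of Precipitation 30%", "Lowest Temperature 6 C",
])
NEW_TEXT = OLD_TEXT.replace("40%", "20%").replace("6 C", "8 C")

store = BoundaryPatternStore()
store.add(r"^\d+(st|nd|rd|th)$")
analyzer = TextAnalyzer(store)

for block in analyzer.changed_blocks(OLD_TEXT, NEW_TEXT):
    print(block)
    print("---")
```

Keeping the patterns in a separate store means the boundary definition can be swapped for a different kind of text without touching the analyzer.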
Thus, the present invention is capable of extracting portions of text in units or blocks of textual content based on established content boundaries. The invention performs this extraction by detecting a content boundary pattern indicating a boundary of a collection of textual content from an input text. As a result, processing based on a structural analysis from connections/relationships of the portions of text, and the like, b
