Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-06-27
2003-02-25
Corrielus, Jean M. (Department: 2171)
C707S793000
active
06526410
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a structured document difference string extraction method and apparatus for a document processor such as a word processor capable of extracting a difference character string between structured documents stored as an electronic file.
A structured document is defined as one that has embedded in it information on the logical structure of the document, that is, information such as “this portion of the document constitutes a chapter” or “this portion makes up a title”.
Difference extraction between documents is defined as detecting the most coincident combination of the elements constituting each document, such as paragraphs, lines and characters, and extracting the non-coincident elements as a difference. Suppose that the two documents for which the difference is to be detected are “ABCDEFG” and “ACDAEFH”. When the two documents are compared in terms of their elements A, B, C, D, E, F, G and H, the most coincident combination is detected as “correspondence of ACDEF”, and the difference is detected in the form of “B is deleted”, “A is inserted after D” and “G is changed to H”.
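The “ABCDEFG” and “ACDAEFH” example can be reproduced with a longest-common-subsequence comparison, which is one common way of finding a most coincident combination of elements. The following Python sketch is an illustration only: it is not taken from JP-A-2-255964 or from the present invention, and the names lcs_table and extract_difference are assumptions made for this example. A change such as “G is changed to H” is reported as a deletion followed by an insertion.

```python
def lcs_table(a, b):
    """Table of longest-common-subsequence lengths for the suffixes a[i:], b[j:]."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def extract_difference(a, b):
    """Return (common, ops): the coincident elements and the non-coincident ones.

    ops is a list of ("delete", element) or ("insert", element) pairs,
    in document order. Hypothetical helper, for illustration only.
    """
    table = lcs_table(a, b)
    i = j = 0
    common, ops = [], []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Elements coincide: they belong to the correspondence.
            common.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Skipping a[i] loses nothing, so a[i] was deleted.
            ops.append(("delete", a[i]))
            i += 1
        else:
            # Otherwise b[j] was inserted.
            ops.append(("insert", b[j]))
            j += 1
    ops.extend(("delete", x) for x in a[i:])
    ops.extend(("insert", x) for x in b[j:])
    return common, ops


common, ops = extract_difference("ABCDEFG", "ACDAEFH")
print("".join(common))
print(ops)
```

For the two example documents, the coincident elements are A, C, D, E and F, and the remaining operations are the deletion of B, the insertion of A, and the replacement of G by H expressed as a delete and insert pair.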
A conventional method for difference extraction is disclosed in JP-A-2-255964, in which comparison is made in terms of punctuation marks, lines, words and characters. In application of this method to structured documents, a character string representing a logical structure contained in the documents is compared in the same manner as other character strings are compared in the documents.
Extracting a difference in a structured document by the same means as in a normal document may be inappropriate for the document editor, however, since the result may not coincide with the logical structure of the document.
The following Examples 1-3 were considered by the Applicants during development of the present invention, and have not been known or published publicly.
EXAMPLE 1
With reference to the structured documents shown in FIGS. 3A and 3B, the case will be explained in which documents having non-coincident logical structures are erroneously matched with each other in the process of difference extraction, thereby leading to an extraction result inappropriate to the document editor.
The structured documents in FIGS. 3A and 3B are described by SGML (Standard Generalized Markup Language; ISO 8879), indicating that a character string sandwiched by marks, for example, <A> and </A>, called tags, is associated with a logical structure A. In other words, the character string “TARO HEISEI” sandwiched between “<NAME>” and “</NAME>” of FIG. 3A is associated with the logical structure “NAME”. HTML (Hypertext Markup Language), which is used in WWW (World Wide Web), is an application of SGML and is applicable to the present invention as well.
The mark representing a logical structure is also called a tag. “<A>” and “</A>” are thus called a start tag and an end tag, respectively.
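As a rough illustration of how a start tag and an end tag associate a character string with a logical structure, the following Python sketch pairs each tag name with the string it encloses. It is a simplification and not an SGML parser: it assumes non-nested elements without attributes or tag minimization, and the names ELEMENT and tagged_strings are assumptions introduced for this example.

```python
import re

# A start tag <X>, the enclosed character string, and the matching end tag </X>.
# Assumption: elements are not nested and tags carry no attributes.
ELEMENT = re.compile(r"<([^<>/]+)>(.*?)</\1>", re.DOTALL)


def tagged_strings(text):
    """Return (logical structure, character string) pairs found in text."""
    return ELEMENT.findall(text)


print(tagged_strings("<NAME>TARO HEISEI</NAME>"))
```

Applied to the string quoted from FIG. 3A, the sketch associates “TARO HEISEI” with the logical structure “NAME”.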
The result of extracting a difference character string between the two structured documents in FIGS. 3A and 3B by the conventional technique is shown in FIGS. 4A and 4B. FIG. 4B shows the result of extracting difference character strings of the structured document in FIG. 3B relative to the structured document in FIG. 3A. FIG. 4A shows the result of extracting difference character strings of the structured document in FIG. 3A relative to the structured document in FIG. 3B.
As seen from FIGS. 4A and 4B, “HEISEI” associated with “<NAME>” and “HEISEI” associated with “<TRANSMISSION DATE>” are not extracted as the difference. This is because the two occurrences of “HEISEI” were coincident as character strings and were erroneously matched with each other. This correspondence between occurrences of “HEISEI” that do not coincide in logical structure is obviously meaningless to the document editor.
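The erroneous match described in Example 1 can be imitated with a flat comparison that treats tags and text alike. In the Python sketch below, only the string “<NAME>TARO HEISEI</NAME>” comes from the description; the content of the “TRANSMISSION DATE” element, the word-level tokenization and the name flat_matches are hypothetical, since FIGS. 3A and 3B are not reproduced here. The comparison uses difflib.SequenceMatcher from the Python standard library as a stand-in for a conventional character-string comparison.

```python
from difflib import SequenceMatcher

# Hypothetical token sequences standing in for FIGS. 3A and 3B.
doc_a = ["<NAME>", "TARO", "HEISEI", "</NAME>"]
doc_b = ["<TRANSMISSION DATE>", "HEISEI", "10", "</TRANSMISSION DATE>"]


def flat_matches(a, b):
    """Tokens of a that a structure-blind comparison treats as coincident with b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = []
    for block in matcher.get_matching_blocks():
        matched.extend(a[block.a:block.a + block.size])
    return matched


print(flat_matches(doc_a, doc_b))
```

The only coincident token is “HEISEI”, even though one occurrence belongs to “NAME” and the other to “TRANSMISSION DATE”, so neither occurrence would be reported as a difference.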
EXAMPLE 2
With reference to the structured documents shown in FIGS. 5A and 5B, the case will be explained in which character strings are matched erroneously over different document structures in the process of difference extraction due to the insertion of a document structure, thereby leading to an extraction result not proper to the document editor.
FIG. 5A shows a structured document having Chapter 1, and FIG. 5B a structured document with one other chapter inserted before Chapter 1.
FIGS. 6A, 6B show an example of extracting a difference character string between the two structured documents of FIGS. 5A, 5B.
FIGS. 6A, 6B show a case similar to FIGS. 4A, 4B, in which FIG. 6B shows the result of extracting a difference character string of FIG. 5B relative to FIG. 5A. FIG. 6A, on the other hand, shows the result of extracting a difference character string of FIG. 5A relative to FIG. 5B.
As seen from FIG. 6A, Chapter 1 of FIG. 6A is matched over Chapter 1 and Chapter 2 of FIG. 6B in spite of the fact that Chapter 1 of FIG. 6A is identical to Chapter 2 of FIG. 6B. This is another case inappropriate to the document editor.
The dual appearance in FIG. 5B of the same character string “STRUCTURED DOCUMENT”, unlike in FIG. 5A, leads to the erroneous decision in FIG. 6B that the first “STRUCTURED DOCUMENT” is coincident while the second “STRUCTURED DOCUMENT” is non-coincident, so that the second “STRUCTURED DOCUMENT” is extracted as a difference. The same is true of each of the subsequent cases of difference extraction.
EXAMPLE 3
With reference to the structured documents of FIGS. 7A, 7B, explanation will be made of the case in which the difference in marks representing the logical structure of a document makes it impossible to match the contents of documents with each other in spite of the identical logical meaning of the documents, resulting in an extraction inappropriate to the document editor.
In FIGS. 7A, 7B, a tag <FIRST ITEM> is attached only to the item that appears first, in spite of the fact that its logical meaning remains the same as that of “ITEM”.
FIGS. 8A, 8B show the case in which difference character strings between the two structured documents of FIGS. 7A and 7B are extracted by the conventional technique.
FIGS. 8A, 8B represent a case similar to FIGS. 4A, 4B, in which FIG. 8B shows the result of extracting difference character strings of FIG. 7B as compared with FIG. 7A, while FIG. 8A shows the result of extracting difference character strings of FIG. 7A as compared with FIG. 7B.
From FIGS. 8A, 8B, it is seen that “FIRST ITEMs” are matched with each other and the character strings associated with them are compared with each other as the contents thereof. The logical meanings of “FIRST ITEM” and “ITEM” are the same for the document editor, and therefore the contents of the tags are required to be matched in priority over the tags.
In extracting the difference between structured documents, comparison between them is required taking into consideration the logical meaning and the structure of the structured documents. This requirement is not met by the conventional method in which character strings indicating a logical structure are compared in similar fashion to other character strings in the document.
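One simple way to take logical structure into account, shown here only as a sketch and not as the method of the present invention, is to compare each piece of text together with the tag that encloses it, so that identical strings under different tags are no longer coincident. The token sequences below are hypothetical (only “<NAME>TARO HEISEI</NAME>” appears in the description), the names qualify and structure_aware_matches are assumptions, and the sketch handles non-nested elements only. It addresses the situation of Example 1 and does not by itself handle Examples 2 and 3.

```python
from difflib import SequenceMatcher


def qualify(tokens):
    """Pair each text token with its enclosing tag name; the tags are dropped."""
    current, qualified = None, []
    for tok in tokens:
        if tok.startswith("</"):
            current = None               # end tag closes the structure
        elif tok.startswith("<"):
            current = tok.strip("<>")    # start tag opens a structure
        else:
            qualified.append((current, tok))
    return qualified


def structure_aware_matches(a, b):
    """(tag, text) pairs of a that coincide with b in both tag and text."""
    qa, qb = qualify(a), qualify(b)
    matcher = SequenceMatcher(None, qa, qb, autojunk=False)
    matched = []
    for block in matcher.get_matching_blocks():
        matched.extend(qa[block.a:block.a + block.size])
    return matched


# Hypothetical documents for illustration.
name_a = ["<NAME>", "TARO", "HEISEI", "</NAME>"]
date_b = ["<TRANSMISSION DATE>", "HEISEI", "10", "</TRANSMISSION DATE>"]
name_c = ["<NAME>", "JIRO", "HEISEI", "</NAME>"]

print(structure_aware_matches(name_a, date_b))
print(structure_aware_matches(name_a, name_c))
```

With tag-qualified tokens, “HEISEI” under “NAME” no longer corresponds to “HEISEI” under “TRANSMISSION DATE”, while two “NAME” elements that share the string still correspond.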
SUMMARY OF THE INVENTION
An object of the present invention is to provide a method and an apparatus for extracting a difference character string between structured documents in a manner suited to the linguistic sense of the document editor, taking the logical meaning and structure of the structured documents into consideration.
Another object of the present invention is to provide a method and an apparatus for managing the editing of a structured document for a document processing system capable of managing the editing on the basis of comparison and discrimination of the logical structures of structured documents.
In order to achieve the above-mentioned objects, according to one aspect of the invention, there is provided a structured document difference extraction method including memory means for storing structured documents defined as information on the logical structure of docu
Aoyama Yuki
Higashino Jun'ichi