Document search method for registering documents, generating...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000

active

06510425

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a method of document registration and a method of document search for a document search system or a document management system using a computer system, and more particularly to a method and apparatus for registering and searching a large number of structured documents, each having a logical structure, which make it possible to search specific document contents at high speed, and to a portable medium used for them.
With the full-scale progress of the information society, the amount of computerized document information generated with word processors, personal computers, and the like has increased more than ever before. Under these circumstances, there is a growing demand for quickly and accurately retrieving documents containing the required information from a vast accumulation of computerized documents.
One technique meeting this demand is full-text search. In full-text search, the entire text of each document to be registered is loaded into a computer system and converted into a data base, and the data base is searched directly for a specified character string (hereinafter referred to as the query term). This requires no keywords to be assigned in advance and, in principle, allows a search without detection failures (missed documents).
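As an illustration of the idea only, full-text search in its simplest form is a direct substring scan over every registered text. The following is a minimal in-memory sketch; the function name and sample documents are hypothetical, and a real system would search a data base rather than a Python dictionary.

```python
from __future__ import annotations


def full_text_search(documents: dict[str, str], query_term: str) -> list[str]:
    """Return the IDs of documents whose full text contains query_term.

    Every registered text is scanned directly, so no keywords are needed and
    no document containing the term can be missed.
    """
    return [doc_id for doc_id, text in documents.items() if query_term in text]


docs = {
    "D1": "A method of document registration and search.",
    "D2": "An apparatus for image processing.",
}
print(full_text_search(docs, "registration"))  # ['D1']
```

The scan cost grows with the total size of the texts, which is why the methods discussed below add auxiliary files to narrow the set of texts that must be read.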
On the other hand, for documents in which individual logic elements can be identified (hereinafter referred to as structured documents), such as documents described in SGML (C. F. Goldfarb: “THE SGML HANDBOOK”, Oxford 1993), high-accuracy search can be realized by adding conditions on the logic structure to the query (hereinafter referred to as the structure-specified search).
A search method permitting the structure-specified search has been proposed in JP-A-8-147311 (hereinafter referred to as the well-known example 1), which is briefly described below.
In the structured document search method of the well-known example 1, a document is first registered directly as text in a search data base.
Next, a specific character string indicating the head of each logic structure of the registered text (hereinafter referred to as the front marker for the well-known example 1) and a specific character string indicating the tail of each logic structure (hereinafter referred to as the rear marker for the well-known example 1) are detected, thereby identifying each logic structure and, at the same time, segmenting the text by logic structure. In an electronically filed patent specification, for example, “<SDOABJ>” is detected as the front marker and “</SDO>” as the rear marker delimiting the logic structure “abstract”, and the text between them is cut out as the text corresponding to the “abstract”. The same cut-out is performed for the other logic structures to segment the text by logic structure.
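The marker-based segmentation can be sketched as a scan that cuts out the text between each front marker and the next rear marker. This is a minimal sketch, not the implementation of the well-known example 1: only the “<SDOABJ>”/“</SDO>” pair comes from the text, the function name is hypothetical, and it assumes elements delimited this way are not nested.

```python
from __future__ import annotations

# Front/rear marker pairs per logic structure. Only the "abstract" pair is
# taken from the text; further entries would be hypothetical.
MARKERS: dict[str, tuple[str, str]] = {
    "abstract": ("<SDOABJ>", "</SDO>"),
}


def segment_by_logic_structure(
    text: str, markers: dict[str, tuple[str, str]]
) -> dict[str, list[str]]:
    """Cut out the text between each front marker and the next rear marker."""
    segments: dict[str, list[str]] = {}
    for name, (front, rear) in markers.items():
        pos = 0
        while True:
            start = text.find(front, pos)
            if start < 0:
                break
            start += len(front)
            end = text.find(rear, start)
            if end < 0:  # unterminated element: stop scanning this structure
                break
            segments.setdefault(name, []).append(text[start:end])
            pos = end + len(rear)
    return segments


doc = "<SDOABJ>A method for structured document search.</SDO> other text"
print(segment_by_logic_structure(doc, MARKERS))
# {'abstract': ['A method for structured document search.']}
```

Because the rear marker “</SDO>” may close several kinds of element, the sketch always pairs a front marker with the nearest following rear marker, which is only correct under the non-nesting assumption above.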
Next, the text corresponding to each logic structure is condensed to produce a condensed text. For the “abstract”, for example, the text is segmented into substrings by word, and the substrings are checked against one another for inclusion. Any substring contained in another substring is removed, producing the condensed text of the “abstract”. The same processing is performed for the other logic structures to produce a condensed text for each logic structure, which is registered in the search data base as a condensed text file.
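The condensing step can be sketched as follows. This is a minimal sketch under stated assumptions: words are split on whitespace (the well-known example 1 may use a different word segmentation, particularly for Japanese text), duplicates are dropped, and the function name is hypothetical.

```python
def condense(text: str) -> str:
    """Condense text by dropping words that are substrings of other words.

    Words are split on whitespace here (an assumption); duplicates are removed
    first, keeping first-seen order.
    """
    words = list(dict.fromkeys(text.split()))
    kept = [w for w in words if not any(w != v and w in v for v in words)]
    # Join with a newline (assumed never to occur in a query term) so that a
    # substring search cannot match across two condensed words.
    return "\n".join(kept)


print(condense("search searches data base database search").split("\n"))
# ['searches', 'database']
```

Dropping contained substrings is safe for single-term lookup: any query term found inside a removed word is also found inside the word that contained it. What the condensed text loses is word order and adjacency.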
Next, for each document, the bit corresponding to the character code of each character appearing in its text is set to “1” to generate a character component table, which is registered in the search data base as a character component table file.
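A character component table can be sketched as one bitmap per document, with bit number equal to the character code. In this minimal sketch a Python integer stands in for the bitmap and the function names are hypothetical; a real table file would more likely use a fixed-size bit array covering the character set in use.

```python
from __future__ import annotations


def character_mask(text: str) -> int:
    """Bitmap with bit ord(ch) set to 1 for every character ch in text."""
    mask = 0
    for ch in set(text):
        mask |= 1 << ord(ch)
    return mask


def build_character_component_table(documents: dict[str, str]) -> dict[str, int]:
    """One bitmap per registered document, keyed by document ID."""
    return {doc_id: character_mask(text) for doc_id, text in documents.items()}


table = build_character_component_table({"D1": "abc", "D2": "bcd"})
print((table["D1"] >> ord("a")) & 1, (table["D2"] >> ord("a")) & 1)  # 1 0
```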
After the search data base is constructed in this way, a document search in the well-known example 1 is conducted as follows.
First, the specified query term is decomposed into individual characters, and the documents containing all the characters that make up the query term are extracted by reference to the character component table.
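The candidate extraction amounts to a bitwise containment test: a document qualifies when its bitmap has every bit of the query term's bitmap set. The sketch below is self-contained, so it repeats a small bitmap helper; all names and sample texts are hypothetical.

```python
from __future__ import annotations


def character_mask(text: str) -> int:
    """Bitmap with bit ord(ch) set to 1 for every character ch in text."""
    mask = 0
    for ch in set(text):
        mask |= 1 << ord(ch)
    return mask


def extract_candidates(table: dict[str, int], query_term: str) -> list[str]:
    """IDs of documents whose bitmap contains every character of query_term."""
    query_mask = character_mask(query_term)
    return [doc_id for doc_id, mask in table.items() if (mask & query_mask) == query_mask]


texts = {"D1": "document search", "D2": "image processing", "D3": "each rose"}
table = {doc_id: character_mask(t) for doc_id, t in texts.items()}
print(extract_candidates(table, "search"))  # ['D1', 'D3']
```

The result is only a candidate set: “D3” contains every character of “search” without containing the string itself. Such false positives are what the subsequent condensed text search eliminates, while no document that actually contains the term can be excluded at this stage.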
Next, the condensed text file for the logic structure specified as the search target is selected from among the condensed text files for the various logic structures, and only the condensed texts of the documents extracted by the character component table search are searched in it, thereby extracting the documents that contain the query term in the specified logic structure. If the query formula does not specify a positional relation between a plurality of query terms in the text, the search process ends here. If such a positional relation is specified, on the other hand, the text of each document extracted by the condensed text search is read, and only those texts that contain all the specified query terms and satisfy the specified positional conditions between them are extracted.
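The two search stages can be sketched as below. This is a minimal in-memory sketch, not the procedure of the well-known example 1 itself: the function names are hypothetical, and the positional condition (the second term starts at most `max_gap` characters after the first term ends) is an assumed example, since the text does not say which positional conditions are supported.

```python
from __future__ import annotations

from collections.abc import Iterator


def occurrences(text: str, term: str) -> Iterator[int]:
    """Start offsets of every (possibly overlapping) occurrence of term."""
    pos = text.find(term)
    while pos >= 0:
        yield pos
        pos = text.find(term, pos + 1)


def near(text: str, first: str, second: str, max_gap: int) -> bool:
    """Hypothetical positional condition: `second` starts 0..max_gap chars after `first` ends."""
    return any(
        0 <= s - (f + len(first)) <= max_gap
        for f in occurrences(text, first)
        for s in occurrences(text, second)
    )


def structure_specified_search(
    condensed_files: dict[str, dict[str, str]],
    full_texts: dict[str, str],
    structure: str,
    candidates: list[str],
    terms: list[str],
    positional: tuple[str, str, int] | None = None,
) -> list[str]:
    # Stage 1: search only the condensed text file of the specified structure,
    # and only for documents that passed the character component table filter.
    condensed = condensed_files.get(structure, {})
    hits = [d for d in candidates if all(t in condensed.get(d, "") for t in terms)]
    if positional is None:
        return hits  # no positional relation specified: the search ends here
    # Stage 2: condensed text has lost word order, so re-read the full text.
    first, second, max_gap = positional
    return [
        d for d in hits
        if all(t in full_texts[d] for t in terms) and near(full_texts[d], first, second, max_gap)
    ]


full_texts = {"D1": "full text search of structured documents", "D2": "search for full documents"}
condensed_files = {"abstract": {
    "D1": "full\ntext\nsearch\nof\nstructured\ndocuments",
    "D2": "search\nfor\nfull\ndocuments",
}}
print(structure_specified_search(condensed_files, full_texts, "abstract", ["D1", "D2"], ["full", "search"]))
# ['D1', 'D2']
print(structure_specified_search(condensed_files, full_texts, "abstract", ["D1", "D2"],
                                 ["full", "search"], positional=("full", "search", 6)))
# ['D1']
```

Both documents pass the condensed text search, but only the full text of D1 has “search” shortly after “full”, which illustrates why the positional check has to go back to the original text.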
In this way, according to the method of the well-known example 1, a structure-specified search is made possible while maintaining a practical search speed for a large-scale text data base.
SUMMARY OF THE INVENTION
The prior art disclosed in the well-known example 1 described above makes a structure-specified search possible to some extent. Nevertheless, there are cases in which a search meeting the intended structural conditions cannot be performed with the structure-specified search of the well-known example 1.
In the method of the well-known example 1, each registered document is segmented into several predetermined subelements, and a condensed text file is produced for each subelement. At search time, the set of condensed text files to be searched is determined by reference to a table defining the correspondence between the structure name of each subelement and its condensed text file, and only the condensed text files in that set are searched, thereby realizing a structure-specified search.
This method anticipates, at the time the text data base is constructed, which structural conditions future searches will specify, and segments the condensed text files so as to permit searches meeting those conditions. Therefore, a search specifying a structural condition not anticipated at data base construction cannot be performed.
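The limitation can be made concrete with a sketch of the correspondence table: the set of searchable structure names is fixed when the data base is built, and any other name has no condensed text file to search. The structure names below follow the example in the next paragraph; the file names and function name are hypothetical.

```python
from __future__ import annotations

# Correspondence table fixed when the text data base is built.
# Structure names follow the example in the text; file names are hypothetical.
STRUCTURE_TO_FILE: dict[str, str] = {
    "abstract": "abstract.cnd",
    "body": "body.cnd",
}


def condensed_file_for(structure: str) -> str:
    """Look up the condensed text file to search for a structure name."""
    try:
        return STRUCTURE_TO_FILE[structure]
    except KeyError:
        # A structure that was not anticipated at build time has no file.
        raise LookupError(f"no condensed text file for {structure!r}") from None


print(condensed_file_for("abstract"))  # abstract.cnd
```

A query on “clause subject” fails with `LookupError` here; supporting it would mean rebuilding the data base with a finer segmentation.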
Assume, for example, that a document consists of two logic elements (hereinafter called the elements), “abstract” and “body”, where the latter consists of an arbitrary number of repeated “clauses”, each of which in turn includes one “clause subject” and an arbitrary number of “paragraphs”. If, in constructing a text data base from a set of documents having this structure, the condensed text files are segmented into those corresponding to “abstract” and those corresponding to “body”, it is impossible to conduct a structure-specified search meeting the condition that “a set of documents containing a string XX in the clause subject is determined”.
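To make the example concrete, the document structure can be modeled as a small tree and the two questions compared directly. This brute-force sketch only illustrates the condition; it is neither the indexed method of the well-known example 1 nor the method of the present invention, and all class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clause:
    subject: str
    paragraphs: list[str] = field(default_factory=list)


@dataclass
class Document:
    abstract: str
    body: list[Clause] = field(default_factory=list)


def body_contains(doc: Document, s: str) -> bool:
    """All a single "body" condensed text file can answer: is s anywhere in the body?"""
    return any(s in c.subject or any(s in p for p in c.paragraphs) for c in doc.body)


def clause_subject_contains(doc: Document, s: str) -> bool:
    """The desired structural condition: is s in some clause subject?"""
    return any(s in c.subject for c in doc.body)


doc = Document("Summary", [Clause("Scope", ["XX appears here"]), Clause("Method", ["Details"])])
print(body_contains(doc, "XX"), clause_subject_contains(doc, "XX"))  # True False
```

Here XX occurs only in a paragraph, so a search against the “body” file would report the document even though no clause subject contains XX.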
Of course, this condition can be met if, instead of making one condensed text file of the whole “body”, the “body” is further segmented into “clause subject” and “paragraph”, each with its own condensed text file. Even with the files configured this way, however, it is impossible to meet the structural condition that “a set of documents containing a string XX in the first clause (clause subject or paragraph) is determined” or that “a set of documents containing a string XX in the last paragraph of a clause is determined”. For a structural condition with a specified order to be met, a condensed text file must be prepared for each order of occurrence of a clause and a paragraph. Since clauses and paragraphs can occur an arbitrary number of times, however, the number of condensed text files would become enormous. In addition, the well-known example 1 lacks any means of mapping a structural condition containing an arbitrary specification of the order of occurrence onto a set of finely segmented condensed text files. Actually, therefore, the search meeting this
