Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2002-08-15
2004-11-30
Corrielus, Jean M. (Department: 2172)
C707S793000, C707S793000, C707S793000, C704S010000, C715S252000, C715S252000
active
06826567
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a registration and a search method for structured documents described in SGML (Standard Generalized Markup Language) or the like. More particularly, the invention is directed to a method of storing and a method of reading the lengths of elements forming a document.
As the information society grows at a rapid pace, an enormous number of electronic documents have been prepared in recent years using word processors and personal computers. Under such circumstances, there is a growing need to find documents containing desired information among large collections of electronic documents. Full-text search is a technical solution to this need. In full-text search, the entire text of each document to be registered is entered into a computer system to create a database at the time of registration, and all documents containing a string (hereinafter referred to as “search term”) specified by the user are retrieved from the database at the time of search, so that all the desired documents can be found reliably without requiring the user to specify a keyword during registration.
A scoring function has also been proposed, in which the matching degree to the specified search conditions is evaluated by assigning a score to each retrieved document, and the retrieved documents are displayed as a list ordered by score.
The book “Information Retrieval” (written by William B. Frakes and Ricardo Baeza-Yates and published by Prentice Hall) introduces a technique in which the matching degree (nfreq_ij) is calculated for each retrieved document from factors such as the occurrence frequency of a specified search term (hereinafter referred to as “search term occurrence frequency”) in the document and the text length of the document, using the following equation.
$$\mathrm{nfreq}_{ij} = \frac{\log_2(\mathrm{freq}_{ij} + 1)}{\log_2(\mathrm{length}_j)} \qquad \text{(Equation 1)}$$
where $\mathrm{freq}_{ij}$ is the occurrence frequency of a search term i in a document j; and $\mathrm{length}_j$ is the text length of a document j.
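Equation 1 can be evaluated directly, as in the following minimal Python sketch. The function name `normalized_frequency` and its parameter names are illustrative assumptions, not taken from the cited book or from the patent, and the guard on short texts is an added assumption to avoid dividing by zero.

```python
import math


def normalized_frequency(freq_ij: int, length_j: int) -> float:
    """Evaluate Equation 1: log2(freq_ij + 1) / log2(length_j).

    freq_ij  -- occurrence frequency of search term i in document j
    length_j -- text length of document j
    """
    if freq_ij < 0:
        raise ValueError("freq_ij must be non-negative")
    if length_j < 2:
        # Assumption: log2(1) = 0 would make the denominator zero,
        # so texts shorter than 2 characters are rejected here.
        raise ValueError("length_j must be at least 2")
    return math.log2(freq_ij + 1) / math.log2(length_j)


# A term occurring 3 times in a 16-character text: log2(4) / log2(16)
score = normalized_frequency(3, 16)
```

Because both the numerator and the denominator are logarithmic, the score grows slowly with frequency and decreases as the text becomes longer, so long documents are not favored merely for containing more text.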
U.S. Pat. No. 5,745,745 discloses a technique in which structured documents containing a search term are searched quickly by preparing a character component table for structured documents.
The related application cited as a cross-reference discloses a technique for registering a structured document by analyzing the hierarchical structure of the document. The application also discloses a technique in which a string index is extracted from a structured document and registered, and in which, at the time of search, a search term is decomposed into substrings and the character positions obtained from a plurality of character indexes are checked to obtain information about which positions in which documents the search term is located.
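The position check described above can be sketched as follows. This is a simplified illustration rather than the method of the related application: it assumes the search term is decomposed into single characters, and the names `build_char_index` and `find_term` are hypothetical.

```python
from collections import defaultdict


def build_char_index(docs):
    """Map each character to the set of (doc_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch].add((doc_id, pos))
    return index


def find_term(index, term):
    """Return sorted (doc_id, start_position) pairs at which `term` occurs.

    The term is decomposed into characters; a hit is reported only when
    the k-th character is found at start_position + k in the same document.
    """
    if not term:
        return []
    hits = []
    for doc_id, pos in index.get(term[0], ()):
        if all((doc_id, pos + k) in index.get(term[k], ())
               for k in range(1, len(term))):
            hits.append((doc_id, pos))
    return sorted(hits)


docs = {1: "structured document", 2: "document search"}
index = build_char_index(docs)
hits = find_term(index, "document")  # occurrences in both documents
```

The number of hits per document gives the search term occurrence frequency without rescanning the document text.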
SUMMARY OF THE INVENTION
Each structured document has its own hierarchical structure. To calculate the matching degree, however, the length of a partial logical structure (i.e., an element) or of a higher-level logical structure of the structured document is required.
The object of the present invention is to obtain the occurrence frequency of a search term and the length of an element to be searched in a structured document quickly.
The present invention provides a registration method for structured documents, comprising the steps of: preparing, for each structured document, correspondence data between a string and a string occurrence position within the structured document, and additionally storing the correspondence data in an occurrence frequency extracting index; and preparing a list of a character, an element containing the character and an element length thereof, and additionally storing the list in an element length index at the time of registration.
The present invention also provides a search method for structured documents, comprising the steps of: inputting search conditions including a search term and an element for specifying a search range; decomposing the search term into a plurality of substrings; obtaining an occurrence frequency and an occurrence position of the search term from the occurrence frequency extracting index using the plurality of substrings; selecting a character from the search term; obtaining an element containing the character from the element length index using the character; extracting a length of the element within the search range; calculating a matching degree for the search conditions from the occurrence frequency and the occurrence position of the search term and the length of the element within the search range; and outputting the element containing the search term and the matching degree.
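The registration and search steps summarized above can be illustrated with a small Python sketch. It is written under several assumptions that are not stated in the text: each document is given as a flat list of (element name, text) pairs with no nesting, substrings are single characters, occurrence positions are recorded per element, and the matching degree is computed with Equation 1 alone. The class name `StructuredIndex` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class StructuredIndex:
    def __init__(self):
        # Occurrence frequency extracting index:
        # character -> {(doc_id, element_no, offset)}
        self.occurrence = defaultdict(set)
        # Element length index:
        # character -> {(doc_id, element_no): (element_name, element_length)}
        self.element_length = defaultdict(dict)

    def register(self, doc_id, elements):
        """Add one document, given as a list of (element_name, text) pairs."""
        for elem_no, (name, text) in enumerate(elements):
            for offset, ch in enumerate(text):
                self.occurrence[ch].add((doc_id, elem_no, offset))
                self.element_length[ch][(doc_id, elem_no)] = (name, len(text))

    def search(self, term, element_name):
        """Return (doc_id, element_no, matching_degree), best match first."""
        if not term:
            return []
        # Count occurrences of the term per element by checking that
        # consecutive characters sit at consecutive offsets.
        counts = defaultdict(int)
        for doc_id, elem_no, off in self.occurrence.get(term[0], ()):
            if all((doc_id, elem_no, off + k) in self.occurrence.get(term[k], ())
                   for k in range(1, len(term))):
                counts[(doc_id, elem_no)] += 1
        # Select one character of the term and look up element lengths with it.
        lengths = self.element_length.get(term[0], {})
        results = []
        for key, freq in counts.items():
            name, length = lengths[key]
            if name != element_name or length < 2:
                continue  # outside the search range, or too short to score
            degree = math.log2(freq + 1) / math.log2(length)  # Equation 1
            results.append((key[0], key[1], degree))
        return sorted(results, key=lambda r: -r[2])


idx = StructuredIndex()
idx.register(1, [("title", "abab"), ("body", "ab")])
title_hits = idx.search("ab", "title")
body_hits = idx.search("ab", "body")
```

In this sketch the element length is obtained from the index by a single character lookup, so the stored document text is not read again at search time.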
REFERENCES:
patent: 5276616 (1994-01-01), Kuga et al.
patent: 5465353 (1995-11-01), Hull
patent: 5669007 (1997-09-01), Tateishi
patent: 5704060 (1997-12-01), Del Monte
patent: 5745745 (1998-04-01), Tada et al.
patent: 5748953 (1998-05-01), Mizutani et al.
patent: 5757983 (1998-05-01), Kawaguchi et al.
patent: 5848407 (1998-12-01), Ishikawa et al.
patent: 5943443 (1999-08-01), Itonori et al.
patent: 5943669 (1999-08-01), Numata
patent: 5983171 (1999-11-01), Yokoyama et al.
patent: 5991713 (1999-11-01), Unger et al.
patent: 6377946 (2002-04-01), Okamoto et al.
patent: 6496820 (2002-12-01), Tada et al.
“Information Retrieval”, Prentice Hall, pp. 373-374 and pp. 219-227.
Kawashimo Yasushi
Matsubayashi Tadataka
Okamoto Takuya
Sugaya Natsuko
Tada Katsumi
Corrielus Jean M.
Hitachi, Ltd.
Ly Anh