Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-08-19
2003-09-16
Mizrahi, Diane D. (Department: 2175)
C707S793000
active
06622139
BACKGROUND OF THE INVENTION
1. Technical Field of the Invention
The present invention relates to an information retrieval apparatus for retrieving information from a hyper-document system composed of links between nodes, and a medium having an information retrieval program for constructing the information retrieval apparatus in a computer recorded therein, and more particularly to an information retrieval apparatus for retrieving information with node groups having definite meaningful consistence as a retrieval object, and a computer-readable recording medium having the information retrieval program recorded therein.
2. Related Art
A hyper-document system (a system described in, for example, HTML (Hyper Text Markup Language)) that places no restraint on the meaning of the links between nodes has the advantage that the document author can determine the contents and link structure at will. Also, the document reader can obtain access to a multiplicity of documents prepared by a multiplicity of document authors through the use of a computer network (for example, the World Wide Web).
The following two related arts support the document reader in retrieving his/her desired information from such a hyper-document system:
A first related art is a technique in which retrieval indexes for each node are prepared in advance by scanning as many nodes as possible (at random), and an index which matches a query (a combination of key words) from the document reader is presented (for example, AltaVista, http://altavista.digital.com/). In this respect, as a constituent technology for implementing this technique, a vector space model (G. Salton & J. Allan, Text Retrieval Using the Vector Processing Model, in Proc. of SDAIR94), which is a statistical language processing technique, has been applied to the creation of the retrieval indexes and to matching with queries.
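The matching of a query against per-node retrieval indexes can be illustrated with a minimal sketch of a vector space model. It assumes plain term-frequency vectors and cosine similarity; the function names (`term_vector`, `cosine`, `rank_nodes`) and the sample nodes are hypothetical and are not taken from the cited reference or from any particular search service.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-frequency vector (assumed feature form)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_nodes(index, query):
    """Return node ids with non-zero similarity to the query, best first."""
    q = term_vector(query)
    scored = [(cosine(vec, q), node_id) for node_id, vec in index.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [node_id for score, node_id in scored if score > 0]


# Hypothetical nodes; the index is prepared in advance, one vector per node.
nodes = {
    "n1": "color laser printer manual",
    "n2": "printer driver download",
    "n3": "company history",
}
index = {node_id: term_vector(text) for node_id, text in nodes.items()}
ranking = rank_nodes(index, "laser printer")
```

Note that each node is scored in isolation here, which is exactly the one-node-per-retrieval-unit treatment that this related art relies on.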
A second related art is a technique in which as many nodes as possible are scanned (at random) in advance and allocated to a tree-structured directory classified by topic. The document reader looks in the directory for a topic considered to contain the desired information, and obtains access to that information (for example, Yahoo, http://www.yahoo.com/). In this respect, as constituent technologies to implement this technique, there has been proposed an automatic document classification technique (for example, P. Jacobs, Joining Statistics with NLP for Text Categorization, in Proc. of Applied-ACL92) to which natural language processing has been applied. Further, there has also been devised an automatic document classification technique (U.S. Pat. No. 5,526,443, T. Nakayama (Fuji Xerox), Method and apparatus for highlighting and categorizing documents using coded word tokens, issue date: 1996.6.11) in which the media have been expanded into images.
Problems to be solved by the Invention
In these two related arts, however, since one node is regarded as one retrieval object unit, the essence of the hyper-document system in which a concept is expressed with a structure consisting of nodes and links cannot be grasped, and the following problems have been pointed out.
The first problem is that, although the number of nodes into which a certain piece of information is divided and the structure into which they are built up depend upon the taste of the document author, retrieval in which a node is regarded as one unit cannot grasp node groups built up on a hyper network as a whole, as information having meaningful consistence. In other words, retrieval based on the related art retrieves only pieces of information which are incomplete in terms of meaning, and the context cannot be reflected in the retrieval.
The second problem is that a concept representing a retrieval request cannot be expressed in the structure on the hyper network.
In order to solve these problems, it is necessary to change from retrieval in which a node is regarded as one unit to retrieval in which information having meaningful consistence is regarded as one unit. Such retrieval could be implemented if a feature of a certain starting point node is compared with a feature of an N-order node (N=2, 3, . . .) linked from the starting point node to determine their similarity, and N-order nodes which are determined to be similar are merged with the starting point node. The present applicant has applied for a patent (Japanese Published Unexamined Patent Application No. Hei 09-153387) for the invention concerning such an information retrieval apparatus.
This technique enables the document reader to retrieve the desired hyper-structure. In other words, the document reader can acquire useful information by browsing the hyper-structure presented by the retrieval apparatus.
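The merging of similar N-order nodes with a starting point node, as described above, can be sketched roughly as follows. This is only an illustration under stated assumptions: node features are taken to be sets of words, similarity is a Jaccard coefficient, the starting point node is counted as order 1, and links are followed only through nodes that have already been merged. The names `jaccard` and `build_node_group`, the threshold, and the sample data are hypothetical; the cited application may define features and similarity differently.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity between two word sets (assumed feature form)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_node_group(start, features, links, threshold=0.3, max_order=3):
    """Merge linked nodes whose feature is similar to the starting node's."""
    group = [start]
    seen = {start}
    queue = deque([(start, 1)])  # assumption: the starting node is order 1
    while queue:
        node, order = queue.popleft()
        if order >= max_order:
            continue  # do not follow links beyond the maximum order
        for nxt in links.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            # Compare the starting node's feature with the N-order node's.
            if jaccard(features[start], features[nxt]) >= threshold:
                group.append(nxt)
                queue.append((nxt, order + 1))
    return group


# Hypothetical hyper-document: word-set features and outgoing links.
features = {
    "home": {"printer", "laser", "toner"},
    "specs": {"printer", "laser", "speed"},
    "toner": {"toner", "laser", "refill"},
    "jobs": {"careers", "hiring"},
    "deep": {"printer", "laser", "toner", "price"},
}
links = {"home": ["specs", "jobs"], "specs": ["toner"], "jobs": ["deep"]}
group = build_node_group("home", features, links)
```

In this sample, "deep" is similar to "home" but is reachable only through the dissimilar node "jobs", so it is not merged; whether such nodes should be merged is a design choice the sketch settles by assumption.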
Generally, however, a browsing path has a plurality of branches, and it is not known which links should be followed in order to effectively acquire useful information. For this reason, the document reader must in practice select among those branches by trial and error while understanding the contents of the node which he/she is currently reading. Perusal by such trial and error is not efficient, and it takes more time than necessary to acquire the desired information.
SUMMARY OF THE INVENTION
The present invention has been achieved in light of the above-described points, and aims to provide an information retrieval apparatus that allows useful information to be effectively perused within the hyper-text structure retrieved by retrieval in which information having meaningful consistence is regarded as one unit.
Also, it is another object of the present invention to provide a computer-readable recording medium having an information retrieval program recorded therein, the information retrieval program being capable of causing a computer to execute such a process as to perform the retrieval in which information having meaningful consistence is regarded as one unit, and to allow useful information within the hyper-text structure retrieved to be effectively perused.
As a first information retrieval apparatus according to the present invention for solving the above-described problems, there is provided an information retrieval apparatus for retrieving a hyper-document system composed of links between nodes, which are units of information, comprising: a node group constituting part for constituting node groups consisting of nodes, which are combined through links and are similar in contents, aiming at the nodes in the hyper-document system; a component node storing part for storing component nodes which constitute the node groups; an information retrieval part for retrieving, when a retrieval request is inputted, similar node groups having a high degree of similarity which meet the retrieval request in a plurality of the node groups; a similarity calculation part for calculating degrees of similarity between the component nodes stored in the component node storing part and the retrieval request concerning the similar node groups returned as a candidate as a result of the retrieval by the information retrieval part; and a similarity retrieval result displaying part for displaying paths for accessing each component node in the similar node groups in such a manner that component nodes having a high degree of similarity to the retrieval request can be distinguished.
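The flow through the information retrieval part, the similarity calculation part and the similarity retrieval result displaying part can be illustrated with a rough sketch. It assumes that node groups are already constituted and stored as a mapping from group id to component nodes (access path to text), that similarity is a Jaccard coefficient over word sets, and that highly similar component nodes are distinguished with a `*` marker. All names, thresholds and sample data are hypothetical and are not prescribed by the apparatus described here.

```python
def words(text):
    """Word-set feature of a text (assumed feature form)."""
    return set(text.lower().split())


def similarity(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_groups(groups, request, top_k=1):
    """Information retrieval part: rank node groups against the request."""
    q = words(request)
    scored = []
    for gid, components in groups.items():
        # Assumption: a group's feature is the union of its nodes' words.
        feature = set().union(*(words(t) for t in components.values()))
        scored.append((similarity(feature, q), gid))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [gid for score, gid in scored[:top_k] if score > 0]


def render_paths(components, request, highlight_at=0.2):
    """Similarity calculation and display: mark highly similar nodes."""
    q = words(request)
    lines = []
    for path, text in components.items():
        score = similarity(words(text), q)
        marker = "*" if score >= highlight_at else " "
        lines.append(f"{marker} {path}")
    return lines


# Hypothetical stored component nodes for two node groups.
groups = {
    "g1": {
        "/printers/index.html": "laser printer overview",
        "/printers/toner.html": "toner refill guide",
        "/printers/specs.html": "laser printer speed specs",
    },
    "g2": {
        "/about/index.html": "company history",
        "/about/jobs.html": "careers and hiring",
    },
}
request = "laser printer speed"
hits = retrieve_groups(groups, request)
listing = render_paths(groups[hits[0]], request)
```

The listing keeps every access path of the retrieved group visible and only marks the component nodes that score highly, so the reader can see which branches are worth following without losing the structure of the group.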
According to such an information retrieval apparatus, node groups consisting of nodes, which are combined through links and are similar in contents among the nodes in the hyper-document system are constituted by the node group constituting part. Then, component nodes, which constitute node groups, are stored by the component node storing part. Thereafter, when a retrieval request is inputted, similar node groups having high degrees of similarity to the retrieval request among a plurality of node groups are retrieved by the information retrieval part. Next, concerning the similar node groups returned as a candidate as a
Kato Hiroki
Miyake Hidetaka
Nakayama Takehiro
Fuji Xerox Co., Ltd.
Mizrahi Diane D.
Oliff & Berridge PLC