Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-03-30
2002-12-24
Amsbury, Wayne (Department: 2171)
active
06499030
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an information retrieval (IR) apparatus, an IR method, and a storage medium storing a program that implements the method.
2. Description of the Related Art
Processing and storing document information with electronic appliances and storage media has become increasingly popular, and it is now common to share document information among many users. Documents are normally shared through a database, which is usually stored in an external storage device. As the storage capacity of external storage devices has grown year by year, the volume of documents stored in databases has become enormous.
Three kinds of systems are used to search such a database: a Boolean IR system, a non-Boolean IR system, and a system combining the two (hereinafter referred to as a combination system).
In the Boolean IR system, a document (or a set of documents) containing a keyword is defined as ‘true’, and a document (or a set of documents) not containing the keyword is defined as ‘false’. The system retrieves the documents for which the logical expression input as a retrieval query evaluates to ‘true’. The retrieval query can be a logical expression formed by connecting a plurality of keywords with logical operators such as AND, OR, and NAND.
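The Boolean evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy documents, the whitespace tokenization, and the helper names (`contains`, `and_`, `or_`, `nand`) are hypothetical and do not come from any of the cited systems.

```python
# Hypothetical toy collection; names and contents are illustrative only.
DOCS = {
    "doc1": "boolean retrieval uses logical expressions",
    "doc2": "ranking retrieval orders documents by similarity",
    "doc3": "boolean and ranking retrieval can be combined",
}
ALL_DOCS = set(DOCS)


def contains(keyword):
    """Return the set of documents for which `keyword` is 'true'."""
    return {name for name, text in DOCS.items() if keyword in text.split()}


def and_(a, b):
    """Documents for which both operands are 'true'."""
    return a & b


def or_(a, b):
    """Documents for which at least one operand is 'true'."""
    return a | b


def nand(a, b):
    """Documents for which the operands are not both 'true'."""
    return ALL_DOCS - (a & b)


# Query: boolean AND ranking
result = and_(contains("boolean"), contains("ranking"))
print(sorted(result))
```

With this toy collection, only doc3 contains both keywords, so the AND query selects it alone. A document either satisfies the expression or it does not; there is no notion of a partial match.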
The non-Boolean IR system is a user-friendly IR system that aims to let an ordinary user easily retrieve necessary data. Various methods have been proposed for non-Boolean IR systems. Examples include a fuzzy IR method that uses multi-valued logic instead of the binary logic of ‘true’ and ‘false’ (for example, the invention disclosed by the Japanese Patent Laid-Open No. 06-162101 published by the Japanese Patent Office); a method of retrieving data with an input device that accepts natural language text, rather than a logical expression, as the retrieval query (for example, the invention disclosed by the Japanese Patent Laid-Open No. 03-130873); and a similarity retrieval device that ranks the results of a natural language retrieval and displays them (for example, the invention disclosed by the Japanese Patent Laid-Open No. 03-172966). Ordinary ranking retrieval is classified as a non-Boolean IR system.
As a combination system, a device that generates a logical expression for a Boolean IR system from natural language text has been proposed (for example, the invention disclosed by the Japanese Patent Laid-Open No. 10-134078 published by the Japanese Patent Office).
In addition, as a method for manipulating the ranking order in a ranking IR system, an IR system that assigns a hierarchical level to each ranked document has been proposed (for example, the invention disclosed by the Japanese Patent Laid-Open No. 09-153066 published by the Japanese Patent Office). This system analyzes a syntactic element called a ‘functional unit’ in a user-input sentence, and sets up a hierarchy for each functional unit.
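To make the contrast with binary logic concrete, the following is a minimal sketch of multi-valued (fuzzy) retrieval. It assumes the common min/max operators for fuzzy AND and OR, which are not necessarily the operators used in the cited publications, and the membership degrees are invented for illustration.

```python
# Hypothetical degrees (0.0 to 1.0) to which each document matches a keyword.
MEMBERSHIP = {
    "doc1": {"retrieval": 0.9, "ranking": 0.2},
    "doc2": {"retrieval": 0.6, "ranking": 0.7},
    "doc3": {"retrieval": 0.1, "ranking": 0.8},
}


def fuzzy_and(a, b):
    """Fuzzy AND as the minimum of the two degrees (an assumed choice)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy OR as the maximum of the two degrees (an assumed choice)."""
    return max(a, b)


def rank(query):
    """Score every document with `query` and sort by descending score."""
    scores = {name: query(degrees) for name, degrees in MEMBERSHIP.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Query: retrieval AND ranking
ranked = rank(lambda d: fuzzy_and(d["retrieval"], d["ranking"]))
for name, score in ranked:
    print(name, score)
```

Instead of a set of matching documents, the result is an ordered list: every document receives a degree between 0 and 1, and doc2, which matches both keywords moderately well, comes first for the AND query.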
However, the above described conventional IR systems and the ranking IR system have the following problems.
First, because the Boolean IR system evaluates a retrieval query using only the two values ‘true’ and ‘false’, the retrieval condition is strict. It is therefore difficult for a user to write a retrieval query that specifies the desired document (or set of documents), and the user has to be well trained in writing such queries.
In addition, in the non-Boolean IR system and the combination system, the similarity between a retrieval query and a document is determined by the system, and a user cannot easily change it. To solve this problem, an IR system has also been proposed that provides a device through which a user can input the weight between keywords so that the intention of the user is reflected in the retrieval (for example, the invention disclosed by the Japanese Patent Laid-Open No. 07-225772 published by the Japanese Patent Office). However, the final weight of each keyword is still determined by the similarity computation mechanism of the IR system. As a result, the retrieval result may deviate from the intention of the user.
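The gap between a user-specified weight and the final weight can be illustrated with a small sketch. Everything here is hypothetical: the IDF-like internal factor stands in for whatever the similarity computation mechanism of a given system does, and the numbers are invented. It is not the mechanism of the cited publication.

```python
import math

# Hypothetical user input: the user considers "fuzzy" twice as important.
user_weight = {"fuzzy": 2.0, "retrieval": 1.0}

# Hypothetical system-internal statistics, unknown to the user.
N_DOCS = 1000
doc_freq = {"fuzzy": 900, "retrieval": 10}


def final_weight(keyword):
    """User weight scaled by an assumed IDF-like factor chosen by the designer."""
    idf = math.log(N_DOCS / doc_freq[keyword])
    return user_weight[keyword] * idf


for keyword in user_weight:
    print(keyword, round(final_weight(keyword), 3))
```

Under these assumptions the internal factor outweighs the user's input: "retrieval" ends up with a much larger final weight than "fuzzy", even though the user asked for the opposite emphasis.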
Furthermore, according to the invention disclosed by the Japanese Patent Laid-Open No. 09-153066 published by the Japanese Patent Office, there has been the problem that the functional unit of a user-inputted sentence does not always match the functional unit of a relevant document.
As described above, since the Boolean IR system evaluates a retrieval query by the two values ‘true’ and ‘false’, the retrieval condition is strict, and a user has to be well trained to use the IR system effectively. The non-Boolean IR system and the combination system, which were designed to solve this problem, each determine the similarity between a retrieval query and a document internally, so the user cannot easily change the ranking order of documents. Furthermore, a non-Boolean IR system using natural language has the problem that current natural language processing technology is immature and cannot sufficiently determine the intention of a user from natural language input alone.
The above described problems with the conventional technology can be summarized as follows.
1) Since a complicated retrieval query must be written to retrieve documents appropriately in the Boolean IR system, it takes a long time for a user to become skilled in using the system. In other words, only a skilled user can use the system effectively; a beginner cannot.
2) In a simple non-Boolean IR system, the number of occurrences of a keyword determines the similarity. Therefore, a document not requested by the user may rise in the ranking order.
3) Furthermore, the non-Boolean IR system has the following problems with user input.
1. In a retrieval query written in natural language, detailed instructions cannot be given to the similarity computation mechanism. Therefore, retrieval cannot be performed with the intention of the user sufficiently reflected.
2. In the IR system in which the weight between keywords is specified, it is necessary for a user to fully understand the similarity computation method used in the IR system. Therefore, an ordinary user cannot easily use the system.
It can be seen that a system in which the user adds the weight between keywords cannot reflect the intention of the user, because the way the weight is applied does not match the intuition of the user. That is, in the conventional system, the influence of a user-specified weight on the similarity depends on the designer of the IR system. When the concept of the designer differs from the understanding of the user, the user cannot specify a keyword weight that sufficiently reflects the intention of the user.
In addition, in an ordinary similarity computation mechanism, the number of occurrences of a keyword is an important factor in determining the similarity. However, the mechanism has no unit for deciding which should have the higher similarity: a document containing a larger number of types of keywords, or a document in which one keyword occurs frequently. Which of these two documents should be prioritized depends on the intention of the user, and therefore on each retrieval query and each keyword, but no IR system has been designed with this point in mind.
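The trade-off between keyword variety and keyword frequency can be shown with two toy scoring functions. This is a sketch under stated assumptions: the query, the two documents, and the function names are hypothetical, and neither function is claimed to be the similarity measure of any particular system.

```python
from collections import Counter

# Hypothetical query keywords and documents, for illustration only.
QUERY = ["boolean", "ranking", "fuzzy"]
DOCS = {
    "broad": "boolean ranking fuzzy",                    # every keyword once
    "deep": "ranking ranking ranking ranking ranking",   # one keyword, five times
}


def total_occurrences(text):
    """Similarity as the total number of query keyword occurrences."""
    counts = Counter(text.split())
    return sum(counts[keyword] for keyword in QUERY)


def distinct_keywords(text):
    """Similarity as the number of different query keywords present."""
    words = set(text.split())
    return sum(1 for keyword in QUERY if keyword in words)


def best(score):
    """Name of the document that scores highest under `score`."""
    return max(DOCS, key=lambda name: score(DOCS[name]))


print(best(total_occurrences))
print(best(distinct_keywords))
```

Counting total occurrences ranks the "deep" document first (5 against 3), while counting distinct keywords ranks the "broad" document first (3 against 1). The same query thus yields opposite orderings depending on a choice that is fixed inside the system and hidden from the user.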
SUMMARY OF THE INVENTION
The present invention aims at providing an IR system that can describe data as precisely as the Boolean IR system without requiring knowledge of complicated logical expressions or of the design concept of the IR system, and that can easily reflect the intention of a user in a ranking result.
Described below is each aspect of the present invention. According to th
Amsbury Wayne
Fujitsu Limited
Staas & Halsey , LLP
Thai Hanh