Apparatus and method for document retrieval

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06574622

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system and method for supporting information retrieval in a database. In particular, the invention concerns a system and method that establish a new query suitable for database retrieval on the basis of a primary query, i.e., a preliminary retrieval expression that the user inputs in accordance with his or her retrieval intention, and that execute the actual information retrieval on the basis of the new query. This configuration makes easy and accurate information retrieval possible. More specifically, the user inputs a provisional primary query comprising a key word in accordance with the user's intention, independently of the database configuration. On the basis of the primary query thus inputted, the system presents to the user candidates for the query to be used as retrieval conditions suitable for the database space. The user then establishes a query for retrieval from among the candidates thus presented, and retrieval is executed with the query thus established.
2. Description of the Related Art
Heretofore, studies for information retrieval have actively been conducted as part of a natural language processing technique. An information retrieval system is generally modeled as in FIG. 1. In this conventional model it is presumed that the following three gaps, according to a broad classification, are present in information retrieval.
(1) Gap between the user's retrieval intention and the query (retrieval expression) transcription in the system:
This gap is a difference which occurs when the user converts his or her retrieval intention (image) into a predetermined representation form and inputs it. Since the retrieval intention is often not clear, formulating the query itself is in many cases difficult for those who are new to retrieval.
(2) Gap between the representation of the query and a representation present in the database:
In the retrieval system, matching is performed between the information that can be expressed by the query and the representation present in the database, but there generally is a gap between the two as well.
(3) Gap in relevance feedback conducted on the basis of the result of retrieval obtained:
Making reference to the result of retrieval outputted from the system, the user performs relevance feedback in order to approach the desired information. However, it is difficult to judge whether the result of retrieval is in agreement with the user's intention or not; further, the influence of a change in the query does not become clear until retrieval is actually executed.
Problems involved in the existing retrieval systems will be enumerated below in a corresponding relation to the above description.
A. Full Text Retrieval Based on Boolean Expression as an Example
It is presumed that the full text retrieval method will solve the above-mentioned problem (2). More particularly, any word that appears in a sentence can be retrieved from its description as written, and hence the gap between the representation of the query and the representation present in the database is minimized. However, since this is a word-level solution, the above point (1) remains a problem for users not accustomed to the query description language.
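The word-level matching described above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index; the function names (`build_index`, `boolean_and`, `boolean_or`, `boolean_not`) and the sample documents are hypothetical and are not taken from any system described here.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Ids of documents that contain every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Ids of documents that contain at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def boolean_not(index, all_ids, term):
    """Ids of documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())

# Hypothetical sample collection, for illustration only.
docs = {
    1: "document retrieval from a database",
    2: "natural language interface for retrieval",
    3: "database design and data structures",
}
index = build_index(docs)
hits = boolean_and(index, "retrieval", "database")
```

A word is found only if it literally appears in the text, which is why gap (2) is small; the user must nevertheless compose the AND/OR/NOT expression, which is where gap (1) remains.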
B. Retrieval Based on Natural Language Interface
A natural language interface has been proposed to solve the problem of the above A. It is presumed to diminish the gap of the above (1) by accepting, directly as a query, a phrase or sentence that comes to the user's mind. However, the representation held in the database is not always the same as the input phrase, so if matching is attempted between the two, the result is rather an increase of the gap (2). Moreover, since it is difficult to observe from the user side what matching is performed internally, relevance feedback becomes more difficult, that is, the problem (3) also becomes apparent.
C. Relevance Feedback Support
On the basis of the result of retrieval, certain feedback support is performed for solving the above problem (3). It is also possible to combine the above A with B. The following are mentioned as examples, which, however, cannot be regarded as satisfactory solutions.
C-1. Showing a candidate list of restricted key words to the user, allowing the user to designate a word:
Using statistical information or the like between the query and the words present in the result of retrieval, restricted candidates such as those in FIGS. 2 and 3 are shown. Both examples are taken from actual Internet search engines. The example shown in FIG. 2 is an example of English words in Altavista (http://altavista.digital.com), in which displayed English key words are added and retrieval is rerun, whereby restriction of data is effected. FIG. 3 shows an example of Japanese words in Excite Japan (http://www.excite.co.jp), wherein a key word is selected from the additional key words present at the upper stage and is added, thereby executing retrieval and permitting restriction of data. In Japanese Published Unexamined Patent Application No. Hei 10-74210 entitled “Document Retrieval Supporting Method and Document Retrieval Service Using Same,” characteristic words are extracted on the basis of, for example, the frequency of each word appearing in a document, and the user is allowed to select a word according to the extent of the user's interest in it.
As is also seen from the examples shown in FIGS. 2 and 3, as long as candidates are based on simple word-level frequency or co-occurrence, an increase in the number of analogous words or adjacent nouns is unavoidable, and it thus becomes difficult to show appropriate candidates. This is a problem common to the conventional systems laid open so far. Moreover, since the user cannot judge in what manner the word concerned is used in the document, it is difficult to judge whether the word should be selected as a retrieval word. It is also difficult to judge how the selection will be reflected in retrieval. This is presumably because all the retrieval originally relies only on units of information as small as words.
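The kind of word-level candidate generation discussed in C-1 can be sketched as follows. This is only an illustration under stated assumptions: the function name `suggest_keywords`, the tiny stopword set, and the sample result documents are hypothetical, and the search engines mentioned above may compute their candidates quite differently.

```python
import re
from collections import Counter

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "the", "of", "for", "and", "in", "to"}

def suggest_keywords(result_docs, query_terms, top_k=3):
    """Rank words by how many result documents they share with the query."""
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for text in result_docs:
        words = set(re.findall(r"\w+", text.lower()))
        if query <= words:  # the document contains every query term
            counts.update(words - query - STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

# Hypothetical retrieval result, for illustration only.
results = [
    "patent search engine",
    "patent law office",
    "patent search database",
    "car engine repair",
]
candidates = suggest_keywords(results, ["patent"], top_k=3)
```

The output is a bare list of words ranked by co-occurrence count. Nothing in it tells the user how a candidate is used in the documents or what adding it will do to the result, which is the difficulty described above.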
C-2. Allowing the user to designate a document close to the user's retrieval intention from among candidates:
An example is shown in FIG. 4. According to this configuration, as shown in the same figure, a new retrieval is executed on the basis of a feature quantity in the document designated by the user. The example shown in FIG. 4 is taken from the catalog home page retrieval of InfoNavigator (http://infonavi.infoweb.ne.jp). This system is what is called a manual catalog type system, like Yahoo! for example. Since a summary is written manually, it may be possible even at the summary level to judge whether a document should be designated or not. In a robot type search engine, however, merely the head of a sentence is displayed in many cases. As to WWW documents, it is impossible to specify an object and the user is in many cases not a specialist, so it is difficult to judge whether the page concerned should be added to feedback unless the user sees the actual page contents. In fact, the search of robot-collected pages in the above search engine lacks this function.
In Japanese Published Unexamined Patent Application No. Hei 9-153051 entitled “Analogous Document Retrieving Method” there is shown an example of relevance feedback in ranking which uses n-grams (character strings of n consecutive characters). However, it is difficult to grasp how a document selected for relevance feedback will be reflected in the result obtained. In addition, it is very troublesome for the user to check the contents of the documents.
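As a rough illustration of n-gram-based ranking against a user-designated document, the following sketch compares character n-gram counts with the Dice coefficient. The cited application's actual method is not described here, so the similarity measure, the function names (`char_ngrams`, `dice_similarity`, `rank_by_feedback`), and the sample texts are all assumptions chosen for illustration.

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Count every run of n consecutive characters in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def dice_similarity(a, b, n=2):
    """Dice coefficient over the n-gram multisets of two texts (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * overlap / total

def rank_by_feedback(selected_doc, candidates, n=2):
    """Order candidates by n-gram similarity to the designated document."""
    return sorted(
        candidates,
        key=lambda doc: dice_similarity(selected_doc, doc, n),
        reverse=True,
    )

# Hypothetical texts, for illustration only.
ranked = rank_by_feedback(
    "document retrieval system",
    ["cooking recipes", "document retrieval method"],
)
```

The ranking depends on the whole designated document at once, so the user sees only a reordered list and cannot easily tell which part of the selected document caused the change.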
Thus, using a document as a unit of feedback results in too large an object size, giving rise to such problems as an increase in the user's burden caused by user's reading of
