Apparatus and method for information retrieval using...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06678677

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is concerned with an information retrieval apparatus and method which allow the user to retrieve information from databases, such as document databases, containing a plurality of documents.
2. Background of the Invention
Information retrieval systems serve to retrieve those documents that are relevant to the information needs of a user. With the explosive growth in the use of databases and the Internet, information retrieval increasingly fails to enable efficient retrieval of available information. The problem lies at both ends of the system. At one end, there is the ever-increasing number of documents that vary widely in content, format and quality. At the other end, there is a huge number of unknown users with extremely diverse needs, skills, and educational, cultural, and language backgrounds. Conventional search methods and apparatus are, however, not sophisticated enough to provide satisfactory solutions. Their search capabilities are designed either for high recall and the “average user”, or for searches of high precision. Either approach may fail to retrieve the desired information even though it is available within a database.
In general, the relevant information contained in the documents is extracted and stored according to a normalized representation. This representation is abstracted away from its original linguistic form. Database queries of a user are generally subjected to processing in order to expand the scope of the query and/or to interpret the query syntax. The extracted query information is then matched against the stored representations in order to retrieve specific information contained in the documents.
Those documents which are the most similar to the query are output as retrieved documents.
Different methods exist to find those documents relevant to the query. Statistical methods count the number of times each word of the query appears in each document. Documents in a database are ranked according to the obtained count values. If a query contains too few words, for example fewer than two or three, the counts may prove insufficient to find the documents relevant to the request.
Other approaches use refined document preprocessing based on a deep parsing procedure, which applies a complex grammatical analysis to the documents to extract an entire sentence dependency structure. Such approaches generally require a huge computational effort without providing satisfactory results. As complex sentences are difficult to analyze, even a complete dependency analysis may only return several possible dependency structures for a single sentence. Other information retrieval systems expand the scope of a query by taking semantic relations of words into account. It has turned out that such an approach does not return better results.
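The statistical approach described above can be illustrated with a minimal sketch. The function names `tokenize` and `rank_by_term_count`, the regular-expression tokenizer, and the sample documents are assumptions made for illustration only; they are not taken from the patent.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_by_term_count(query, documents):
    """Rank documents by how often the query words occur in them.

    documents: dict mapping a document id to its text.
    Returns a list of (doc_id, count) pairs, highest count first.
    """
    query_words = set(tokenize(query))
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        # Sum the occurrences of every query word in this document.
        scores[doc_id] = sum(counts[word] for word in query_words)
    # Sort by descending count; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data.
docs = {
    "d1": "The cat sat on the mat. The cat slept.",
    "d2": "A dog chased the cat.",
    "d3": "Stock prices rose sharply.",
}
ranking = rank_by_term_count("cat", docs)
```

With a one-word query such as "cat", the ranking rests on a single count per document, which illustrates why very short queries give such methods little to discriminate on.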
For evaluating the retrieval performance of information retrieval systems, two criteria are used, namely the “calling rate” and the “precision”. These criteria are based on a subjective judgment of the relevance of the retrieved information. The “calling rate” or “recall” and the “precision” are defined as follows.
The calling rate or recall is the ratio of the number of pertinent documents retrieved to the total number of pertinent documents stored in the database. The precision is the ratio of the number of pertinent documents retrieved to the number of all documents retrieved. There is usually a trade-off between these two criteria. In information retrieval, it is desirable that both criteria be close to their maximum value of one.
Most traditional information retrieval systems are optimized for longer queries and perform worse for short, more realistic queries. According to surveys made on the Internet, the average request comprises only a few words (mostly fewer than five words).
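The two ratios defined above can be computed directly from the set of retrieved documents and the set of pertinent documents. The following is a small sketch; the function name `recall_and_precision` and the document identifiers are hypothetical.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: ids of the documents the system returned.
    relevant:  ids of all pertinent documents stored in the database.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # pertinent documents that were retrieved
    # Guard against division by zero for empty sets.
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 4 documents retrieved, 3 pertinent in total, 2 in common.
recall, precision = recall_and_precision(
    retrieved=["d1", "d2", "d5", "d7"],
    relevant=["d1", "d2", "d3"],
)
```

Here recall is 2/3 and precision is 2/4. Retrieving more documents can only raise recall or leave it unchanged, but it tends to lower precision, which is the trade-off mentioned above.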
SUMMARY OF THE INVENTION
The present invention has been made in consideration of the above situation, and it is the primary object of the present invention to provide an improved method and an improved apparatus that retrieve information from a database.
It is a further object of the invention to provide a method and an apparatus for information retrieval that improve the ranking of retrieved documents.
It is another object of the invention to provide a method and an apparatus for information retrieval that push the most salient documents to the top of a list of retrieved documents.
It is still another object of the invention to provide a method and an apparatus that increase the proportion of relevant documents retrieved from a document database.
It is still another object of the present invention to provide a method and an apparatus that retrieve information from a database with a higher precision.
It is yet another object of the invention to provide a method and an apparatus that increase effectiveness of information retrieval.
These and other objects of the present invention may become apparent hereafter.
To achieve these objects, the present invention provides a method and an apparatus that combine the use of syntactic constructions with an enlargement of terms for documents and queries to improve precision and calling rate for information retrieval. The method for document retrieval of the present invention relates to databases comprising internal representations of documents, wherein the internal representations include syntactic relations between terms of sentences of the documents and a semantic lattice for the terms of the documents in the database, the semantic lattice specifying semantic relations between the terms. The method comprises the steps of extracting syntactic relations between terms of the query and creating an internal representation of the query based on the terms of the query and the extracted syntactic relations between the terms of the query. Further, the method appends new terms to the semantic lattice if the query includes terms not included in the semantic lattice in the database. The query is projected onto the documents in the database by comparing the internal representation and terms of the query to the internal representations and terms of the documents, using the semantic lattice for comparing the terms, and a similarity is computed between the query and each document. The documents in the database are ranked according to their computed similarities, and the documents are output as retrieved documents according to the established rank order.
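The flow of the method summarized above can be pictured with a toy sketch. Everything concrete in it is an assumption made for illustration and is not the patented implementation: syntactic relations are supplied by hand as (relation, head, dependent) triples instead of being produced by a parser, the semantic lattice is reduced to a plain mapping from a term to a set of related terms, and the similarity score (one point per matched term, two per matched relation) is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Representation:
    """Toy internal representation of a query or a document."""
    terms: set      # terms occurring in the text
    relations: set  # (relation, head, dependent) triples


def append_new_terms(lattice, terms):
    """Add query terms that are missing from the (simplified) lattice."""
    for term in terms:
        lattice.setdefault(term, set())


def related(lattice, a, b):
    """Two terms match if they are equal or linked in the lattice."""
    return a == b or b in lattice.get(a, set()) or a in lattice.get(b, set())


def similarity(query, doc, lattice):
    """Assumed score: 1 per matched term, 2 per matched syntactic relation."""
    term_hits = sum(
        1 for q in query.terms if any(related(lattice, q, d) for d in doc.terms)
    )
    relation_hits = sum(
        1
        for (q_rel, q_head, q_dep) in query.relations
        if any(
            q_rel == d_rel
            and related(lattice, q_head, d_head)
            and related(lattice, q_dep, d_dep)
            for (d_rel, d_head, d_dep) in doc.relations
        )
    )
    return term_hits + 2 * relation_hits


def retrieve(query, documents, lattice):
    """Project the query onto every document and return ids in rank order."""
    append_new_terms(lattice, query.terms)
    scored = [(similarity(query, rep, lattice), doc_id)
              for doc_id, rep in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical data: "buy a cheap car" against three tiny documents.
lattice = {"car": {"automobile"}, "buy": {"purchase"}}
documents = {
    "d1": Representation({"purchase", "automobile"},
                         {("obj", "purchase", "automobile")}),
    "d2": Representation({"car", "wash"}, {("obj", "wash", "car")}),
    "d3": Representation({"train", "ticket"}, {("mod", "ticket", "train")}),
}
query = Representation({"buy", "car", "cheap"}, {("obj", "buy", "car")})
ranked = retrieve(query, documents, lattice)
```

In this toy data the document "d1" shares no literal word with the query, yet it ranks first because both its terms and its verb-object relation match through the lattice, whereas "d2" matches only on the literal term "car".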
According to a second aspect of the present invention, there is provided an apparatus for retrieving documents from a database. The database comprises internal representations of documents, wherein the internal representations include syntactic relations between terms of sentences of the documents and a semantic lattice for the terms of the documents in the database, the semantic lattice specifying the semantic relations between the terms. The apparatus comprises a query input unit, a query-processing unit, a semantic lattice management unit, a matching unit and a presentation unit. The query input unit receives a query and provides the query to the query-processing unit. The query-processing unit creates an internal representation of the query based on the terms of the query and syntactic relations between the terms of the query. The semantic lattice management unit appends new terms to the semantic lattice if the query includes terms not included in the semantic lattice in the database. The matching unit projects the query onto each of the documents in the database by comparing the internal representation of the query to the internal representations of the documents, using the semantic lattice for comparing the terms. The matching unit further computes a similarity between the query and each document. The presentation unit ranks the documents in the database according to the computed similarities and outputs the documents as retrieved documents according to the established rank order.
Furthermore, the present invention provides a computer program product, for use
