Process and system for retrieval of documents using...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C706S015000

active

06189002

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to computer-based document search and retrieval. It provides a computational means for learning semantic profiles of terms from a text corpus of known relevance and for cataloging and delivering references to documents with similar semantic profiles.
BACKGROUND
The number of documents contained in computer-based information retrieval systems is growing at tremendous rates. For example, the world wide web is thought to contain more than 800 million documents already. People looking for specific information in that sea of documents are often frustrated by two factors. First, only a subset of these documents is indexed and, second, of those that are indexed, many are indexed ambiguously. The present invention does not address directly the first limitation of document retrieval, but this limitation nonetheless plays a role in addressing the second limitation.
The main method currently used for document retrieval is keyword or free-text search. A user enters a search query consisting of one or a few words or phrases and the system returns all of the documents that have been indexed as containing those words or phrases. As more documents are indexed, more documents are expected to contain the specified search terms. For example, one world wide web search engine recently returned more than 755,000 documents in response to a query for the word “pitch.” Adding the word “shot” to the query resulted in 797,000 documents. These large quantities of documents cannot be usefully examined by the user, and there is no guarantee that the desired information is contained by any of them. Increasing the likelihood that a search using such a system will retrieve a desired document necessarily demands decreasing the specificity of the search.
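The keyword retrieval described above can be illustrated with a minimal sketch of an inverted index. The toy corpus, the function names `build_index` and `keyword_search`, and the choice of OR semantics (a document matches if it contains any query term) are assumptions made for illustration; they are not taken from any particular search engine. Under these assumptions, adding a term to the query can only enlarge the result set, which is consistent with the growth from “pitch” to “pitch” plus “shot” noted above.

```python
from collections import defaultdict

# Hypothetical toy corpus; document IDs and texts are illustrative only.
DOCS = {
    1: "the pitcher threw a fast pitch",
    2: "roof pitch affects drainage",
    3: "a long shot from midfield",
    4: "sales pitch for the new product",
}


def build_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return IDs of documents containing ANY query term (OR semantics, assumed)."""
    hits = set()
    for term in query.lower().split():
        # .get avoids creating empty entries for terms that were never indexed
        hits |= index.get(term, set())
    return sorted(hits)


INDEX = build_index(DOCS)
print(keyword_search(INDEX, "pitch"))
print(keyword_search(INDEX, "pitch shot"))
```

Note that the three documents matching “pitch” use the word in three unrelated senses (baseball, roofing, sales), and the index has no way to tell them apart.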
Furthermore, many of the documents retrieved in a standard search are irrelevant to a user's needs because these documents use the searched-for terms in a way different from that intended by the user. Human word use is characterized by polysemy. Words have multiple meanings. One dictionary, for example, lists more than 50 definitions for the word “pitch.” We generally do not notice this ambiguity in ordinary usage because the context in which the word appears allows us to pick effortlessly the appropriate meaning of the word for that situation.
Human language use is also characterized by synonymy. Different words often mean about the same thing. “Elderly,” “aged,” “retired,” “senior citizens,” “old people,” “golden-agers,” and other terms are used, for example, to refer to the same group of people. A search for one of these terms would fail to select a document if the author had used a different synonym. Psychological studies have shown that people are generally poor at remembering which specific words were used earlier to express an idea, preferring instead to remember the gist of the passage.
Current search engines use Boolean operators to try to address these problems, but most nontechnical users do not seem either to know or use Boolean operators. A study of actual searches submitted to the Alta Vista search service found that 80% of searches employed no operators whatever. See Craig Silverstein et al., Analysis of a Very Large Alta Vista Query Log, SRC Technical Note, 1998-014 (Oct. 26, 1998). Another way that users could solve these problems is to include enough terms in a query to disambiguate its meaning or to include the possible synonyms that the document's author might have used. Again, people do not seem inclined to use such strategies in that the average number of terms entered in a query was found to be just over 2.0, and two-thirds of all queries employed two or fewer search terms. Perhaps one reason for this is that adding additional terms typically changes the order in which page links are displayed for the user. Another reason may be that additional terms often increase the number of documents that are retrieved.
OBJECT OF THE INVENTION
The object of the invention is to provide a method for document search in a defined context, so as to obviate the problems related to polysemy and synonymy, and focus the search on documents relevant to the context.
SUMMARY OF THE INVENTION
This invention provides means for context-relevant document retrieval that preferentially returns items that are relevant to a user's interests. According to one aspect of the invention, it learns the semantic profiles of terms from a training body or corpus of text that is known to be relevant. A neural network is used to extract semantic profiles from the text of known relevance. A new set of documents, such as world wide web pages obtained from a keyword search of the Internet, is then submitted for processing to the same neural network, which computes a semantic profile representation for these pages using the semantic relations learned from profiling the training documents. According to another aspect of the invention, these semantic profiles can be organized into clusters, i.e., groups of points in a multidimensional space forming relatively close associations, in order to minimize the time required to answer a query. When a user queries the database, his or her query is similarly transformed into a semantic profile and compared with the semantic profile of each cluster of documents. The query profile is then compared with each of the documents in the closest-matching cluster. Documents with the closest weighted match to the query are returned as search results.
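The two-stage comparison summarized above (query against clusters, then query against the documents of one cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the three-dimensional profile vectors are hand-written stand-ins for neural-network output, the cluster assignment is given rather than learned, each cluster is represented by the mean of its members, and cosine similarity is substituted for the "closest weighted match". All names (`PROFILES`, `CLUSTERS`, `search`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical semantic profiles; in the invention these would come from
# the trained neural network, not from hand-written numbers.
PROFILES = {
    "baseball_rules": [0.9, 0.1, 0.0],
    "pitching_stats": [0.8, 0.2, 0.1],
    "roof_framing": [0.1, 0.9, 0.1],
    "roof_drainage": [0.0, 0.8, 0.3],
}

# Hypothetical cluster assignment (assumed given, not computed here).
CLUSTERS = {
    "sports": ["baseball_rules", "pitching_stats"],
    "construction": ["roof_framing", "roof_drainage"],
}


def search(query_profile, profiles, clusters, top_k=2):
    """Pick the cluster nearest the query, then rank only that cluster's documents."""
    centroids = {
        name: centroid([profiles[doc] for doc in members])
        for name, members in clusters.items()
    }
    best = max(centroids, key=lambda name: cosine(query_profile, centroids[name]))
    ranked = sorted(
        clusters[best],
        key=lambda doc: cosine(query_profile, profiles[doc]),
        reverse=True,
    )
    return best, ranked[:top_k]


best_cluster, results = search([0.7, 0.3, 0.0], PROFILES, CLUSTERS)
print(best_cluster, results)
```

Comparing the query with cluster representatives first means that only the documents of one cluster are scored individually, which is the time saving the summary attributes to clustering.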


REFERENCES:
patent: 5325298 (1994-06-01), Gallant
patent: 5619709 (1997-04-01), Caid et al.
patent: 5774845 (1998-06-01), Ando et al.
patent: 6006221 (2000-06-01), Liddy et al.
patent: 6076088 (2000-06-01), Paik et al.
Carlson et al., “A cognitively-based neural network for determining paragraph coherence”, IJCNN91, pp. 1303-1308, Nov. 1991.
Scheler, “Extracting semantic features from unrestricted text”, WCNN96, p. 499, Sep. 1996.
