Reexamination Certificate
2001-06-29
2004-05-25
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Linguistics
Translation machine
C704S009000, C707S793000, C707S793000, C707S793000
active
06741959
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to the field of information retrieval based on user input and more particularly to a system and method of retrieving, evaluating, and ranking data from a data set based upon a natural language query from the user.
BACKGROUND OF THE INVENTION
Search engines are known in the art and are used for retrieving information from databases based on user-supplied input. However, for large information systems, current search tools fail to provide adequate solutions to the more complex problems that users often face. For example, many known search engines restrict users to providing key search terms, which may be connected by logical connectors such as “and,” “or,” and “not.” This is often inadequate for complex user problems, which are best expressed in a natural language domain. Using only keywords or Boolean operations often results in a failure of the search engine to recognize the proper context of the search. This may lead to the retrieval of a large amount of information from the database that is not closely related to the user's problem. Because known search engines do not sufficiently process the complexity of user input, it is often very difficult with current online help facilities to obtain relevant, helpful documentation for a given complex problem.
Another problem with current technologies used in document retrieval is that the user may find certain documents retrieved in a search more valuable than others; however, the user is not able to explicitly express a criterion of importance. In dealing with complex contexts, it is often difficult to specify a relevance criterion exactly and explicitly.
OBJECTS AND SUMMARY OF THE INVENTION
It is an object of the present invention to provide a search machine employing a generic or non-context specific approach for a tool that is capable of searching for information contained in documents of a database on the basis of a problem specification entirely stated in natural language.
It is a further object of the present invention to provide a search machine that is not restricted to a specific environment, such as database retrieval, but may also be used in various contexts, such as, for example, context-sensitive online help in complex working and information environments, retrieval of relevant information in tutor and advisory systems, decision support for the organization of information databases, and information agents which search to build up, organize and maintain new information databases.
A further object of the present invention is to provide a search machine that can locate the most relevant parts of text within a document. Thus, the search machine may present the most relevant part of the document to the user, or provide necessary information to indicate to the user which part of the document is the most relevant.
It is a further object of the present invention to provide a search machine that attaches significance values to words or word stems of a database and a query and uses the significance values to compare the query to the database.
Yet another object of the present invention is to provide a search machine that uses data relating to the frequency of occurrence of words or word stems in a document to determine a document's relevance to a query.
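To make the last two objects concrete, the following is a minimal sketch of how significance values and occurrence frequencies could be combined into a relevance score. The text above does not say how significance is computed; the inverse-document-frequency style formula used here, and the names `significance` and `relevance`, are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def significance(stem, documents):
    """Assumed significance value: stems found in fewer documents score higher."""
    containing = sum(1 for doc in documents if stem in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1.0


def relevance(query_stems, doc_stems, documents):
    """Sum of (frequency in document) x (significance) over the query's stems."""
    freq = Counter(doc_stems)  # missing stems count as 0
    return sum(freq[s] * significance(s, documents) for s in set(query_stems))


# Hypothetical corpus, already reduced to word stems.
documents = [
    ["printer", "driver", "install"],
    ["printer", "paper", "jam", "jam"],
    ["network", "driver", "update"],
]
query = ["paper", "jam"]

scores = [relevance(query, doc, documents) for doc in documents]
best = scores.index(max(scores))  # index of the most relevant document
```

Only the second document contains any of the query stems, so it is the only one with a non-zero score in this toy corpus.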
In the present invention, a search machine is provided that can accept a query that may be stated in the form of natural language. The search machine can reduce the natural language query into a vector of word stems and also can reduce the documents to be searched into vectors of word stems, where a word stem is a word or part of a word from which various forms of a word are derived. The search machine may analyze the vectors of word stems determining such factors as the frequency with which word stems occur in the query and database documents, the significance of the word stems appearing in the documents, and other comparison information between the query vector and the database document vectors, in order to determine the suitability of a database document to serve as a solution to the query.
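The reduction of text to a vector of word stems can be sketched as follows. This is not the stemming procedure of the invention, which is not described here; the tiny suffix list and the function names `stem` and `stem_vector` are hypothetical stand-ins that only show the shape of the data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny suffix list; a real stemmer is far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_vector(text):
    """Map each word stem in the text to its frequency of occurrence."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words)


vec = stem_vector("Printing fails when the printer prints two documents")
```

Here "Printing" and "prints" collapse onto the same stem, so a query using one form can match a document using the other. The same function can be applied to the query and to each database document.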
The search machine according to the present invention for retrieving information from a database based on a query of a user may include a lexicon generator for deriving a lexicon database of word stems from the documents of the database and from the query. It may further include an evaluation component that includes a document vectorizer for creating representation vectors for the documents of the database and a query representation vector for the query using the lexicon database. The document representation vectors contain data on the word stems located in the database, and the query representation vector contains data on the word stems located in the query. The evaluation component may further include a vector rule base and a vector evaluator. The vector evaluator may derive a relevance value for a document representation vector relative to the query representation vector according to the vector rule base and output information from the database that relates to the query. The search machine may also include a fine-tuner for modifying the vector rule base. The user can provide external feedback about the information retrieved from the database, and the fine-tuner may use that feedback to modify the vector rule base.
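The interplay of vector rule base, vector evaluator, and fine-tuner might look roughly like the sketch below. The text does not specify what the rules are or how feedback changes them, so everything here is an assumption: the rule base is modeled as one weight per named rule, the two rules ("overlap" and "frequency") are invented for illustration, and feedback is a simple +1 or -1.

```python
from dataclasses import dataclass, field


@dataclass
class VectorRuleBase:
    # Hypothetical rule set: one weight per named comparison rule.
    weights: dict = field(
        default_factory=lambda: {"overlap": 1.0, "frequency": 1.0}
    )


def evaluate(doc_vec, query_vec, rules):
    """Derive a relevance value for a document vector relative to the query."""
    shared = set(doc_vec) & set(query_vec)
    overlap = len(shared) / len(query_vec) if query_vec else 0.0
    frequency = sum(doc_vec[s] for s in shared)
    return (rules.weights["overlap"] * overlap
            + rules.weights["frequency"] * frequency)


def fine_tune(rules, rule_name, feedback, rate=0.1):
    """Adjust one rule weight; feedback is +1 (helpful) or -1 (not helpful)."""
    rules.weights[rule_name] = max(0.0, rules.weights[rule_name] + rate * feedback)


rules = VectorRuleBase()
doc_vec = {"print": 2, "fail": 1}
query_vec = {"print": 1, "jam": 1}

before = evaluate(doc_vec, query_vec, rules)
fine_tune(rules, "frequency", feedback=-1)  # user found the result unhelpful
after = evaluate(doc_vec, query_vec, rules)
```

After the negative feedback, the same document scores lower against the same query, because the rule that favored it now carries less weight.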
The present invention is also directed to a method of retrieving documents from a database corresponding to a query that includes the steps of (i) deriving a lexical database of word stems from the documents of the database and the query; (ii) creating a representation vector corresponding to each document of the database and a query representation vector corresponding to the query, each representation vector containing information about the word stems of the lexical database that are contained in the document to which the representation vector corresponds, the query representation vector containing information about the word stems that are contained in the query document; (iii) evaluating each representation vector relative to the query representation vector using vector evaluation rules, for example, evaluating the similarity of the elements contained in vectors; (iv) creating output reflecting the evaluation of the representation vectors; and (v) presenting the output.
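Steps (i) through (v) can be traced end to end in a short sketch. Two simplifications are assumed and are not taken from the text: lowercase words stand in for word stems, and cosine similarity is used as the single vector evaluation rule. All function names are hypothetical.

```python
import math
import re


def tokens(text):
    # Stand-in for stemming: lowercase words are treated as "stems" here.
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(documents, query):
    # Step (i): lexicon of stems from the documents and the query.
    stems = set(tokens(query))
    for doc in documents:
        stems.update(tokens(doc))
    return sorted(stems)


def vectorize(text, lexicon):
    # Step (ii): one frequency count per lexicon entry.
    toks = tokens(text)
    return [toks.count(stem) for stem in lexicon]


def cosine(a, b):
    # Step (iii): assumed evaluation rule, cosine similarity of two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(documents, query):
    lexicon = build_lexicon(documents, query)
    query_vec = vectorize(query, lexicon)
    scored = [(cosine(vectorize(d, lexicon), query_vec), d) for d in documents]
    # Step (iv): output is the documents ranked by relevance value.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


documents = [
    "Install the network driver",
    "Clear a paper jam in the printer",
    "Update the printer firmware",
]
ranking = retrieve(documents, "how do I clear a paper jam")

# Step (v): present the output.
for score, doc in ranking:
    print(f"{score:.2f}  {doc}")
```

The paper-jam document is ranked first because it is the only one sharing any terms with the query; the other two score zero under this rule.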
REFERENCES:
patent: 5239617 (1993-08-01), Gardner et al.
patent: 5255386 (1993-10-01), Prager
patent: 5325298 (1994-06-01), Gallant
patent: 5675819 (1997-10-01), Schuetze
patent: 5778357 (1998-07-01), Kolton et al.
patent: 5873056 (1999-02-01), Liddy et al.
patent: 5963940 (1999-10-01), Liddy et al.
patent: 6236768 (2001-05-01), Rhodes et al.
patent: 6341282 (2002-01-01), Sharpe et al.
patent: 690 22 842 (1996-05-01), None
patent: 0 441 089 (1991-08-01), None
patent: 0 522 591 (1993-01-01), None
patent: WO 97/08604 (1997-03-01), None
Lennon et al., “An evaluation of some conflation algorithms for information retrieval,” 1981 Journal of Information Science, vol. 3, pp. 117-183.*
Russell et al., “Artificial Intelligence: A Modern Approach,” 1995, Prentice Hall, pp. 111-115.*
Buckley et al., “New Retrieval Approaches Using SMART: TREC 4,” 1996, published in the Fourth Text Retrieval Conference (TREC-4), pp. 1-3.*
Hull, “Stemming Algorithms: A Case Study for Detailed Evaluation,” Jan. 1996, Journal of the American Society for Information Science, pp. 70-84.*
Lee et al., “Document Ranking and the Vector-space Model,” Mar./Apr. 1997, IEEE Software, pp. 67-75.
Dorvil Richemond
Finnegan Henderson Farabow Garrett & Dunner LLP
Harper V. Paul
SAP Aktiengesellschaft