Free format query processing in an information search and...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000, C707S793000, C706S052000, C345S215000


active

06405190

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed toward the field of information search and retrieval systems, and more particularly to processing a free format query to determine conjunctives between terms of the free format search query.
2. Art Background
An information retrieval system attempts to match user queries (i.e., the user's statement of information needs) to information available to the system. In general, the effectiveness of information retrieval systems may be evaluated in terms of many different criteria, including execution efficiency, storage efficiency, and retrieval effectiveness. The most common measures used are “recall” and “precision.” Recall is defined as the ratio of the number of relevant documents retrieved for a given query to the number of relevant documents for that query available in the repository of information. Precision is defined as the ratio of the number of relevant documents retrieved to the total number of documents retrieved. Both recall and precision take values ranging between zero and one. An ideal information retrieval system has both recall and precision values equal to one.
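The two ratios defined above can be illustrated with a short sketch. The function name `recall_precision`, the document ids, and the choice to return 0.0 when a denominator is empty are assumptions made for illustration only; they do not come from the patent.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids in the repository that are relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumed convention: an empty denominator yields 0.0.
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 4 documents retrieved, 3 relevant in the repository,
# 2 of the retrieved documents are relevant.
r, p = recall_precision(
    retrieved={"d1", "d2", "d3", "d4"},
    relevant={"d1", "d2", "d5"},
)
print(f"recall={r:.2f} precision={p:.2f}")
```

In this made-up case recall is 2/3 and precision is 2/4, so both fall short of the ideal value of one.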
To locate information in the system, a user specifies the type of information sought in the form of a query. The query consists of one or more terms that the user believes best express the information sought. For example, if a user seeks information regarding “the outbreak of Hepatitis in North America”, the user may formulate the query “Hepatitis and North America.” Typically, a formal query language requires the user to express the idea regarding the information sought within rigid parameters. A formal query language, such as the Structured Query Language (SQL), sets forth parameters and format requirements for the query. For example, information retrieval systems typically use Boolean operators to specify the connections between two or more words in the query. If the user inputs the query “Hepatitis and North America” and the word “and” is interpreted as a Boolean AND operation, then the information retrieval system retrieves all documents that contain subject matter on both Hepatitis and North America. Using a formal query language permits a one-to-one correspondence between the user's expression of the query and the interpretation of the query by the information retrieval system. Although formal query languages reduce the ambiguity between the user's expression and the interpretation by the system, they are rigid and require the user to learn the semantics of the query language. Accordingly, it is desirable to permit a user of an information retrieval system to submit a query in any form.
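A minimal sketch of the Boolean AND interpretation described above, assuming a toy corpus in which each document is tagged with the topics it covers. The corpus contents and the helper `boolean_and` are hypothetical and are not part of the patent.

```python
# Hypothetical toy corpus: document id -> topics the document covers.
DOCS = {
    "doc1": {"hepatitis", "north america"},
    "doc2": {"hepatitis", "europe"},
    "doc3": {"influenza", "north america"},
}


def boolean_and(docs, *terms):
    """Return the ids of documents that cover every term (Boolean AND)."""
    wanted = {t.lower() for t in terms}
    # A document matches only if the wanted terms are a subset of its topics.
    return sorted(doc_id for doc_id, topics in docs.items() if wanted <= topics)


matches = boolean_and(DOCS, "Hepatitis", "North America")
print(matches)
```

Only the document that covers both topics is returned; documents that cover just one of the two terms are excluded.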
Generally, a query that expresses an idea to retrieve information that is not in a specific query language format is known as a free format query. Free format queries are also known as natural language queries. Free format queries contain an expression in a form of general human discourse. The user intends the query to be interpreted through a contextual semantic interpretation in the same way words are interpreted in general human discourse. For example, in a free format query, the user may formulate the query “Hepatitis in North America” to locate documents including subject matter on both Hepatitis and North America. In normal human discourse, although the word “and” was not used, it is apparent that the user seeks information on subject matter containing both Hepatitis AND North America. Even when “and conjunctions” and “or conjunctions” are used, humans do not use them in the same way as programming languages. For example, the user may express, as a free format query, the expression “red and green balls.” This query example introduces an ambiguity as to whether the user seeks information regarding “red balls and green balls” or whether the user seeks information on balls that are both red and green. As is explained fully below, the present invention provides a link to bridge the gap between a contextual semantic interpretation of a free format query and a computer programming language interpretation of a formal query language.
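The ambiguity in “red and green balls” can be made concrete by evaluating both readings against a small data set. The items and helper names below are hypothetical, and mapping the reading “red balls and green balls” to a union of matches is an interpretive assumption for this sketch.

```python
# Hypothetical items: ball id -> set of colors on that ball.
BALLS = {
    "b1": {"red"},
    "b2": {"green"},
    "b3": {"red", "green"},
    "b4": {"blue"},
}


def any_color(items, colors):
    """Reading 1: red balls and green balls (a ball with either color matches)."""
    wanted = set(colors)
    return sorted(k for k, v in items.items() if v & wanted)


def all_colors(items, colors):
    """Reading 2: balls that are both red and green."""
    wanted = set(colors)
    return sorted(k for k, v in items.items() if wanted <= v)


either = any_color(BALLS, ["red", "green"])
both = all_colors(BALLS, ["red", "green"])
print(either, both)
```

The two readings return different result sets for the same three-word query, which is the gap a pre-processing step has to resolve.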
SUMMARY OF THE INVENTION
A search and retrieval system pre-processes an input query to map a contextual semantic interpretation, expressed by the user of the input query, to a Boolean logic interpretation for processing in the search and retrieval system. The search and retrieval system includes a knowledge base that comprises a plurality of categories. Each subset of the categories is assigned to one of a plurality of groups. In one embodiment, the groups are based on dimensional categories, such that each dimensional category represents a discrete and independent concept from other dimensional categories in the knowledge base.
To pre-process the query, the search and retrieval system receives an input query comprising a plurality of terms, and processes the terms of the input query to identify value terms that comprise a content carrying capacity. The knowledge base is referenced to identify a group for each value term. A processed input query is generated by inserting an AND logical connector between two value terms if the two respective value terms are in different groups and by inserting an OR logical connector between two value terms if the two respective value terms are in the same group.
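The connector rule in the preceding paragraph can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the knowledge base is reduced to a hypothetical dictionary from value term to group, every value term is assumed to be present in it, and connectors are inserted only between adjacent value terms.

```python
# Hypothetical knowledge base: value term -> group (dimensional category).
KNOWLEDGE_BASE = {
    "hepatitis": "health",
    "influenza": "health",
    "north america": "geography",
    "europe": "geography",
}


def connect_value_terms(value_terms, term_to_group):
    """Join adjacent value terms with OR (same group) or AND (different groups)."""
    if not value_terms:
        return ""
    parts = [value_terms[0]]
    for prev, curr in zip(value_terms, value_terms[1:]):
        same_group = term_to_group[prev] == term_to_group[curr]
        parts.append("OR" if same_group else "AND")
        parts.append(curr)
    return " ".join(parts)


print(connect_value_terms(["hepatitis", "north america"], KNOWLEDGE_BASE))
print(connect_value_terms(["hepatitis", "influenza"], KNOWLEDGE_BASE))
```

The first query joins terms from different groups with AND; the second joins terms from the same group with OR. Operator precedence for queries with three or more value terms is not addressed in this sketch.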
In one embodiment, the search and retrieval system includes a lexicon. The lexicon stores a plurality of terms and phrases, including information about the terms. During input query pre-processing, the lexicon is referenced to determine whether query terms match one of the stored phrases. If a match is found, the phrase is processed as a single value term. In another embodiment, the user input query is processed to replace, where appropriate, prepositions and conjunctions. For this embodiment, the lexicon also identifies terms as AND preposition terms, AND conjunction terms, OR conjunction terms, and NOT conjunction terms. During query pre-processing, the lexicon is referenced to identify an input query term as an AND preposition term, an AND conjunction term, an OR conjunction term, or a NOT conjunction term. An AND logical Boolean connector is generated in lieu of the input query term if the term is an AND preposition term or an AND conjunction term, an OR logical Boolean connector is generated if the term is an OR conjunction term, and a NOT logical Boolean connector is generated if the term is a NOT conjunction term.
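A rough sketch of the lexicon pass described above, assuming the lexicon is a dictionary from lowercase terms or phrases to a role label. The role names, the lexicon entries, the `max_phrase_len` parameter, and the longest-match strategy are all illustrative assumptions; the source does not specify how phrases are matched.

```python
# Hypothetical lexicon: lowercase term or phrase -> role.
LEXICON = {
    "north america": "PHRASE",
    "in": "AND_PREPOSITION",
    "and": "AND_CONJUNCTION",
    "or": "OR_CONJUNCTION",
    "not": "NOT_CONJUNCTION",
}

# Boolean connector generated in lieu of each kind of lexicon term.
CONNECTOR_FOR_ROLE = {
    "AND_PREPOSITION": "AND",
    "AND_CONJUNCTION": "AND",
    "OR_CONJUNCTION": "OR",
    "NOT_CONJUNCTION": "NOT",
}


def preprocess(query, lexicon, max_phrase_len=3):
    """Return query tokens with stored phrases merged and connectors replaced."""
    words = query.lower().split()
    out, i = [], 0
    while i < len(words):
        token, step = words[i], 1
        # Assumed strategy: prefer the longest stored phrase starting at i.
        for n in range(min(max_phrase_len, len(words) - i), 1, -1):
            candidate = " ".join(words[i:i + n])
            if lexicon.get(candidate) == "PHRASE":
                token, step = candidate, n
                break
        role = lexicon.get(token)
        # Connector terms become Boolean operators; everything else is kept.
        out.append(CONNECTOR_FOR_ROLE.get(role, token))
        i += step
    return out


print(preprocess("Hepatitis in North America", LEXICON))
```

Here the stored phrase “north america” is kept as one value term and the preposition “in” is replaced by AND. The resulting value terms could then be handed to a group-based connector step such as the one described in the preceding paragraph.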


