Reexamination Certificate
1998-10-01
2002-03-26
Alam, Hosain T. (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
active
06363373
ABSTRACT:
TECHNICAL FIELD
This invention generally relates to database search engines for computer systems. More particularly, this invention relates to concept searching using a Boolean or keyword search engine.
BACKGROUND OF THE INVENTION
Database search engines permit users to perform queries on a set of documents by submitting search terms. Users must typically submit one or more search terms to the search engine in a format specified by the search engine. Most search engines specify that search terms should be submitted as a Boolean or keyword search query (e.g., “red OR green” or “blue AND black”). Boolean or keyword search queries can become extremely complex as the user adds more search terms and Boolean operators. Moreover, most search engines have complex syntax rules regarding how a Boolean or keyword search query must be constructed. To get accurate search results, therefore, users must remember the appropriate syntax rules and apply them effectively. This process can be difficult for many users and, unless mastered, may result in searches that return irrelevant documents.
“Natural language” search engines have been developed which permit users to submit a natural language query to the search engine rather than just keywords. For instance, a user may input the simple natural language sentence “How do I fix my car?” instead of the more complex Boolean search query “how AND to AND fix AND car?” Instead of searching for just the keywords contained in the search query, a typical natural language search engine will extract the concepts implied by the query and search the database for documents referencing those concepts. A natural language search engine will therefore return documents from its database which contain the concepts expressed in the search query even if the documents do not contain the exact words in the search query. A natural language search query may also be submitted to a Boolean or keyword search engine, but such an engine will return only documents containing the exact words in the search query.
Although natural language search engines provide the benefits of easy-to-understand natural language search queries and concept searching, they are not without their drawbacks. For example, natural language search engines are considerably more expensive to develop than Boolean or keyword search engines. Moreover, natural language search engines can be difficult and expensive to implement, especially where they are used to replace existing Boolean or keyword search engines.
Therefore, there is a need for a method and apparatus for database searching which (1) permits effective searching using a Boolean or keyword search engine with natural language search queries, (2) permits concept searching using a Boolean or keyword search engine, and (3) may be implemented without any modification to the Boolean or keyword search engine.
SUMMARY OF THE PRESENT INVENTION
The present invention satisfies the above-described needs by providing a method and apparatus for concept searching using a Boolean or keyword search engine. Using the method and apparatus of the exemplary embodiment, documents are preprocessed before being passed to the search engine for inclusion in the search engine's database. Search queries are also preprocessed before being passed to the search engine.
With regard to the preprocessing of documents, each document is scanned on a word-by-word basis to identify the “word tokens” contained in the document. Word tokens are actual words or word-like strings such as dates, numbers, etc. Once the word tokens in a document have been extracted, each word token is located in a “concept database” that maps word tokens to concept identifiers. Each word token may map to zero or more concept identifiers.
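The token extraction and concept lookup described above can be sketched as follows. This is a minimal illustration only: the contents of `CONCEPT_DB`, the numeric concept identifiers, the regular expression, and the function names are all assumptions made for the example and do not come from the patent text.

```python
import re

# Hypothetical concept database: each word token maps to zero or more
# concept identifiers. The entries and the integer IDs are invented.
CONCEPT_DB = {
    "red": [17],
    "green": [17],
    "car": [42, 43],
    "fix": [88],
}

# Assumed tokenizer: a simple date pattern is tried first so that a date
# stays together as one word-like token; otherwise runs of word characters.
TOKEN_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}|\w+")


def extract_word_tokens(text):
    """Scan the text and return its word tokens, lowercased."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def concepts_for_tokens(tokens, concept_db):
    """Map each word token to its (possibly empty) list of concept IDs."""
    return {token: list(concept_db.get(token, [])) for token in tokens}


if __name__ == "__main__":
    tokens = extract_word_tokens("Fix my red car on 10/01/1998.")
    print(tokens)
    print(concepts_for_tokens(tokens, CONCEPT_DB))
```

Tokens that are absent from the database, such as "my" in the usage example, simply map to an empty list, matching the statement that a word token may map to zero concept identifiers.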
Once the concept identifiers associated with each word token have been extracted from the concept database, a consolidated list of concept identifiers is created. Each of the concept identifiers in the list is then converted into a unique non-word concept token which identifies the concept. A concept token is a non-word character string which identifies and is mapped to a concept. For instance, the concept token “Q1A5” may map to the concept of “color.” These concept tokens are then arranged into a list.
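A short sketch of the consolidation and conversion steps follows. The patent text does not specify how a concept identifier is encoded as a concept token, so the `Q<digits>Z` format used here is purely an assumption chosen so that the result cannot collide with an ordinary word; the function names are likewise hypothetical.

```python
def consolidate(concept_ids_per_token):
    """Merge per-token concept ID lists into one list without duplicates,
    keeping the order in which each ID was first seen."""
    seen = set()
    consolidated = []
    for concept_ids in concept_ids_per_token:
        for concept_id in concept_ids:
            if concept_id not in seen:
                seen.add(concept_id)
                consolidated.append(concept_id)
    return consolidated


def to_concept_token(concept_id):
    """Encode a concept ID as a non-word string (assumed format).

    The mix of letters and zero-padded digits is an illustrative choice,
    not the encoding used by the patent.
    """
    return f"Q{concept_id:05d}Z"


if __name__ == "__main__":
    ids = consolidate([[88], [], [17], [42, 43], [17]])
    print([to_concept_token(i) for i in ids])
```

Because each identifier always encodes to the same string, a document and a query that share a concept end up sharing a literal search term.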
Once the list of concept tokens has been created, the tokens are inserted into the document. In an exemplary embodiment, a hypertext markup language (“HTML”) META tag is used to insert the concept tokens into the document. Using the HTML META tag, the concept tokens are treated as ordinary text by the search engine and therefore may be searched, but are invisible to the user. The document is then transferred to the server monitored by the search engine. All documents indexed by the search engine are preprocessed in this manner.
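One way to place the concept tokens in an HTML META tag is sketched below. The tag name `concepts`, the function name, and the choice to insert just before the closing `</head>` tag are assumptions for illustration; the source states only that a META tag carries the tokens.

```python
import html
import re


def insert_concept_meta(document_html, concept_tokens, meta_name="concepts"):
    """Return the document with a META tag holding the concept tokens.

    The tag is placed just before </head>; if the document has no
    </head>, the tag is prepended (an assumed fallback).
    """
    content = html.escape(" ".join(concept_tokens), quote=True)
    tag = f'<meta name="{meta_name}" content="{content}">'
    match = re.search(r"</head\s*>", document_html, flags=re.IGNORECASE)
    if match is None:
        return tag + "\n" + document_html
    start = match.start()
    return document_html[:start] + tag + "\n" + document_html[start:]


if __name__ == "__main__":
    page = "<html><head><title>T</title></head><body>red car</body></html>"
    print(insert_concept_meta(page, ["Q00017Z", "Q00042Z"]))
```

The visible body of the page is left untouched, so a reader sees the same document while an indexer that reads META content also sees the tokens.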
With regard to the preprocessing of search queries, an additional component is interposed between the query submitted by the user and the search engine. This component preprocesses the query in much the same way as the document preprocessing described above, and then sends a modified query to the search engine.
Queries are preprocessed by first breaking the search terms into word tokens. The word tokens are then referenced in the concept database (the same database used for document preprocessing) and any associated concept identifiers are retrieved. The concept identifiers are then converted to unique concept tokens as described above and are combined into a string with separating spaces. Text is prepended to the string to instruct the search engine to search the contents of all documents' META tags for the tokens. This string constitutes the preprocessed query which is then sent to the search engine.
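The query path can be sketched in the same way. The prefix that tells a search engine to look inside META tags depends on the engine's own query syntax, which the source does not give, so `META_SEARCH_PREFIX` below is a made-up placeholder; the concept database entries and the token encoding are also assumptions for the example.

```python
import re

# Hypothetical concept database, shared with document preprocessing.
CONCEPT_DB = {
    "car": [42, 43],
    "automobile": [42],
    "fix": [88],
}

# Placeholder only: real engines each have their own syntax for
# restricting a search to META tag contents.
META_SEARCH_PREFIX = "@meta[concepts] "


def preprocess_query(query, concept_db=CONCEPT_DB, prefix=META_SEARCH_PREFIX):
    """Turn a natural language query into a space-separated concept-token
    query with the engine-specific prefix prepended."""
    word_tokens = [token.lower() for token in re.findall(r"\w+", query)]
    seen = set()
    concept_tokens = []
    for token in word_tokens:
        for concept_id in concept_db.get(token, []):
            if concept_id not in seen:
                seen.add(concept_id)
                # Assumed non-word encoding of a concept identifier.
                concept_tokens.append(f"Q{concept_id:05d}Z")
    return prefix + " ".join(concept_tokens)


if __name__ == "__main__":
    print(preprocess_query("How do I fix my car?"))
```

Words with no entry in the concept database, such as "how" or "my" here, contribute nothing to the modified query.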
The unmodified Boolean or keyword search engine then finds all of the documents whose concept tokens most closely match the concept tokens in the modified query. The preprocessing of both documents and queries is transparent to the search engine. The exemplary embodiment of the present invention described herein thus solves the above-described problems by using the built-in functionality of the unmodified Boolean or keyword search engine to search for concepts rather than keywords.
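To illustrate why an unmodified keyword engine suffices, the toy matcher below stands in for such an engine: it treats concept tokens as ordinary terms and ranks documents by how many query terms they contain. The ranking rule, the file names, and the token values are assumptions for the example; a real engine's scoring is not described in the source.

```python
def rank_by_token_overlap(query_tokens, indexed_docs):
    """Rank documents by the number of query terms they contain.

    indexed_docs maps a document name to the list of terms indexed for it
    (here, the concept tokens taken from its META tag). Documents sharing
    no term with the query are omitted; ties are broken by name.
    """
    wanted = set(query_tokens)
    scored = []
    for doc_id, doc_tokens in indexed_docs.items():
        overlap = len(wanted & set(doc_tokens))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    docs = {
        "repair-guide.html": ["Q00042Z", "Q00088Z"],  # vehicle + repair
        "paint-chart.html": ["Q00017Z"],              # color
        "dealers.html": ["Q00042Z"],                  # vehicle only
    }
    print(rank_by_token_overlap(["Q00088Z", "Q00042Z"], docs))
```

The matcher never needs to know what a token means; a document that says "automobile" and a query that says "car" match because preprocessing gave both the same token.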
Therefore, it is an object of the present invention to provide a method and apparatus for database searching which permits effective searching using a Boolean or keyword search engine with natural language search queries.
It is also an object of the present invention to provide a method and apparatus for database searching which permits concept searching using a Boolean or keyword search engine.
It is a further object of the present invention to provide a method and apparatus for natural language and concept searching using a Boolean or keyword search engine which may be implemented without any modification to the Boolean or keyword search engine.
That the present invention and the exemplary embodiments thereof overcome the problems and drawbacks set forth above and accomplish the objects of the invention set forth herein will become apparent from the detailed description of exemplary embodiments which follows.
REFERENCES:
patent: 5675819 (1997-10-01), Schuetze
patent: 5724571 (1998-03-01), Woods
patent: 6006221 (1999-12-01), Liddy et al.
patent: 6026388 (2000-02-01), Liddy et al.
patent: 6038560 (2000-03-01), Wical
patent: 6076088 (2000-06-01), Paik et al.
patent: 6094652 (2000-07-01), Faisal
patent: 6094657 (2000-07-01), Hailpern et al.
Salton, Gerard. Automatic Text Processing. Addison-Wesley Publishing Company, MA, 1989, pp. 313-319.
Alam Hosain T.
Kindred Alford W.
Merchant & Gould
Microsoft Corporation