Search engine with natural language-based robust parsing for...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C704S009000

active

06766320

ABSTRACT:

TECHNICAL FIELD
This invention relates to search engines and other information retrieval tools.
BACKGROUND
With the explosive growth of information on the World Wide Web, there is an acute need for search engine technology to keep pace with users' need for searching speed and precision. Today's popular search engines, such as “Yahoo!” and “MSN.com”, are used by millions of users each day to find information. Unfortunately, the basic search method has remained essentially the same as that of the first search engines introduced years ago.
Search engines have undergone two main evolutions. The first evolution produced keyword-based search engines. The majority of search engines on the Web today (e.g., Yahoo! and MSN.com) rely mainly on keyword searching. These engines accept a keyword-based query from a user and search in one or more index databases. For instance, a user interested in Chinese restaurants in Seattle may type in “Seattle, Chinese, Restaurants” or a short phrase “Chinese restaurants in Seattle”.
Keyword-based search engines interpret the user query by focusing only on identifiable keywords (e.g., “restaurant”, “Chinese”, and “Seattle”). Because of their simplicity, keyword-based search engines can produce unsatisfactory search results, often returning many irrelevant documents (e.g., documents on the Seattle area or restaurants in general). In some cases, the engines return millions of documents in response to a simple keyword query, which often makes it impossible for a user to find the needed information.
This poor performance is primarily attributable to the inability of simple keywords to capture the complex search semantics a user wishes to express in the query. Keyword-based search engines simply interpret the user query without ascribing any intelligence to the form and expression entered by the user.
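The behavior described above can be illustrated with a minimal sketch. The three-document corpus, the stopword list, and the function names below are hypothetical and are not taken from any actual search engine; the sketch only shows how matching on isolated keywords also admits documents about Seattle in general or restaurants in general.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"in", "the", "a", "of", "for", "are", "what"}

# Hypothetical three-document corpus.
DOCS = {
    "d1": "Best Chinese restaurants in Seattle",
    "d2": "Seattle weather and tourist information",
    "d3": "Italian restaurants in Portland",
}

def keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def keyword_search(query, docs):
    """Return (doc_id, overlap_count) for every document sharing any keyword."""
    q = keywords(query)
    hits = []
    for doc_id, text in docs.items():
        overlap = q & keywords(text)
        if overlap:
            hits.append((doc_id, len(overlap)))
    # Most shared keywords first; ties broken by document id.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

results = keyword_search("Chinese restaurants in Seattle", DOCS)
for doc_id, score in results:
    print(doc_id, score)
```

In this sketch, d2 and d3 are returned alongside d1 even though each shares only a single keyword with the query, which mirrors the irrelevant results described above.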
In response to this problem of keyword-based engines, a second generation of search engines evolved to go beyond simple keywords. The second-generation search engines attempt to characterize the user's query in terms of predefined frequently asked questions (FAQs), which are manually indexed from user logs along with corresponding answers. One key characteristic of FAQ searches is that they take advantage of the fact that commonly asked questions are far fewer than the total number of questions, and thus can be manually entered. By using user logs, they can compute which questions are most commonly asked. With these search engines, one level of indirection is added by asking the user to confirm one or more rephrased questions in order to find an answer. A prime example of a FAQ-based search engine is the engine employed at the Web site “Askjeeves.com”.
Continuing our example to locate a Chinese restaurant in Seattle, suppose a user at the “Askjeeves.com” site enters the following search query:
“What Chinese restaurants are in Seattle?”
In response to this query, the search engine at the site rephrases the question as one or more FAQs, as follows:
How can I find a restaurant in Seattle?
How can I find a yellow pages listing for restaurants in Seattle, Wash.?
Where can I find tourist information for Seattle?
Where can I find geographical resources from Britannica.com on Seattle?
Where can I find the official Web site for the city of Seattle?
How can I book a hotel in Seattle?
If any of these rephrased questions accurately reflect the user's intention, the user is asked to confirm the rephrased question to continue the searching process. Results from the confirmed question are then presented.
An advantage of this style of interaction and cataloging is much higher precision. Whereas the keyword-based search engines might return thousands of results, the FAQ-based search engine often yields a few very precise results as answers. It is plausible that this style of FAQ-based search engine will enjoy remarkable success in limited domain applications, such as web-based technical support.
However, the FAQ-based search engines are also limited in their understanding of the user's query, because they only look up frequently occurring words in the query, and do not perform any deeper syntactic or semantic analysis. In the above example, the search engine still experiences difficulty locating “Chinese restaurants”, as exemplified by the omission of the modifier “Chinese” in any of the rephrased questions. While FAQ-based second-generation search engines have improved search precision, there remains a need for further improvement in search engines.
Another problem with existing search engines is that most people are dissatisfied with the user interface (UI). The chief complaint is that the UI is not designed to allow people to express their intention. Users often browse the Internet with the desire to obtain useful information. For keyword-based search engines, two main problems hinder the discovery of user intention. First, it is not easy for users to express their intention with simple keywords. Second, keyword-based search engines often return too many results unrelated to the users' intention. For example, a user may want to get travel information about Beijing. A user who enters ‘travel’ as a keyword query in Yahoo, for example, is given 289 categories and 17925 sites, and the travel information about Beijing is nowhere in the first 100 items.
Existing FAQ-based search engines offer UIs that allow entry of pseudo natural language queries to search for information. However, the underlying engine does not try to understand the semantics of the query or users' intention. Indeed, the user's intention and the actual query are sometimes different.
Accordingly, there is a further need to improve the user interface of search engines to better capture the user's intention as a way to provide higher quality search results.
SUMMARY
A search engine architecture is designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. The search engine architecture includes a natural language parser that parses a user query and extracts syntactic and semantic information. The parser is robust in the sense that it not only returns fully-parsed results (e.g., a parse tree), but is also capable of returning partially-parsed fragments in those cases where more accurate or descriptive information in the user query is unavailable. This is particularly beneficial in comparison to previous efforts that utilized full parsers (i.e., not robust parsers) in information retrieval. Whereas full parsers tended to fail on many reasonable sentences that were not strictly grammatical, the search engine architecture described herein always returns the best fully-parsed or partially-parsed interpretation possible.
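The distinction between a fully-parsed result and partially-parsed fragments can be sketched as follows. This is a toy illustration under stated assumptions, not the parser of the described architecture: the single sentence pattern, the small lexicon, and the name robust_parse are all hypothetical. The point is only that the function never fails outright; when the full pattern does not apply, it falls back to whatever labeled fragments it can recognize.

```python
import re

# Hypothetical toy lexicon mapping words to semantic labels.
LEXICON = {
    "chinese": "CUISINE",
    "italian": "CUISINE",
    "restaurant": "PLACE_TYPE",
    "restaurants": "PLACE_TYPE",
    "seattle": "LOCATION",
    "beijing": "LOCATION",
}

# Hypothetical "grammar": one sentence pattern standing in for a full parser.
FULL_PATTERN = re.compile(
    r"^what (?P<cuisine>\w+) (?P<place>\w+) are in (?P<location>\w+)\??$"
)

def robust_parse(query):
    """Return a full parse when the pattern fits, else partial fragments."""
    text = query.strip().lower()
    m = FULL_PATTERN.match(text)
    if (
        m
        and LEXICON.get(m.group("cuisine")) == "CUISINE"
        and LEXICON.get(m.group("place")) == "PLACE_TYPE"
        and LEXICON.get(m.group("location")) == "LOCATION"
    ):
        tree = {
            "question": "what",
            "cuisine": m.group("cuisine"),
            "place": m.group("place"),
            "location": m.group("location"),
        }
        return {"status": "full", "tree": tree}
    # Fallback: keep every word the lexicon recognizes, in query order.
    fragments = [
        (LEXICON[w], w) for w in re.findall(r"[a-z]+", text) if w in LEXICON
    ]
    return {"status": "partial", "fragments": fragments}

full = robust_parse("What Chinese restaurants are in Seattle?")
partial = robust_parse("seattle chinese restaurants cheap")
print(full["status"], partial["status"])
```

A strictly full parser would reject the second query because it is not a grammatical sentence; the fallback branch still recovers the location, cuisine, and place type from it.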
The search engine architecture has a question matcher to match the fully-parsed output and the partially-parsed fragments to a set of frequently asked questions (FAQs) stored in a database. The question matcher correlates the questions with a group of possible answers arranged in standard templates that represent possible solutions to the user query.
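One way to picture the question matcher is as a lookup from parsed concepts to stored FAQs, each tied to an answer template. The sketch below assumes a simple matching rule (an FAQ matches when every concept it needs appears in the parser output); the FAQ entries, slot labels, template names, and the function match_questions are hypothetical and are not drawn from the described database.

```python
# Hypothetical FAQ database: each entry lists the concepts it requires
# and the name of the answer template it is correlated with.
FAQ_DB = [
    {
        "question": "Where can I find <CUISINE> restaurants in <LOCATION>?",
        "slots": {"CUISINE", "PLACE_TYPE", "LOCATION"},
        "answer_template": "restaurant_listing",
    },
    {
        "question": "Where can I find tourist information for <LOCATION>?",
        "slots": {"LOCATION"},
        "answer_template": "tourist_info",
    },
    {
        "question": "How can I book a hotel in <LOCATION>?",
        "slots": {"LOCATION", "HOTEL"},
        "answer_template": "hotel_booking",
    },
]

def match_questions(fragments, faq_db):
    """Return (question, answer_template) pairs whose slots are all present."""
    found = {label for label, _ in fragments}
    matched = [faq for faq in faq_db if faq["slots"] <= found]
    # Prefer the FAQ that accounts for the most parsed concepts.
    matched.sort(key=lambda faq: len(faq["slots"]), reverse=True)
    return [(faq["question"], faq["answer_template"]) for faq in matched]

# Labeled fragments such as a parser might return (assumed format).
fragments = [
    ("CUISINE", "chinese"),
    ("PLACE_TYPE", "restaurants"),
    ("LOCATION", "seattle"),
]
candidates = match_questions(fragments, FAQ_DB)
for question, template in candidates:
    print(template, "<-", question)
```

Because the matcher works on labeled fragments rather than on a complete parse tree, the same function can serve both fully-parsed output and partially-parsed fragments in this sketch.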
The search engine architecture also has a keyword searcher to locate other possible answers by searching on any keywords returned from the parser. The search engine may be configured to search content in databases or on the Web to return possible answers.
The search engine architecture includes a user interface to facilitate entry of a natural language query and to present the answers returned from the question matcher and the keyword searcher. The user is asked to confirm which answer best represents his/her intentions when entering the initial search query.
The search engine architecture logs the queries, the answers returned to the user, and the user's confirmation feedback in a log database. The search engine has a log analyzer to evaluate the log database and glean information that improves performance of the search engine over time. For instance, the search
