Information retrieval by natural language querying

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

C707S793000

active

06601026

ABSTRACT:

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND
The invention relates to information retrieval by natural language querying.
The World-Wide-Web (Web) is a relatively new publishing medium where a vast number of documents can be retrieved and viewed by anyone with access to the Internet. By endowing individuals, companies and organizations with the ability to publish and retrieve information conveniently and inexpensively, the Web has become the gateway to a plethora of information. Its success as an information distribution and retrieval system has resulted in a vast sea of information on the Web.
This information explosion has undermined the Web's utility as an information source. To assist overwhelmed users in locating and retrieving specific useful information from the Web, a variety of search engines have been developed. Typically, a search engine accepts one or more keywords from a user, performs a search for documents containing the keywords and returns links to documents containing the keywords for the user to review. Although traditional search engines are capable of supporting highly specific search queries using one or more command sequences, users typically default to entering two or three keywords into the search engine as queries because they are not comfortable with the intricate format associated with the command sequences.
Typically, search engines use Boolean search techniques, which rely on the presence of each keyword. The Boolean search approach is fast and works well for certain applications that have precise search terminologies (such as legal and medical applications). Other techniques, such as vector space and neural network search, apply more sophisticated comparisons involving joint usage of terms within documents. These techniques are powerful for automatically grouping documents by their likely topic area (document clustering).
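By way of illustration only, a Boolean keyword search of the kind described above might be sketched as follows. The function names `tokenize` and `boolean_and_search` and the sample documents are assumptions introduced for this sketch and do not appear in the disclosure. A document is returned only when every keyword is present in it.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_and_search(documents, keywords):
    """Return ids of documents that contain every keyword (Boolean AND)."""
    wanted = {k.lower() for k in keywords}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= set(tokenize(text))]


# Hypothetical sample collection, for illustration only.
documents = {
    "d1": "The court granted the motion to dismiss.",
    "d2": "The motion was filed late.",
    "d3": "The court adjourned.",
}

hits = boolean_and_search(documents, ["court", "motion"])
print(hits)
```

Because the match depends only on presence, a document that mentions a keyword once is treated the same as one that discusses it at length.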
Web-search engines generally scan the Web and generate a substantial index that can be subsequently searched in response to a user's query. In order to support a relatively complete search over a collection of documents, the derived document collection index may store a list of the terms, or individual words, that occur within the indexed document collection. Words, particularly simple verbs, conjunctions and prepositions, are often preemptively excluded from the term index as presumptively carrying no informationally significant weight. Various heuristics can be employed to identify other words that appear frequently within the document collection and which contextually differentiate documents in the collection.
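A term index of the kind described above might be sketched as follows, assuming a small fixed list of excluded words. The `STOP_WORDS` set, the `build_index` function and the sample documents are illustrative assumptions, not taken from any particular search engine.

```python
import re
from collections import defaultdict

# Illustrative list of words presumed to carry no significant weight.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def build_index(documents):
    """Map each non-excluded term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            if term not in STOP_WORDS:
                index[term].add(doc_id)
    return dict(index)


# Hypothetical sample collection, for illustration only.
documents = {
    "d1": "The merger of the banks",
    "d2": "Banks in the news",
}

index = build_index(documents)
print(sorted(index))
```

Once built, the index can be consulted for each query term without rescanning the documents themselves.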
These search engines can also compute a relevancy score based on the combined frequency of occurrence of the query terms for each document. Such an approach presumes that increasing occurrences of specific query terms within a document means that the document is more likely to be relevant and responsive to the query. A query report listing the identified documents ranked according to relevancy score is then presented to the user. The report listing can be voluminous and can require the user to sift through numerous documents to locate particular documents of interest.
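The frequency-based relevancy score described above might be sketched as follows. The scoring rule here is a plain sum of query-term counts, which is an assumption; actual search engines may weight or normalize the counts differently. The names `relevancy_score` and `rank` are hypothetical.

```python
import re
from collections import Counter


def relevancy_score(text, query_terms):
    """Sum the number of occurrences of each query term in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[term.lower()] for term in query_terms)


def rank(documents, query_terms):
    """Return ids of matching documents, highest score first."""
    scored = [(relevancy_score(text, query_terms), doc_id)
              for doc_id, text in documents.items()]
    matching = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Sort by descending score; break ties by document id for stability.
    return [doc_id for score, doc_id in
            sorted(matching, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical sample collection, for illustration only.
documents = {
    "d1": "Stocks fell. Stocks rallied later.",
    "d2": "Bond prices fell.",
    "d3": "Weather report.",
}

print(rank(documents, ["stocks", "fell"]))
```

Such a score rewards repetition of the query terms rather than responsiveness to the question asked, which is one reason the resulting report can be long and require manual review.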
An increasing amount of Web content is evolving from text-based documents to multimedia documents which include video clips and sound files. This is due in part to the fact that certain perishable and high value-added information such as news on business, sports, current events and entertainment is best presented in audio-visual form and multimedia form rather than text form. Examples of sources of audio-visual/multimedia information include television feeds, cable feeds, radio feeds, and computer generated multimedia feeds. Text-based search engines typically cannot search these multimedia sources of information.
SUMMARY
A natural language information querying system includes an indexing facility configured to automatically generate indices of dynamically updated text sources based on a predefined grammar and a database coupled to the indexing facility to store the indices.
Implementations of the invention include a query engine coupled to the database to respond to a natural language query.
In another aspect, a method for providing information in response to a natural language query includes extracting information from an updated text corpus based on a predefined grammar, and creating a stored indexed text corpus adapted to permit natural language querying.
Implementations of the above aspect include one or more of the following. The method includes searching the stored index for the text corpus based on the natural language query. The information extracting step includes creating templates associated with one or more events and relationships associated with a topic. The method can update the index by applying a speech recognizer to a multimedia stream. The method also includes creating a summary for each document in a group of documents; quoting a relevant portion of each located document in a summary; or annotating the output by group in a summary. Where the stored index for the text corpus resides on a server, the method further includes sending the natural language query from a mobile device such as a handheld computer; and receiving a natural language response from the server and forwarding the response to a user. The response can be converted to speech using a text-to-speech unit. The natural language query can be captured using a speech recognizer or a handwriting recognizer. The query and the text corpus can relate to locating competitive intelligence information, litigation support information, products on-line, medical information, legal information, electronic commerce information, educational information, financial information, investment information, or information for a vertical market application, among others.
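The creation of templates for events associated with a topic might be sketched as follows. The disclosure does not specify the form of the grammar at this point, so a single regular expression stands in for it here; the `ACQUISITION` pattern, the template field names, the functions `extract_templates` and `find_events`, and the company names are all assumptions made for illustration.

```python
import re

# Stand-in for one grammar rule: "<Buyer> acquired <Target>".
ACQUISITION = re.compile(r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+)")


def extract_templates(doc_id, text):
    """Fill one event template per match of the rule in the text."""
    return [
        {
            "event": "acquisition",
            "buyer": match.group("buyer"),
            "target": match.group("target"),
            "source": doc_id,
        }
        for match in ACQUISITION.finditer(text)
    ]


def find_events(templates, **criteria):
    """Return the templates whose fields equal every given criterion."""
    return [t for t in templates
            if all(t.get(field) == value for field, value in criteria.items())]


# Hypothetical sample text, for illustration only.
templates = extract_templates(
    "d1", "Acme acquired Initech in March. Globex acquired Hooli."
)

print(find_events(templates, event="acquisition", buyer="Acme"))
```

Storing filled templates rather than bare terms allows a question such as "whom did Acme acquire?" to be answered by matching on fields instead of counting keywords.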
In another aspect, a system for providing information in response to a natural language query includes an information extraction engine adapted to index an automatically updated text corpus based on a predefined grammar; a database coupled to the information extraction engine to store the index output; and a natural language query engine coupled to the database to search the index in response to the natural language query.
Implementations of the above aspect include one or more of the following. A data acquisition unit can be coupled to the information extraction engine to automatically update the text corpus. The data acquisition unit can receive data from any of the following in any combination: a web crawler, a news service, or a search engine, for example. The grammar can be based on events and relationships associated with a topic. The grammar can comprise pattern-action rules, or it can comprise one or more rules to specify a proper noun, a complex word, a phrase, as well as a domain event. The grammar can also comprise one or more rules for merging partial information from different parts of a document. The index for the text corpus can be searched using natural language querying. The natural language querying can be based on a query grammar. The query grammar can be associated with a topic. The query grammar can be represented as pattern-action rules. A query reply generator can be coupled to the natural language query engine to format the output of the search. The query reply generator can create a summary of the output for each document in a group of documents. The query reply generator can quote a relevant portion of each located document in a summary or can annotate the output by group in a summary. The query reply generator can also highlight a relevant portion in each located document. A network, such as the Internet, can be coupled to the natural l
