Natural language information retrieval system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06393428

ABSTRACT:

TECHNICAL FIELD
This invention relates generally to the field of computer software and, more particularly, to a natural language information retrieval system employing a hash table technique to reduce memory requirements, a proxy process module to improve processing speed on multi-processor computing platforms, and a debugging module that is not shipped along with the natural language information retrieval system.
BACKGROUND OF THE INVENTION
The number and size of electronic documents increase continually. Any computer user with access to the Internet can search a vast universe of documents addressing every conceivable topic. Computer users may also search many other sources of electronic documents, such as dial-in databases, CD-ROM libraries, files stored on hard drives, files stored on tape drives, files stored on resources connected through an intranet, and the like. Although the available universe of documents may contain a wealth of information on a wide variety of subjects, searching through this universe to identify a small subset of documents that are relevant to a specific inquiry can be a daunting task. In fact, finding a large supply of searchable electronic documents may often be a far easier task than searching the individual documents to find information that is germane to a particular inquiry.
As a result, computer users have a continuing need for effective tools for searching the large and increasing supply of electronic documents. For example, key-word text search engines allow a computer user to identify documents that contain selected key words. More advanced search engines allow the user to further refine search requests by limiting the number of words between key words, automatically searching for variations of key words, specifying searches using Boolean logical operations, and so forth. These conventional key-word text search engines have limited utility, however, because simply searching for the presence of key words using Boolean logical operations often identifies a large number of candidate documents. The user must then examine each candidate document to identify those that are actually germane to the user's inquiry. This type of document-by-document examination can be tedious and time consuming.
Natural language information retrieval (NLIR) systems have been developed to improve over Boolean-logic key-word search engines. Rather than requiring a Boolean key-word search definition, an NLIR system accepts a natural language or “plain English” question. The NLIR system automatically identifies key words in the question and important semantic relationships between the key words. For example, the NLIR system may analyze the question and identify semantic relationships within the question, such as a verb and the subject and/or object of that verb. The NLIR system then searches the universe of documents to identify those documents in which the same key words appear in the same semantic relationships.
These semantic relationships are typically identified by breaking sentences down into semantic relationships, such as logical-form triples (LFTs). An LFT includes two words from a sentence and a qualifier representing the semantic relationship between the words. For example, a user may enter the natural language question, “Do elephants have tusks?” For this question, the noun “elephant” is in a deep subject relationship (qualifier “Dsub”) with the verb “have,” and the noun “tusks” is in a deep object relationship (qualifier “Dobj”) with the verb “have.” Thus, the question “Do elephants have tusks?” can be broken down into two LFTs, “elephant-Dsub-have” and “tusk-Dobj-have.”
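As a minimal sketch of the structure just described, an LFT can be modeled as an immutable record of two words and a qualifier. The class name `LFT` and its field names are hypothetical, chosen for illustration only; the triples are written out by hand from the example question, since no parser is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LFT:
    """Hypothetical record for one logical-form triple."""
    word: str       # first word of the pair, e.g. "elephant"
    qualifier: str  # semantic relationship, e.g. "Dsub" or "Dobj"
    head: str       # second word of the pair, e.g. the verb "have"

    def __str__(self) -> str:
        # Render in the word-qualifier-word notation used in the text.
        return f"{self.word}-{self.qualifier}-{self.head}"


# Hand-written triples for the question "Do elephants have tusks?"
question_lfts = [
    LFT("elephant", "Dsub", "have"),
    LFT("tusk", "Dobj", "have"),
]
rendered = [str(t) for t in question_lfts]
```

Because the record is frozen, instances are hashable and can be placed in sets or used as dictionary keys, which is convenient when comparing the LFTs of a question with those of a document.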
The NLIR system then searches the universe of documents for files containing the same LFTs. For example, the sentence, “African elephants, which have been hunted for decades, have large tusks,” also includes the LFTs, “elephant-Dsub-have” and “tusk-Dobj-have.” Thus, the NLIR system would identify a document containing this sentence as a document having a high likelihood of containing an answer to the natural language question, “Do elephants have tusks?” This type of semantic-qualified searching can greatly increase the quality of information retrieval. In other words, NLIR techniques can greatly increase the likelihood that a search engine will identify documents that contain an answer to a specific inquiry. NLIR systems that accept natural language rather than Boolean search requests are also easier to use in many situations because computer users are often more familiar with stating inquiries in plain English, as opposed to formulating inquiries in a Boolean-logic format.
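The matching step can be sketched as a set intersection between the LFTs of the question and the LFTs of each document. This is an illustrative assumption about one simple way to score candidates, not the method the specification claims; the function name `rank_documents`, the file names, and every triple other than “elephant-Dsub-have” and “tusk-Dobj-have” are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# (word, qualifier, word), e.g. ("elephant", "Dsub", "have")
Triple = Tuple[str, str, str]


def rank_documents(
    question: FrozenSet[Triple],
    docs: Dict[str, FrozenSet[Triple]],
) -> List[Tuple[str, int]]:
    """Return (document, shared LFT count) pairs, best match first."""
    scores = [(name, len(question & lfts)) for name, lfts in docs.items()]
    hits = [(name, score) for name, score in scores if score > 0]
    # Highest score first; ties are broken alphabetically by name.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


question = frozenset({("elephant", "Dsub", "have"), ("tusk", "Dobj", "have")})

# Hand-annotated, illustrative LFT sets for three made-up documents.
docs = {
    "elephants.txt": frozenset({
        ("elephant", "Dsub", "have"),
        ("tusk", "Dobj", "have"),
        ("elephant", "Dobj", "hunt"),
    }),
    "walrus.txt": frozenset({
        ("walrus", "Dsub", "have"),
        ("tusk", "Dobj", "have"),
    }),
    "cars.txt": frozenset({
        ("car", "Dsub", "have"),
        ("wheel", "Dobj", "have"),
    }),
}

ranking = rank_documents(question, docs)
```

A plain key-word search for “tusks” would treat the elephant and walrus documents alike; counting shared LFTs ranks the elephant document higher because both semantic relationships from the question appear in it.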
Conventional NLIR systems encounter drawbacks, however, because each document in the universe of searchable documents must be analyzed to identify the LFTs present in the document. Performing LFT analysis “on the fly” for a large universe of searchable documents would be prohibitively time consuming. Moreover, the same LFT processing would have to be performed multiple times for the same document. That is, LFTs would have to be identified for the same document for each natural language question processed in connection with that document. For this reason, LFT processing is typically performed only once for a particular document, and the LFTs present in the document are stored in association with the document. Preprocessing a document to identify LFTs and thus make the document amenable to subsequent NLIR analysis is sometimes referred to as “indexing” the document.
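The index-once idea can be sketched as follows: LFT extraction runs a single time per document, the result is stored against the document, and later questions consult only the stored LFTs. The class `LftIndex`, the stand-in `stub_extract`, and the document identifier are hypothetical; a real extractor would perform linguistic analysis, which is outside the scope of this sketch.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class LftIndex:
    """Stores the LFTs of each document so extraction runs only once."""

    def __init__(self, extract: Callable[[str], Iterable[Triple]]) -> None:
        self._extract = extract
        self._stored: Dict[str, FrozenSet[Triple]] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Skip documents that were already indexed.
        if doc_id not in self._stored:
            self._stored[doc_id] = frozenset(self._extract(text))

    def query(self, question_lfts: FrozenSet[Triple]) -> List[str]:
        # Documents whose stored LFTs include every LFT of the question.
        return sorted(
            doc_id
            for doc_id, lfts in self._stored.items()
            if question_lfts <= lfts
        )


SENTENCE = ("African elephants, which have been hunted for decades, "
            "have large tusks.")
call_log: List[str] = []  # records each invocation of the extractor


def stub_extract(text: str) -> List[Triple]:
    """Stand-in for a real LFT extractor; returns hand-written triples."""
    call_log.append(text)
    if text == SENTENCE:
        return [("elephant", "Dsub", "have"), ("tusk", "Dobj", "have")]
    return []


index = LftIndex(stub_extract)
index.index("doc1", SENTENCE)
index.index("doc1", SENTENCE)  # second request does not re-extract

first = index.query(frozenset({("elephant", "Dsub", "have")}))
second = index.query(frozenset({("tusk", "Dobj", "have")}))
```

Two questions are answered here, yet the extractor runs only once for the document, which is the saving that indexing is meant to provide.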
Indexing a large number of documents, such as all of the documents present on an electronic database or network, can be very time consuming. Fortunately, powerful techniques have been developed for handling such large-scale data processing tasks. These techniques include, among others, using multi-processor computer systems and multi-tasking operating systems that perform background processing. But conventional NLIR systems are not presently configured to take full advantage of these techniques because conventional NLIR systems rely heavily on global variables that prevent the NLIR system from running multiple processing threads simultaneously. The inability to simultaneously run multiple processing threads typically prevents the NLIR system from operating on more than one processor simultaneously, which undermines a major advantage of conducting the processing on a multi-processor computer system.
In addition, storing a complete set of LFTs for each document for a large number of documents can require a large amount of data storage space. In fact, it is not unusual for a complete set of LFTs to require as much storage space as the document itself. Thus, storing a complete set of LFTs for a large number of indexed documents may require a prohibitively large memory allocation for a storage-space limited program module, such as an electronic encyclopedia sold on CD-ROM. For example, the designers of an electronic encyclopedia program module may not be willing to reduce the number of documents by one-half in order to make the remaining documents amenable to NLIR processing. In addition, compressing the LFT data to reduce the memory requirement may result in prohibitively slow processing, as each LFT file would have to be uncompressed during question processing.
As a result, the desire to implement NLIR systems in connection with storage-space limited program modules presents a familiar conundrum in software development, in which acceptable processing speed cannot be achieved given acceptable memory requirements. Those techniques presently available for improving processing speed do so at the cost of increased memory requirements, and those techniques available for decreasing memory requirements do so at the cost of decreased processing speed (i.e., increased processing overhead). There is no solution presently available to provide the combination of acceptable processing speed and acceptable memory requirements for certain storage-space limited program modules, such as electronic encyclopedias and the like. For this reas
