Indexed, extensible, interactive document retrieval system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate



active

06678694

ABSTRACT:

CROSS-REFERENCE TO RELATED APPLICATIONS
Not applicable.
STATEMENT RE FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
FIELD OF THE INVENTION
The present invention relates to document retrieval systems, and more particularly, to search systems suitable for locating documents on an internet or intranet.
BACKGROUND OF THE INVENTION
The earliest retrieval systems were mainframe computers that contained the full text of thousands of documents and were accessed from time-sharing terminals. The first such systems, developed in the early 1960s, took a list of words and searched linearly through a tape library of the documents for those that contained the specified words. By the mid-to-late 1960s, more sophisticated systems first built word indices, or concordances, of the searchable words in the document set (excluding nonsearchable words such as "of," "the," and "and"). The concordance contained, for each word, the document numbers of all the documents that contained the word. In some systems, each document number was accompanied by the number of times the word appeared in that document, serving as a crude measure of the word's relevance to the document. Such systems simply required the requestor to type in a list of words; the system then computed a relevance score for each document and retrieved and displayed the documents to the requestor in relevance order. An example of such a system was the QuicLaw system, developed by Hugh Lawford at Queen's University in Canada with support from IBM Canada. Phrase searches on that system were performed by scanning documents for the phrase after they had been retrieved, and phrase searches were accordingly slow.
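A concordance of this kind is what is now usually called an inverted index. The Python sketch below is a minimal illustration under assumed inputs (documents as plain strings keyed by document number, a small hypothetical stopword list, and relevance taken as the summed occurrence counts of the query words); it is not QuicLaw's actual implementation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical list of nonsearchable words excluded from the concordance.
STOPWORDS = {"of", "the", "and", "a", "an", "to", "in"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_concordance(docs):
    """Map each searchable word to {document number: occurrence count}."""
    concordance = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
        for word, n in counts.items():
            concordance[word][doc_id] = n
    return concordance

def rank(concordance, query_words):
    """Score each document by the summed counts of the query words it contains;
    return (doc_id, score) pairs, highest score first."""
    scores = Counter()
    for word in query_words:
        for doc_id, n in concordance.get(word.lower(), {}).items():
            scores[doc_id] += n
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    1: "The contract was breached; the breach of contract was material.",
    2: "A contract of sale.",
    3: "Negligence and duty of care.",
}
index = build_concordance(docs)
print(rank(index, ["contract", "breach"]))  # [(1, 3), (2, 1)]
```

Because the sketch does no stemming, "breached" is indexed separately from "breach", so document 1 scores 2 for "contract" plus 1 for "breach".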
Other systems, such as Mead Data Central's LEXIS system, developed by Jerome Rubin, Edward Gotsman, and others, included in the concordance an entry for each and every word. Each entry held the document number of the document containing the word, a document segment number identifying the segment of the document in which the word appeared, and a word position number identifying where, within the segment, the word appeared relative to other words. West Group's WESTLAW system, developed a few years later by William Voedisch and others, improved on this by including in each concordance entry a paragraph number (indicating where the word appeared within the segment), a sentence number (indicating where the word appeared within the paragraph), and a word position number (indicating where the word appeared within the sentence). These two systems, both still in use today, permit the logical connectors or operators AND, OR, AND NOT, w/seg (within the same segment), w/p (within the same paragraph), w/s (within the same sentence), w/4 (within 4 words of each other), and pre/4 (preceding by 4 words or fewer) to be used in writing formal, complex search requests. Parentheses control the order in which these logical operations are executed. Another class of systems, in particular the Dialog system, which is also still in use today, grew out of the early NASA RECON system, which assigned names to previously performed searches so that those searches could be incorporated by reference into later searches.
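Positional concordance entries are what make proximity operators cheap to evaluate: two words satisfy w/4 when they occur in the same document and segment and their positions differ by at most 4. The Python sketch below is a simplified, hypothetical illustration, not the LEXIS or WESTLAW implementation. It stores a word offset counted within the segment (rather than within the sentence) alongside the paragraph and sentence numbers; the nested-list document format, the counting convention for "within n words", and all function names are assumptions.

```python
from typing import NamedTuple

class Posting(NamedTuple):
    doc: int   # document number
    seg: int   # segment number
    para: int  # paragraph number within the segment
    sent: int  # sentence number within the paragraph
    pos: int   # word offset within the segment (simplifying assumption)

def build_index(docs):
    """docs maps doc_id -> segments -> paragraphs -> sentences -> words (nested lists)."""
    index = {}
    for doc_id, segments in docs.items():
        for s, paragraphs in enumerate(segments):
            pos = 0  # offsets restart at each segment
            for p, sentences in enumerate(paragraphs):
                for t, words in enumerate(sentences):
                    for word in words:
                        index.setdefault(word.lower(), []).append(Posting(doc_id, s, p, t, pos))
                        pos += 1
    return index

def _matching_docs(index, a, b, test):
    """Documents with occurrences of a and b in the same segment that satisfy test."""
    return {
        x.doc
        for x in index.get(a, [])
        for y in index.get(b, [])
        if x.doc == y.doc and x.seg == y.seg and test(x, y)
    }

def w_n(index, a, b, n):
    """a w/n b: the two words lie within n words of each other, in either order."""
    return _matching_docs(index, a, b, lambda x, y: abs(x.pos - y.pos) <= n)

def pre_n(index, a, b, n):
    """a pre/n b: a precedes b by at most n words."""
    return _matching_docs(index, a, b, lambda x, y: 0 < y.pos - x.pos <= n)

def w_s(index, a, b):
    """a w/s b: both words occur in the same sentence."""
    return _matching_docs(index, a, b, lambda x, y: (x.para, x.sent) == (y.para, y.sent))

def w_p(index, a, b):
    """a w/p b: both words occur in the same paragraph."""
    return _matching_docs(index, a, b, lambda x, y: x.para == y.para)

docs = {
    1: [[[["strict", "liability", "for", "defective", "products"]]]],
    2: [[[["products", "are", "sold"], ["liability", "is", "strict"]]]],
}
index = build_index(docs)
print(w_n(index, "liability", "products", 4))    # {1, 2}
print(pre_n(index, "liability", "products", 4))  # {1}
print(w_s(index, "liability", "products"))       # {1}
```

The same postings answer every operator; only the predicate changes, which is why storing positions in the concordance avoids rescanning retrieved documents the way QuicLaw's phrase searches did.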
Professional librarians and legal researchers use all three of these systems regularly. However, these professionals must train for many weeks and months to learn how to formulate complex queries containing parentheses and logical operators. Lay searchers cannot use these powerful systems with the same degree of success because they are not trained in the proper use of operators and parentheses and do not know how to formulate search queries.
These systems also have other undesirable properties. When asked to search for multiple words and phrases joined by OR, these systems tend to retrieve far too many unwanted documents: their precision is poor. Precision can be improved by adding AND operators and word proximity operators to a search request, but then relevant documents tend to be missed, and the recall rate of these systems suffers accordingly.
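The trade-off can be stated with the standard definitions of precision and recall, where A is the set of documents a search retrieves and R is the set of documents actually relevant to the request:

```latex
\text{precision} = \frac{|R \cap A|}{|A|}, \qquad
\text{recall} = \frac{|R \cap A|}{|R|}
```

With hypothetical numbers, a broad OR search that retrieves 200 documents, 20 of which are among the 25 relevant documents in the collection, has a precision of 20/200 = 0.10 and a recall of 20/25 = 0.80. Adding AND or proximity operators shrinks A, which tends to raise precision, but it can also drop relevant documents out of R ∩ A and so lower recall.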
To enable untrained searchers to use these systems, various artificial intelligence schemes have been developed that, like the early QuicLaw system, simply let a requestor type in a list of words or a sentence and then rank and produce the documents. These systems produce variable results and are not particularly reliable. Some ask the requestor to select a particularly relevant document and then use the words that document contains to try to find similar documents, again with rather mixed results.
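The passage does not say how these systems measure similarity to the selected document. One common choice, used here purely as an assumption, is the cosine similarity between raw term-count vectors. The Python sketch below ranks a small hypothetical collection against a seed document that way; the function names and sample texts are illustrative only.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Count every alphabetic word in the text (no stemming or stopwords)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def more_like_this(seed_text, docs, k=3):
    """Return the k (doc_id, similarity) pairs most similar to the seed text."""
    seed = term_vector(seed_text)
    scored = [(doc_id, cosine(seed, term_vector(text))) for doc_id, text in docs.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]

docs = {
    1: "patent claims and prior art",
    2: "prior art search",
    3: "weather forecast",
}
results = more_like_this(docs[1], docs)
print([doc_id for doc_id, _ in results])  # [1, 2, 3]
```

Raw word overlap of this kind is easily swamped by common words, which is one plausible reason such "find similar" features give the mixed results described above.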
The WESTLAW system also contains some formal indexing of its documents, with each document assigned to a topic and, within each topic, to a key number that corresponds to a position within an outline of the topic. But this indexing can only be used when each document has been hand-indexed by a skilled indexer. New documents added to the WESTLAW system must also be manually indexed. Other systems provide each document with a segment or field that contains words and/or phrases that help to identify and characterize the document, but again this indexing must be done manually, and the retrieval systems treat these words and phrases in the same manner as they do other words and phrases in the document.
With the development of the Internet, web crawlers have been developed that search the web and create what amount to concordances of thousands of web pages, indexing documents by their URLs (uniform resource locators, or web addresses), by the words and phrases they contain, and by index terms that authors can optionally place in a special field of each document. But these search engines retrieve thousands of documents containing a word or phrase and do not help the requestor sort through all the documents captured. In other words, their precision is poor. And introducing the AND operator into these systems causes their recall to suffer.
All of these systems suffer from an even more fundamental defect: they do not teach the requestor how to search, except to the extent that the requestor happens upon new words and phrases while browsing. They neither suggest nor automate the use of indexing where indexing is available. They do not query the requestor or offer the requestor alternative ways to proceed. They do not automatically index new documents that have not previously been indexed manually.
Accordingly, it is a primary object of the present invention to provide a document retrieval system, suitable for searching for documents on the Internet or on an intranet, that is indexed, that is extensible without additional manual indexing, that accepts broadly formulated queries from a requestor, and that then enters into a dialogue with the requestor to refine and focus the search. The system uses precise indexing to improve search precision considerably, minimizing browse time and false hits, without a corresponding reduction in the relevant document recall rate.
BRIEF SUMMARY OF THE INVENTION
Briefly summarized, the present invention is an interactive document retrieval system that is designed to search for documents after receiving a search query from a requestor. It contains a knowledge database that contains at least one data structure which relates document word patterns to topics. This knowledge database can be derived from an indexed collection of documents.
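The summary states that the knowledge database relates document word patterns to topics and can be derived from an indexed collection, but it does not specify the data structure. The Python sketch below is one hypothetical reading: it treats a document's "word pattern" as the relative frequency of its searchable words and relates each topic to the average pattern of the documents already indexed under that topic. The stopword list, the averaging, and all function names are assumptions, and the sketch covers only building the knowledge database and looking up which topics a word is associated with, not the categorization of captured documents.

```python
import re
from collections import Counter, defaultdict

# Hypothetical list of nonsearchable words.
STOPWORDS = {"of", "the", "and", "a", "an", "to", "in", "is", "for"}

def word_pattern(text):
    """Hypothetical 'word pattern': relative frequency of each searchable word."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    total = len(words)
    return {w: n / total for w, n in Counter(words).items()} if total else {}

def build_knowledge_base(indexed_docs):
    """indexed_docs: iterable of (text, topic) pairs from an already-indexed collection.
    Returns {topic: average word pattern of the documents indexed under it}."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, topic in indexed_docs:
        for w, f in word_pattern(text).items():
            sums[topic][w] += f
        counts[topic] += 1
    return {t: {w: f / counts[t] for w, f in sums[t].items()} for t in sums}

def topics_for_word(kb, word):
    """Topics whose pattern contains the word, strongest association first."""
    hits = [(t, pattern[word]) for t, pattern in kb.items() if word in pattern]
    return [t for t, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]

collection = [
    ("The tenant breached the lease", "landlord-tenant"),
    ("Lease renewal and rent", "landlord-tenant"),
    ("The driver breached a duty of care", "negligence"),
]
kb = build_knowledge_base(collection)
print(topics_for_word(kb, "breached"))  # ['negligence', 'landlord-tenant']
```

Because the knowledge base is computed from documents that already carry topics, adding newly indexed documents to the collection and rebuilding it extends the structure without further manual work on the knowledge base itself.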
The present invention utilizes a query processor that, in response to the receipt of a search query from a requestor, searches for and tries to capture documents containing at least one term that is related to the search query. If any documents are captured, the processor analyzes the captured documents to determine their word patterns, and it then categorizes the captured documents by comparing each document's word pattern to
