Real time structured summary search engine

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

Reexamination Certificate

active

06397209

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to a method of processing data, and more particularly to a method of processing stored electronic documents to facilitate subsequent retrieval.
BACKGROUND OF THE INVENTION
It is known to search text-based documents electronically using keywords linked through Boolean logic. This technique has been used for many years to search patent literature, for example, and more recently documents on the Internet. The problem with such conventional searches is that if the search criteria are made broad, the search engine will often produce thousands of “hits”, many of which are of no interest to the searcher. If the criteria are made too narrow, there is a risk that relevant documents will be missed.
There is a real need to provide a search engine that will filter out unwanted results while retaining results of interest to the user. An object of the invention is to provide such a system.
SUMMARY OF THE INVENTION
According to the present invention there is provided a method of processing electronic documents for subsequent retrieval, comprising the steps of storing in memory a summary structure database describing the structure of summary records associated with each document, each structured summary record having at least one descriptor field with predefined allowed entries identifying a characteristic of the document; storing in memory predefined keyword criteria associated with said allowed field entries; analyzing each document to build a text index listing the occurrence of unique significant words in the document; and comparing said text index with said keyword criteria to determine the appropriate field entry for the associated descriptor field.
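The text-index step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_text_index`, the tokenizing rule, and the short stop-word list used to decide which words count as "significant" are all assumptions.

```python
import re
from collections import Counter

# Assumption: a small stop-word list stands in for whatever rule the
# method uses to decide which words are "significant".
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for", "at"}

def build_text_index(text: str) -> Counter:
    """Return a mapping of each unique significant word to its occurrence count."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

index = build_text_index("The market fell and the investor sold stock in the market.")
print(index["market"])  # occurrence count for one significant word
```

A counter keyed by word keeps both pieces of information the summary needs: which unique words occur, and how often.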
Examples of descriptor fields with limited allowed field entries are category and location. The category field might have as possible field entries: Finance, Sports, Politics. The location field might have as possible entries: Africa, Canada, Europe.
The individual field entries are in turn associated with certain keyword criteria. For example, the criteria for the Finance field entry might be: shares, public, bankrupt, market, profit, investor, stock, IPO, quarter, “fund manager”. The criteria for the Sports field entry might be: football, ball, basketball, hockey, bat, score, soccer, run, baseball, “Wayne Gretzky”, “Chicago Bulls”, “Michael Jordan”.
It will be appreciated that the keyword criteria are chosen in view of the likelihood that any document containing those keywords will be associated with the particular category.
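One way to picture the comparison of a text index against keyword criteria is the sketch below. The scoring rule (sum the occurrences of each criterion keyword and pick the highest-scoring entry) is an assumption, since the text does not state how the comparison is scored. The names `CATEGORY_CRITERIA` and `choose_field_entry` are hypothetical, and the keyword lists are abridged from the examples above.

```python
import re
from collections import Counter

# Abridged keyword criteria for the "category" descriptor field.
CATEGORY_CRITERIA = {
    "Finance": ["shares", "market", "profit", "investor", "stock", "fund manager"],
    "Sports": ["football", "hockey", "score", "soccer", "baseball", "chicago bulls"],
}

def choose_field_entry(text, criteria):
    """Pick the allowed field entry whose keywords occur most often in the text."""
    lowered = text.lower()
    text_index = Counter(re.findall(r"[a-z0-9']+", lowered))
    scores = {}
    for entry, keywords in criteria.items():
        # Assumed rule: single words are looked up in the text index,
        # multi-word phrases are counted directly in the raw text.
        scores[entry] = sum(
            lowered.count(kw) if " " in kw else text_index[kw]
            for kw in keywords
        )
    best = max(scores, key=scores.get)
    # No matching keyword at all: leave the field entry undecided.
    return best if scores[best] > 0 else None

doc = "Investor profit rose as the stock market rallied, said one fund manager."
print(choose_field_entry(doc, CATEGORY_CRITERIA))
```

Returning `None` when nothing matches is a design choice of this sketch; a real system might instead assign a default entry.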
In a preferred embodiment, the structured summary also includes fields having unlimited entries. Examples of such fields are a keyword field and an excerpt field. The keyword field may list the words having the highest count in the text index. The excerpt field may list the sentences containing the highest occurrence of keywords.
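The keyword and excerpt fields could be filled along the following lines. This is a sketch under stated assumptions: the helper names `top_keywords` and `top_excerpts`, the stop-word list, and the sentence-splitting rule are illustrative and do not come from the source.

```python
import re
from collections import Counter

# Assumption: a small stop-word list decides which words are significant.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "as", "then"}

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def top_keywords(text, n=3):
    """Keyword field: the n words with the highest count in the text index."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]

def top_excerpts(text, keywords, n=1):
    """Excerpt field: the n sentences containing the most keyword occurrences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def hits(sentence):
        tokens = tokenize(sentence)
        return sum(tokens.count(k) for k in keywords)

    return sorted(sentences, key=hits, reverse=True)[:n]

doc = ("The stock market rose. Hockey fans cheered. "
       "The stock market then fell as stock prices dropped.")
keywords = top_keywords(doc, 2)
print(keywords)
print(top_excerpts(doc, keywords, 1))
```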
The structured summary can be established according to a standard profile that is the same for all users, or in one embodiment the profile can change in accordance with a particular user's need. In this case, a user profile is stored in a profile database.
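A standard profile with optional per-user overrides might be modeled as below. The dictionary layout, the user identifier "alice", and the rule that a user profile overrides the standard profile field by field are assumptions made for illustration only.

```python
# Standard profile: descriptor fields and their allowed entries, same for all users.
STANDARD_PROFILE = {
    "category": ["Finance", "Sports", "Politics"],
    "location": ["Africa", "Canada", "Europe"],
}

# Hypothetical profile database keyed by user identifier.
PROFILE_DB = {
    "alice": {"category": ["Finance", "Politics"]},
}

def summary_structure_for(user_id=None):
    """Return the summary structure for a user, falling back to the standard profile."""
    structure = dict(STANDARD_PROFILE)
    # Assumed rule: fields present in the user's profile replace the standard ones.
    structure.update(PROFILE_DB.get(user_id, {}))
    return structure

print(summary_structure_for("alice"))
print(summary_structure_for())
```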
The structured summaries normally include pointers to the memory locations of the associated documents so that, during a subsequent search, a user can view relevant summaries and quickly locate the associated document as required.
The invention also extends to a system for processing electronic documents for subsequent retrieval, comprising a memory storing a summary structure describing the structure of summary records associated with each document, each structured summary record having at least one descriptor field with predefined allowed entries identifying a characteristic of the document; a memory storing predetermined keyword criteria associated with said allowed field entries; means for analyzing each document to build a text index listing the occurrence of unique significant words in the document; and means for comparing said text index with said keyword criteria to determine the appropriate field entry for the associated descriptor field.
The invention still further provides a method of retrieving electronic documents which are associated with a structured summary record containing a pointer to the document and having at least one descriptor field with predefined allowed field entries identifying a characteristic of the document, comprising searching through the summary records for records having specific field entries, and identifying the documents associated with the records matching the search criteria.
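The retrieval method can be illustrated with a short sketch in which the search runs over the summary records rather than the documents themselves. The `SummaryRecord` class, its field names, and the file paths used as document pointers are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryRecord:
    pointer: str    # where the full document is stored (a path is assumed here)
    category: str   # descriptor field with predefined allowed entries
    location: str   # descriptor field with predefined allowed entries
    keywords: list = field(default_factory=list)

def search_summaries(records, **criteria):
    """Return pointers of documents whose summary records match every field entry given."""
    return [
        r.pointer
        for r in records
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]

records = [
    SummaryRecord("/docs/001.txt", "Finance", "Canada", ["stock", "market"]),
    SummaryRecord("/docs/002.txt", "Sports", "Canada", ["hockey", "score"]),
    SummaryRecord("/docs/003.txt", "Finance", "Europe", ["fund", "profit"]),
]
print(search_summaries(records, category="Finance", location="Canada"))
```

Because only the compact summary records are scanned, the full documents are touched only when a user follows a returned pointer.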


REFERENCES:
patent: 5050071 (1991-09-01), Harris et al.
patent: 5191525 (1993-03-01), LeBrun et al.
patent: 5257365 (1993-10-01), Powers et al.
patent: 5519855 (1996-05-01), Neeman et al.
patent: 5710844 (1998-01-01), Capps et al.
patent: 5818955 (1998-10-01), Smithies et al.
patent: 5845278 (1998-12-01), Kirsch et al.
patent: 6009442 (1999-12-01), Chen et al.
patent: 6122643 (2000-09-01), Paik et al.
patent: 6182066 (2001-01-01), Marques
patent: 9623265 (1996-08-01), None
