Brilliant query system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06546386

ABSTRACT:

FIELD OF THE INVENTION
The invention relates to an improved system for conducting queries over the Internet (the Net) and private intranets.
BACKGROUND OF THE INVENTION
The Internet delivers trillions of words to billions of screens. The Net contains an enormous amount of material. Searches on the Net often return far too many results, most of which are not relevant. One of the simplest ways to obtain a more focused search is to use a Boolean ANDed search with search engines such as Yahoo, Excite, Google and Alta Vista. Instead of one word, two words are used, and much irrelevant material is discarded.
For example, searching on the word blackbird might yield tens of thousands of hits, with results ranging from rock bands, to birds, to consulting firms, to airplanes. Search on blackbird AND reconnaissance and the number of hits can be reduced by two orders of magnitude (from 6,000 to 60 hits), and almost all of the referenced Web sites (the “hits”) deal with the famous SR-71 spy plane (known as the “Blackbird”). The effect of the ANDed search is that both the words blackbird and reconnaissance MUST occur in the text of the Web page. The word blackbird is also treated as more important than reconnaissance because it comes first in the search, so most search engines will sort pages with more occurrences of the word blackbird earlier in the resulting list. Each search engine prioritizes the hits it returns on the basis of its own set of rules for importance, credibility or popularity. What these search engines do not do is figure out just what you are really interested in.
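The ANDed search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `and_search`, the tiny `PAGES` collection and the ranking rule (count of the first term only) are hypothetical and do not reflect how any real search engine works.

```python
import re

def and_search(pages, terms):
    """Return titles of pages that contain every term, ranked by first-term count."""
    wanted = [t.lower() for t in terms]
    if not wanted:
        return []
    hits = []
    for title, text in pages.items():
        words = re.findall(r"[a-z0-9-]+", text.lower())
        # Boolean AND: every search term must occur on the page.
        if all(t in words for t in wanted):
            hits.append((words.count(wanted[0]), title))
    # Pages with more occurrences of the first term sort earlier (stable sort).
    hits.sort(key=lambda pair: -pair[0])
    return [title for _, title in hits]

# Hypothetical miniature collection, for illustration only.
PAGES = {
    "band": "Blackbird is a rock band from Ohio.",
    "bird": "The blackbird is a common garden bird. A blackbird sings at dawn.",
    "sr71": "The SR-71 Blackbird flew reconnaissance missions. The Blackbird was fast.",
}

print(and_search(PAGES, ["blackbird"]))
print(and_search(PAGES, ["blackbird", "reconnaissance"]))
```

Adding the second term drops the band and bird pages, which is the narrowing effect the paragraph describes.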
OBJECTS OF THE INVENTION
It is an object of this invention to enhance a body of text to add focused and selected queries to the text.
It is a further object of the invention to provide a system to automatically add highly relevant and focused queries to a text, such as magazine articles, news stories or any other text.
SUMMARY OF THE INVENTION
Brilliant queries require a preparation process that analyzes any text to enhance and generate a set of suggested searches based on that analysis and certain pre-set user parameters. The output of this preparation process can be used to add links to an HTML page of a document either automatically or through manual insertion of the resulting analysis.
Brilliant query links have two components: a “hook” and a “keyword”. The hook is an overall concept or phrase that describes the subject matter of the text body. The keyword is a word that is derived from the analysis of the text and indicates a secondary or related concept. Brilliant queries are a collection of one or more pairings of the hook and a keyword. For example, an article on the SR-71 Blackbird Airplane might have the following brilliant queries:
1. Search for more information on BLACKBIRD and AVIATION
2. Search for more information on BLACKBIRD and ELINT
3. Search for more information on BLACKBIRD and RECONNAISSANCE
4. Search for more information on BLACKBIRD and TRANSPORT
The hook is BLACKBIRD and the keywords are AVIATION, ELINT (electronic intelligence), RECONNAISSANCE and TRANSPORT.
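As a minimal sketch, the pairing of one hook with each keyword could look like the following. The function names `brilliant_queries` and `link_labels` are hypothetical, and the `AND` query syntax is an assumption about the target search engine.

```python
def brilliant_queries(hook, keywords):
    """Pair the hook with each keyword to form ANDed query strings."""
    return [f"{hook} AND {kw}" for kw in keywords]

def link_labels(hook, keywords):
    """Build the reader-facing text for each suggested search."""
    return [f"Search for more information on {hook} and {kw}" for kw in keywords]

HOOK = "BLACKBIRD"
KEYWORDS = ["AVIATION", "ELINT", "RECONNAISSANCE", "TRANSPORT"]

for label, query in zip(link_labels(HOOK, KEYWORDS), brilliant_queries(HOOK, KEYWORDS)):
    print(f"{label}  ->  {query}")
```

Keeping the label separate from the query string lets the same pairing feed either an automatically generated link or a manually inserted one.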
The hook is the concept, primary subject matter or main topic for a body of text. The hook is used to define a query as narrowly as possible on a particular topic for a selected information source. To determine a “hook”, a content layer must exist for which a context can be determined. There must be a perceivable structure to the information source and each content entry must have an associated context or place or places within the structure of the information source.
Information source is defined as any collection of content that is searchable for the purpose of locating specific content selections. Included in this are encyclopedias, news archives, dictionaries and other specific content collections. This includes search engines on the Internet that effectively turn the entire Internet into a single searchable information source.
Keywords are simply a collection of words, generated automatically or manually, that are deemed to be indicative of the topic matter or one of the topics for a given content selection. Keywords are determined by comparison of a predetermined list of keywords to the text of the content selection. If the content selection contains one or more of the keywords, then that keyword is associated with that text body and potentially used for the brilliant query. Keywords may also be determined by statistical frequency analysis of the text, with or without manual selection and addition of synonyms.
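The comparison of a predetermined keyword list against a content selection might be sketched as below. The name `match_keywords`, the sample list and the whole-word, case-insensitive matching rule are assumptions made for illustration; the text does not specify how matching is performed.

```python
import re

def match_keywords(text, keyword_list):
    """Return the keywords (single words or phrases) that occur in the text."""
    lowered = text.lower()
    found = []
    for kw in keyword_list:
        # Whole-word match so that "port" does not match inside "transport".
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append(kw)
    return found

# Hypothetical keyword list and content selection.
KEYWORD_LIST = ["aviation", "reconnaissance", "transport", "electronic intelligence"]
ARTICLE = ("The SR-71 flew reconnaissance and electronic intelligence "
           "missions that shaped aviation history.")

print(match_keywords(ARTICLE, KEYWORD_LIST))
```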
Stopwords are a collection of words that are used so frequently in a language that they provide no benefit at all as a search target for the selection of relevant content. This includes articles, conjunctions, prepositions, pronouns, etc.
Selection of Keywords
A brilliant query requires a list of keywords that are generated by automatic or manual statistical and empirical analysis of the body of content to be enhanced or a comparable body of content. The keyword list for a given content source is generated through the use of frequency analysis, stopword removal and finally, manual selection using empirical testing of the results generated by a given potential keyword. Based on experience, a solid keyword list usually runs between 250 and 1000 words and phrases which are chosen by the system designer.
Also, keywords can be manually tuned through the use of a thesaurus feature whereby a given keyword can be associated with one or more synonyms that would indicate the use of the keyword whenever one or more of the synonyms appear in the body of text to be enhanced.
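One way to picture the thesaurus feature is a mapping from each keyword to its synonyms, where a match on any synonym counts as a match on the keyword. The sketch below assumes single-word synonyms and uses a hypothetical `THESAURUS`; neither the data structure nor the entries come from the source.

```python
import re

def keywords_with_synonyms(text, thesaurus):
    """Return keywords indicated by the keyword itself or any of its synonyms."""
    words = set(re.findall(r"[a-z0-9-]+", text.lower()))
    matched = []
    for keyword, synonyms in thesaurus.items():
        triggers = {keyword.lower(), *(s.lower() for s in synonyms)}
        # Any overlap between the text's words and the triggers selects the keyword.
        if words & triggers:
            matched.append(keyword)
    return matched

# Hypothetical thesaurus entries, for illustration only.
THESAURUS = {
    "aviation": ["flight", "aircraft", "airplane"],
    "reconnaissance": ["surveillance", "spying"],
    "transport": ["cargo", "freight"],
}

print(keywords_with_synonyms("The aircraft was built for high-altitude surveillance.", THESAURUS))
```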
Automatic Generation of the Hook
In one embodiment, which allows brilliant queries to be generated automatically for a body of text, the hook is determined by extracting the highest-frequency proper names from the text body. This process requires a two-pass analysis of the body of text. The first pass simply generates a frequency table with an entry for each word, excluding stopwords.
The second pass relies on the identification of proper names and punctuation to select a hook. Proper names are identified by locating all adjacent capitalized words not separated by punctuation. The frequency of each proper name sequence is calculated by averaging the individual word frequencies over the number of words in the sequence. The hook is then the proper name sequence with the highest frequency. However, if a word appears in multiple sequences, the longer sequence is given preference, even if it has a lower frequency than the shorter sequence.
For Example
“Governor Bush had a strong, substantive week,” Communications Director Karen Hughes said of a six-day, nine state swing in which Bush recovered from the verbal gaffes and tactical blunders that plagued his campaign in late August and early September.
“Governor Bush”—frequency (1.5);
“Communications Director Karen Hughes”—frequency (1.0);
“Bush”—frequency (2.0);
“August”—stopworded;
“September”—stopworded.
The analysis of the previous text results in “Governor Bush” being selected as the hook. Because “Bush” appears both as a single word and as part of a phrase, the longer phrase is used even though it has a lower frequency than “Bush” alone.
Common proper names such as days of the week and months are included in the stopword list. The automatic hook generation technique described here works very well for encyclopedic and news related content sources.
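The two-pass hook selection can be sketched as follows and checked against the Governor Bush example. This is one possible reading, not the patented implementation: the tokenizer, the small `STOPWORDS` set, the function names and the tie-breaking rule (prefer the longest sequence that shares a word with the top-scoring one) are all assumptions.

```python
import re
import string
from collections import Counter

# A word (letters, apostrophes, hyphens) or any single non-letter character.
TOKEN = re.compile(r"[A-Za-z][A-Za-z'-]*|[^\sA-Za-z]")

def word_frequencies(text, stopwords):
    """Pass 1: count every word except stopwords (case-insensitive)."""
    words = (t.lower() for t in TOKEN.findall(text) if t[0] in string.ascii_letters)
    return Counter(w for w in words if w not in stopwords)

def proper_name_sequences(text, stopwords):
    """Pass 2: runs of adjacent capitalized words; anything else ends a run."""
    sequences, current = [], []
    for tok in TOKEN.findall(text):
        if tok[0] in string.ascii_uppercase and tok.lower() not in stopwords:
            current.append(tok)
        else:
            if current:
                sequences.append(tuple(current))
            current = []
    if current:
        sequences.append(tuple(current))
    return sequences

def select_hook(text, stopwords):
    """Return (hook, scores); a sequence's score is its mean word frequency."""
    freq = word_frequencies(text, stopwords)
    scores = {seq: sum(freq[w.lower()] for w in seq) / len(seq)
              for seq in proper_name_sequences(text, stopwords)}
    if not scores:
        return None, scores
    best = max(scores, key=lambda s: scores[s])
    # Prefer a longer sequence that shares a word, even at lower frequency.
    longer = [s for s in scores if len(s) > len(best) and set(s) & set(best)]
    if longer:
        best = max(longer, key=lambda s: (len(s), scores[s]))
    return " ".join(best), scores

# Illustrative stopword list; months are included as the text describes.
STOPWORDS = {"a", "of", "in", "the", "and", "that", "his", "which", "from",
             "had", "august", "september"}
TEXT = ('“Governor Bush had a strong, substantive week,” Communications Director '
        'Karen Hughes said of a six-day, nine state swing in which Bush recovered '
        'from the verbal gaffes and tactical blunders that plagued his campaign '
        'in late August and early September.')

hook, scores = select_hook(TEXT, STOPWORDS)
print(hook, scores)
```

Under these assumptions the sketch reproduces the frequencies listed in the example (1.5, 1.0 and 2.0) and selects “Governor Bush”. Note that this simple rule would also treat a capitalized sentence-initial word as a proper name unless it is stopworded.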
Automatic Generation of the Keywords
A word frequency analysis is done on all of the text, with stopwords excluded, and the resulting words, in order of frequency, are compared to a pre-selected keyword list. Those that match and meet a desired frequency become keywords to be combined with the hook to form focused, optimal queries.
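The keyword step might be sketched like this. The `min_count` threshold stands in for the "desired frequency", whose actual value or form the text does not give; the function name, stopword set and sample text are likewise hypothetical.

```python
import re
from collections import Counter

def generate_keywords(text, keyword_list, stopwords, min_count=2):
    """Return pre-selected keywords found in the text, most frequent first."""
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    freq = Counter(w for w in words if w not in stopwords)
    allowed = {k.lower() for k in keyword_list}
    # most_common() yields words in descending order of frequency.
    return [w for w, n in freq.most_common() if n >= min_count and w in allowed]

# Hypothetical inputs, for illustration only.
STOPWORDS = {"the", "in", "and", "among"}
KEYWORD_LIST = ["aviation", "reconnaissance", "transport", "pilots"]
TEXT = ("Reconnaissance aircraft changed aviation. Aviation historians rank the "
        "aircraft among the fastest in aviation, and reconnaissance pilots agree.")

print(generate_keywords(TEXT, KEYWORD_LIST, STOPWORDS))
```

Here "aircraft" is frequent but not on the pre-selected list, and "pilots" is on the list but too rare, so neither becomes a keyword.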
Generation of Brilliant Queries
Once the keywords have been selected and the hook for a body of text has been determined or automatically generated, the searches are created by generating a link for every keywo
