Boolean text search combined with extended regular...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06681217

ABSTRACT:

BACKGROUND
Computer-based text searches have many applications. Some of these include applications to automated taxonomy, the categorization of text documents. The most prominent applications, however, though certainly not the most sophisticated, are those on the Internet. Most Internet-based companies, such as AltaVista, Yahoo, or Excite, provide users with a simple, literal text search function. Some of them also provide an “Advanced Search” function.
When using the simple, literal search function, the user types in the literal word or words to be found, and the search engine lists all the texts it finds. Many such search engines appear to allow for some synonyms of the typed words and are largely word based. That is, unless the text typed by the user is placed in quotes (or designated as literal in some other way), the search assumes a match if the typed words appear in a document in any order and, in most search engines, with an implied “OR” between each word and the next. To distinguish the different possibilities, the found documents or URLs (Uniform Resource Locators) are often listed with the “best” matches first. Apparently, the “best” matches are those that contain the largest number of the typed words.
For example, when we search for documents containing the words “search engines,” typing these words into the simple AltaVista search engine finds over 680,000 documents!
Advanced search engines allow a full “logical” or boolean search, in which words or phrases can be combined using the boolean operators “AND”, “OR”, “NOT”, and “NEAR”. (Usually NEAR means within 10 words of each other, and this number of words is not under the user's control.)
For example, to find documents containing the word “education” and either the word “Internet” or “networking” or “network” but not containing the words “school” or “college” the user would type the advanced search expression:
education AND (networking OR network OR Internet) AND NOT (school OR college)
Such a set of boolean operators gives the user greater control over the text being searched. It makes it easier to find the desired document among the millions of available documents by allowing the user to narrow down the description of the documents of interest.
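As a minimal sketch of how the advanced search expression above could be evaluated against a single document, the following Python treats a document as a set of lowercase words. The function names `words` and `matches_example` and the simplistic tokenizer are assumptions made for illustration; they are not taken from any search engine discussed here.

```python
import re

def words(text):
    """Return the set of lowercase words in text (a simplistic tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_example(text):
    """education AND (networking OR network OR Internet) AND NOT (school OR college)"""
    w = words(text)
    return (
        "education" in w
        and bool(w & {"networking", "network", "internet"})  # OR group
        and not (w & {"school", "college"})                   # AND NOT group
    )
```

Each parenthesized OR group becomes a set intersection test, and the AND NOT group simply requires that the intersection be empty.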
A little more control is provided by a “wild character” feature, which allows the user to substitute a special symbol, the wild character, for any uncertain character. Another feature, sometimes available, allows for either the presence or absence of any wild character so designated.
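One way to picture the two wild character features is as a translation into a regular expression. The sketch below assumes a hypothetical syntax in which `?` stands for exactly one uncertain character and `~` stands for a wild character that may be present or absent; real search engines use their own symbols, and the function names here are illustrative only.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed (hypothetical) syntax: '?' is exactly one character,
    '~' is one optional character; everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")        # exactly one arbitrary character
        elif ch == "~":
            parts.append(".?")       # zero or one arbitrary character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def wildcard_match(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

Under these assumptions, `organi?ation` accepts both "organization" and "organisation", and `colo~r` accepts both "color" and "colour".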
As users of the search engines become more discriminating and more experienced, they will demand more control than even the current advanced search engines can provide.
For example, regarding the search target, it would be useful to specify that the words searched for must all be within a sentence, or perhaps within a paragraph, rather than, as at present, anywhere in the document. Clearly, the chance that a document containing the specified words is the one we want is lower if these words are spread throughout a long document than if they are all within one sentence or one paragraph. Such a feature is not currently available with search engines on the Internet, though it is available, for example, in a search tool for Eudora Email called “PowerSleuth” distributed by Nisus Software Inc.
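The idea of restricting a match to one sentence can be sketched as follows. The sentence splitter is a rough heuristic (it splits after ".", "!" or "?" followed by whitespace), and the function names are hypothetical; this is not how PowerSleuth or any particular tool works.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def all_words_in_one_sentence(text, required):
    """Return True if some single sentence contains every word in required."""
    required = {w.lower() for w in required}
    for sentence in split_sentences(text):
        found = set(re.findall(r"[a-z]+", sentence.lower()))
        if required <= found:
            return True
    return False
```

With the document "Search engines index pages. Users type queries.", the words "users" and "queries" match because they share a sentence, while "search" and "queries" do not, even though both appear in the document.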
Other possible extensions of search features include a complete text-pattern description language, allowing users to describe the text pattern without the need to know the specific text. For example, we may want to search for documents or web pages of a particular company containing a phone number or a street address without knowing either, or precisely because we do not know them. Such text pattern matching is implemented in the Unix search tools, which support “Regular Expressions” for describing text patterns; such searches are referred to as GREP searches.
An example of a text pattern matching engine, implemented as part of a Macintosh word processor, is the PowerFind™ and PowerFind Pro features within the Nisus Writer word processor for the Macintosh, first published as a software product in January of 1989 under the U.S. registered trade name “Nisus” and in more recent versions renamed “Nisus Writer.”
The PowerFind Pro engine, implemented in Nisus Writer, is an extension of the Unix GREP and includes only one boolean operator: the “OR.” The “AND” operator can, however, be simulated by using the “OR” and the other features of PowerFind or PowerFind Pro, although simulating an “AND” is not very convenient. Simulating the NOT operator is not possible without additional features.
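To illustrate describing a text pattern without knowing the specific text, the sketch below uses Python's `re` module to find phone-number-like strings. The assumed format (an optional three-digit area code followed by `ddd-dddd`) is a simplification chosen for this example and will not cover every real phone number.

```python
import re

# Assumed format: optional "(ddd) " or "ddd-" area code, then ddd-dddd.
PHONE = re.compile(r"(?:\(\d{3}\)\s*|\d{3}[-.\s])?\d{3}[-.]\d{4}")

def find_phone_numbers(text):
    """Return every substring of text that looks like a phone number."""
    return PHONE.findall(text)
```

Because the pattern contains only non-capturing groups, `findall` returns the full matched strings.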
THE INVENTION
The present invention combines the features of the full boolean search with the extended Regular Expression search features, adding control of the search target, to create a more powerful and useful search engine than any presently available. In addition, the invention adds several more boolean search features (such as the user-definable NEAR, the FOLLOWED BY, and the NOT FOLLOWED BY binary operators) and further extends some of the already extended Regular Expression features from PowerFind Pro of Nisus Writer. The straightforward combination of a Regular Expression search engine and an extended Boolean search engine results in two types of OR and two types of parentheses: one used in GREP expressions and one used in Boolean expressions. Mixed expressions have to be parsed twice: once by the Regular Expression parser and a second time by the Boolean parser.
GREP expressions can be concatenated to form new meaningful expressions whose match is the concatenation of the respective matches, an intuitive result. Logical expressions, on the other hand, can only be concatenated using one of the binary boolean operators.
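The text names the additional binary operators but does not define them here, so the following is only a sketch under assumed semantics: NEAR means two words occur within a caller-supplied number of words of each other, FOLLOWED BY means some occurrence of the first word precedes some occurrence of the second, and NOT FOLLOWED BY means the first word occurs but never precedes the second. All function names are hypothetical.

```python
import re

def tokenize(text):
    """Return the lowercase words of text in order (a simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def positions(tokens, word):
    """Return the indices at which word occurs in tokens."""
    return [i for i, t in enumerate(tokens) if t == word]

def near(text, a, b, distance=10):
    """True if a and b occur within `distance` words of each other, in either order."""
    tokens = tokenize(text)
    return any(abs(i - j) <= distance
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def followed_by(text, a, b):
    """True if some occurrence of a comes before some occurrence of b."""
    tokens = tokenize(text)
    pa, pb = positions(tokens, a), positions(tokens, b)
    return bool(pa) and bool(pb) and min(pa) < max(pb)

def not_followed_by(text, a, b):
    """True if a occurs and no occurrence of a comes before an occurrence of b."""
    tokens = tokenize(text)
    return bool(positions(tokens, a)) and not followed_by(text, a, b)
```

Making `distance` a parameter reflects the user-definable NEAR, in contrast to the fixed 10-word window of existing engines.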
For example, using uppercase to designate operators and lowercase to designate any boolean expression, the boolean expression
NOT z
cannot generally be concatenated with the boolean expression
NOT a
to form a meaningful boolean expression, except by using one of the binary logical operators, such as OR or AND, between them. One possible combination of the two expressions would be:
NOT z AND NOT a
This specifies the contents of the target independently of the positions of the matches to the boolean “a” or the boolean “z”. However, frequently we need to search for a text string which can be intuitively designated as
(NOT z)(NOT a)
which means
NOT z IMMEDIATELY FOLLOWED BY NOT a
or, using a more understandable description, NOT z NOT IMMEDIATELY FOLLOWED BY a, which could also be designated as:
NOT (az)
once concatenation of a and z is defined.
It is relatively simple to define such concatenations of boolean expressions. Including such concatenations is equivalent to the unification of the GREP language with the Boolean language. Such a unification is a great convenience to the user and is an innovation.
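The difference between the position-independent form and the concatenated form can be illustrated with a deliberately simplified case. The sketch below assumes that `a` and `z` are single literal characters and reads (NOT z)(NOT a) as "a character other than z immediately followed by a character other than a". This is one possible reading chosen for illustration, not the definition used by the invention, and the function names are hypothetical.

```python
import re

def not_z_and_not_a(target):
    """NOT z AND NOT a: neither character occurs anywhere in the target."""
    return "z" not in target and "a" not in target

def not_z_then_not_a(target):
    """(NOT z)(NOT a): some character other than 'z' is immediately
    followed by a character other than 'a'."""
    return re.search(r"[^z][^a]", target) is not None
```

Under these assumptions, the target "bad" fails the first test, because it contains "a", yet passes the second, because "a" is immediately followed by "d". The first form constrains the whole target; the second describes a string at a particular position.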
The combined availability of both the Regular Expression language and the Boolean operators (other than OR) is also an innovation, even when the user has to formulate the search expressions carefully so as not to (illegally) imply concatenation of boolean expressions; that is, even before unification of booleans with Regular Expressions.
Boolean Expressions often need the definition of the “Search Target.” In current search engines on the Internet, the Search Target is implicitly the whole document or web page, and the user has no ability to control that. As exemplified in the Background above, it is often useful to give the user better control of what part of the text is to contain the defined search pattern. This is best done by formally defining the Search Target. Although defining the search target in itself is not new, its combination with Regular Expressions and boolean searches is new, and its use for searches on the Internet is also new.
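A minimal sketch of how a Search Target could be combined with Regular Expressions and boolean operators: the document is first split into targets, and a boolean combination of regular expression tests is then applied to each target separately. The unit names, the splitting heuristics, and the example predicate are all assumptions made for illustration.

```python
import re

def split_targets(text, unit="sentence"):
    """Split text into search targets: sentences, paragraphs, or the whole document."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:  # any other value: treat the whole document as one target
        parts = [text]
    return [p.strip() for p in parts if p.strip()]

def search(text, predicate, unit="sentence"):
    """Return the targets for which predicate(target) is true."""
    return [t for t in split_targets(text, unit) if predicate(t)]

# Example predicate: a year-like pattern AND NOT the word "draft".
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
DRAFT = re.compile(r"\bdraft\b", re.IGNORECASE)

def year_and_not_draft(target):
    """Boolean combination of two regular expression tests on one target."""
    return YEAR.search(target) is not None and DRAFT.search(target) is None
```

For the text "The draft was written in 1998. The patent issued in 2004.", a sentence-level search returns only the second sentence, while a whole-document search returns nothing because the single target contains "draft". The same expression thus gives different results depending on the chosen Search Target.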


