Reexamination Certificate
2001-06-11
2004-04-13
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06721729
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to a method for searching electronic data files and, more particularly, to a method including the entering of a two-dimensional array of search concepts, each concept being predefined key words and expressions or user-defined key words and expressions, and detecting and displaying a correlation of occurrence, within the electronic data files, between entered concepts in the respective dimensions.
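The two-dimensional arrangement of concepts described above can be illustrated with a minimal Python sketch. It is not the claimed method: the concept names, key words, sample files, and the function names `concept_occurs` and `correlation_grid` are hypothetical. A plain co-occurrence count stands in for the "correlation of occurrence", which this passage does not define further.

```python
from typing import Dict, List

# Hypothetical concepts: each name maps to its key words and expressions.
ROW_CONCEPTS: Dict[str, List[str]] = {
    "children": ["child", "kids"],
    "parents": ["parent", "guardian"],
}
COL_CONCEPTS: Dict[str, List[str]] = {
    "bus": ["bus", "public transportation"],
    "car": ["car", "drive"],
}

# Hypothetical stand-ins for the contents of electronic data files.
FILES: List[str] = [
    "Children ride the school bus every morning.",
    "Parents drive their kids to school by car.",
    "The city expanded public transportation routes.",
]


def concept_occurs(text: str, terms: List[str]) -> bool:
    """Return True if any term of the concept appears in the text.

    Naive case-insensitive substring matching (an assumption made for
    brevity); it would also match "car" inside "card".
    """
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def correlation_grid(
    files: List[str],
    rows: Dict[str, List[str]],
    cols: Dict[str, List[str]],
) -> Dict[str, Dict[str, int]]:
    """Count, for each (row concept, column concept) pair, the files in
    which both concepts occur."""
    grid = {r: {c: 0 for c in cols} for r in rows}
    for text in files:
        row_hits = [r for r, terms in rows.items() if concept_occurs(text, terms)]
        col_hits = [c for c, terms in cols.items() if concept_occurs(text, terms)]
        for r in row_hits:
            for c in col_hits:
                grid[r][c] += 1
    return grid


grid = correlation_grid(FILES, ROW_CONCEPTS, COL_CONCEPTS)
for row, counts in grid.items():
    print(row, counts)
```

Each cell of the resulting grid corresponds to one pairing of a row concept with a column concept, so a user could see at a glance which pairings occur together in the files and which do not.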
2. Related Art
The amount of information generated, collected, stored, communicated and accessible through electronic media is continuing to increase. The increase is not only in volume; it is also in the number of sources and the variety of formats in which the information is communicated and stored. The sources include newspapers, technical journals, government publications, literary works, laws, court opinions, business reports, and public records. More and more of these are being generated, stored, searched, retrieved, and distributed through networked systems of digital computers and other digital document generation and management devices. The migration of these and other sources, and large archives of the same, to electronic media is generally attributed to a combination of the Internet and the increasing number and capabilities of personal computers (PCs) and other Internet access devices.
The average operator-user with an entry-level PC, a telephone line, and a subscription to an Internet Service Provider (ISP), such as America On Line®, now has access to literally billions of documents, forms, images, and text files, stored throughout the world on a myriad of databases. A large number of the databases are freely accessible to anyone, while others are subscription-based or otherwise limited in access. There are large databases which, although not directly accessible through the World Wide Web, are available through controlled-access wide area networks (WANs). As known to persons skilled in the relevant art, these may be physically separate from the Internet or may be Virtual Private Networks (VPNs) which coexist on the Internet with public data traffic. Through such private networks an authorized person may have access to large proprietary databases of technical journals, customer profiles, medical records, criminal records, internal memoranda, business reports and the like.
There are continuing problems, though, with searching such a large number of electronic files. Many of these problems prevent users from fully exploiting the Internet, and other wide area networks, and the many databases which these networks make available for their use. One of the problems is the formulation of a search strategy. Search strategy includes the choice of particular features that the user believes, or has otherwise determined, would be contained in, described by, or descriptive of the electronic files relating to the topic that he or she is researching. The choosing of these search features is critical to the research task, yet in most cases it is carried out using nothing more than intuition, trial and error.
Stated more particularly, a typical search of the World Wide Web proceeds as follows. A user accesses the Internet through, for example, an Internet Service Provider such as America On Line®. The user then, using computer software features that are well known in the art, launches a web browser program that resides on his or her personal computer, such as Microsoft Internet Explorer® or Netscape Navigator®. As is well known in the art, the web browser is usually configured with a default “home page”, identified by the Uniform Resource Locator (“URL”) of a specific web site. The web browser then performs the required Hypertext Transfer Protocol (“HTTP”) communications with the web server hosting the home page.
The home page may be hosted by a commercial web services/advertising entity, such as Microsoft Network®, Excite®, or Yahoo®. Such commercial home pages generally have one or more icons representing search engines, both their own and those of third parties such as Lycos® and Infobot®. When the user clicks on a search engine icon, he or she is presented with a display page typically having a field for entering the search query terms, also referred to in the art as “key words”.
The typical user then proceeds to enter the key words. Many commercially available Internet search engines provide the Boolean connectors AND, OR and NOT for connecting the key words. Boolean searching ideally identifies all documents containing the defined connection or string of “key words”. This may be with or without further limitations, such as year, language, publisher, and other such characteristics. Some of the more sophisticated Boolean search methods permit the user to define search terms to include not only the term itself, but also synonyms of the term and ranges around it. Some available search engines can also group key words using parentheses, which permits more complex Boolean expressions.
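As a rough illustration of how the AND, OR and NOT connectors combine key words, the following Python sketch evaluates Boolean expressions as set operations over a small inverted index. The document identifiers, sample texts, and the helper names `build_index` and `lookup` are hypothetical, and whole-word matching is assumed; no particular search engine is known to work this way.

```python
import re
from typing import Dict, Set

# Hypothetical document collection keyed by an illustrative identifier.
DOCS: Dict[str, str] = {
    "d1": "The child takes the bus to school",
    "d2": "Kids walk to school",
    "d3": "Kids ride the bus",
    "d4": "Bus schedules changed",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index


def lookup(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Return the documents containing the key word (empty set if none)."""
    return index.get(word.lower(), set())


index = build_index(DOCS)

# (CHILD OR KIDS) AND BUS: OR is set union, AND is set intersection.
grouped = (lookup(index, "child") | lookup(index, "kids")) & lookup(index, "bus")

# BUS AND NOT KIDS: NOT is set difference against the left operand.
excluded = lookup(index, "bus") - lookup(index, "kids")

print(sorted(grouped))
print(sorted(excluded))
```

The parentheses in the first expression play the grouping role described above: without them, the precedence of AND over OR would change which documents qualify.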
The entry field, though, forms the key words into a one-line expression, regardless of the number of terms. Therefore, in that one-line expression, the user is attempting to formulate a single Boolean expression that will, based only on his or her intuitive sense, have a “feels OK” likelihood of finding relevant files, i.e., “hits”, without being so broad that it retrieves an unwieldy number.
In a typical scenario of Boolean searching, however, the user would not simply formulate a single expression, and then conduct the entire search using only that expression. Instead, the process is typically as follows: The user attempts a first Boolean expression and gets a number of “hits”. If the number of hits is zero the user will usually vary the expression, either by removing one of the AND operators and thus lowering the criteria required for a document to qualify as a hit, or by substituting a synonym for one or more of the search terms. If the number is too high the user may retrieve, by one of the known methods, a sample set of the “hits” and read them to identify his or her next strategy. Most often the user will simply add further search criteria, typically by connecting another key word to the original Boolean phrase by an AND operator, and then run another search. When the process is completed, which is frequently coincident with the point where the user runs out of time, the typical user will have attempted a generally random sequence of different Boolean expressions, and many variations on each. The user has, hopefully at least, laboriously retrieved and reviewed documents obtained from each search expression and, in a method that is typically unique to each user, has collected and combined these into, for example, a research report.
There are numerous problems with this method. One major problem is that the user is attempting to find an optimal search phrase using, as the sole heuristic, the number of “hits” resulting from each attempt compared to the previous attempt. For example, assume that a user is writing a paper on trends in the number of children who are transported to and from school by buses as compared to the number who are transported by parents or guardians. Assume that the first Boolean phrase that the user enters is (CHILD OR KIDS) AND (BUS OR “PUBLIC TRANSPORTATION”). Assume that the user is searching the Internet, using known methods of Internet access. If the number of hits is too high the user will add another search term, for example PERCENTAGE TRANSPORTED. The typical user would then run the search again and see the number of hits. After a number of iterations the user would finally obtain an acceptable number of hits, for example thirty.
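The hit-count-driven narrowing described above can be sketched in Python as follows. This is only an illustration of the trial-and-error loop, with the number of hits as the sole stopping criterion. The sample texts, the `Query` representation, and the functions `hits` and `narrow` are hypothetical, and the extra key word is chosen arbitrarily, just as the passage suggests a user might choose it.

```python
import re
from typing import List, Set, Tuple

# A query is an AND of groups; each group is an OR of key words.
Query = List[Set[str]]

# Hypothetical stand-ins for searchable files.
DOCS: List[str] = [
    "Kids take the bus to school",
    "A child rode the bus downtown",
    "Percentage of kids on the school bus rose",
    "Parents drive kids to school",
]


def hits(docs: List[str], query: Query) -> List[str]:
    """Return the documents that contain at least one word from every group."""
    matched = []
    for text in docs:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if all(group & words for group in query):
            matched.append(text)
    return matched


def narrow(
    docs: List[str], query: Query, extra_groups: Query, max_hits: int
) -> List[Tuple[Query, int]]:
    """AND on one extra group at a time until the hit count is acceptable.

    Every attempted query and its hit count is recorded, so no attempt
    is lost or repeated by accident.
    """
    history: List[Tuple[Query, int]] = []
    current = list(query)
    pending = list(extra_groups)
    while True:
        count = len(hits(docs, current))
        history.append((list(current), count))
        if count <= max_hits or not pending:
            return history
        current = current + [pending.pop(0)]


history = narrow(
    DOCS,
    query=[{"child", "kids"}, {"bus"}],
    extra_groups=[{"school"}],
    max_hits=2,
)
for attempt, count in history:
    print(count, attempt)
```

Note that the loop stops as soon as the count falls to the threshold, without any check that the remaining documents are the relevant ones, which is the weakness of using hit counts as the only guide.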
The search “methodology” described above has other shortcomings. One is that the user might not record the various Boolean search phrases that were attempted before he or she finds the phrase that yields the desired thirty hits. As a result the user might run the same search twice, or might forget to t
Morris, Jr. William Norman
Ngo Phu Thien
Nguyen Thanh Ngoc
Mizrahi Diane D.
Patton & Boggs LLP
Method and apparatus for electronic file search and collection