Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-09-29
2003-06-24
Metjahic, Safet (Department: 2171)
C707S793000
active
06584468
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to communications in general. More particularly, the invention relates to a method and apparatus to retrieve information from a network such as the Internet.
BACKGROUND OF THE INVENTION
The amount of information available over the Internet and World Wide Web (WWW) is staggering. There are millions of “web pages” full of information on almost any topic of interest. Moreover, this amount of information is increasing at a geometric rate. This sheer volume of information has made the search for specific types of information a significant challenge. The complexity of this challenge may be better understood with some background information regarding the Internet and WWW in general.
The Internet comprises a network of computers interconnected by some form of communication medium. The computers may range from handheld computers and pocket PCs to high-end mainframes and supercomputers. The communication media may include twisted pair, coaxial cable, optical fiber and radio frequencies. Each computer is equipped with software and hardware that enable it to communicate using the same procedures or language. These procedures and languages are often referred to as protocols, which are often layered over one another to form a “protocol stack.” One such protocol is the Hypertext Transfer Protocol (HTTP), which permits the transfer of Hypertext Markup Language (HTML) documents between computers. HTML documents are often referred to as “web pages” and are files containing information in the form of text, video, images, links to other web pages, and so forth. Each web page is stored in a computer (sometimes referred to as an “Internet server”) and has a unique address referred to as a Uniform Resource Locator (URL). The URL is used by a program referred to as a “web browser” located on one computer to find a web page stored on another computer connected to the network. This creates a “web” of computers, each storing a number of web pages that can be accessed and transferred using a standard protocol, and hence this web of computers is referred to as the WWW.
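As a brief illustration of how a URL identifies where a web page is stored, the following sketch splits an address into its parts using Python's standard library. The address is hypothetical and is used only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical address, used only for illustration.
url = "http://www.example.com/cars/index.html"

parts = urlsplit(url)
protocol = parts.scheme   # the protocol used to transfer the page
server = parts.netloc     # the computer (Internet server) storing the page
page_path = parts.path    # the location of the page on that server

print(protocol, server, page_path)
```

A web browser performs an equivalent decomposition before it contacts the server named in the address and requests the page at the given path.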
An entire field of technology has arisen that focuses on making it easier for a user to find information available over the Internet. A large number of “search engines” permit the user to enter key words or phrases. The search engine then searches the Internet to find web pages that contain the key terms, and presents the results to the user in some sort of ranked fashion. Given the sheer volume of information available over the Internet and WWW, however, search time can be extremely long. This is particularly problematic in an age when users are demanding faster performance from information retrieval tools. Moreover, the search results may often have little relevance to the user's initial request.
In order to accelerate the search process, some search engines build internal databases using a search program referred to as a “web crawler.” The idea is that by building an internal database, much of the search work can be done prior to a user's request for information thereby decreasing search times. A web crawler performs as its name suggests. The program periodically “crawls” or searches the Internet and attempts to catalog or index the information available in certain web pages. The index is stored in a database that is accessible to the search engine. In this manner, when a user enters a search term, the internal database is searched first in a relatively fast and efficient manner.
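The benefit of indexing ahead of time can be illustrated with a minimal sketch, assuming a simple word-to-URL index held in memory. The page contents, the URLs, and the function names `build_index` and `search` are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict

# Hypothetical pages a crawler might have fetched earlier (URL -> text).
crawled_pages = {
    "http://example.com/a": "used cars for sale",
    "http://example.com/b": "new cars and trucks",
    "http://example.com/c": "gardening tips",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, term):
    """Answer a query from the prebuilt index, without touching the network."""
    return sorted(index.get(term.lower(), set()))

index = build_index(crawled_pages)
print(search(index, "cars"))
```

Because the index is built before any query arrives, answering a search term reduces to a single lookup rather than a fresh traversal of the network.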
A problem with conventional web crawlers, however, is that they are designed to collect a limited set of information about the web page. Each web page typically has a list of terms provided by the web page designer that attempts to identify the content found within the web page. The web crawler retrieves this list of terms and stores the terms in a database. This list of terms, however, is typically limited to what the web designer deems significant. Consequently, it may not be accurate or comprehensive. Moreover, in many instances, this list may contain terms that are misleading. For example, a web page having information about a particular brand of car may include in its list of terms the names of several competitors. When the user inputs a competitor's name in a search engine, the unintended web page would be retrieved as part of the search results.
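The limitation described above can be illustrated with a minimal sketch, assuming the designer-provided list of terms is carried in an HTML `<meta name="keywords">` tag (one common convention; the text above does not name a specific mechanism). The page, the brand names, and the class name `KeywordParser` are hypothetical.

```python
from html.parser import HTMLParser

class KeywordParser(HTMLParser):
    """Collects the terms listed in a page's <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Split the comma-separated list and normalize each term.
            self.keywords.extend(
                term.strip().lower() for term in content.split(",") if term.strip()
            )

# Hypothetical page about "Brand A" that also lists competitor names.
page = """<html><head>
<title>Brand A Sedans</title>
<meta name="keywords" content="Brand A, sedan, Brand B, Brand C">
</head><body>All about Brand A sedans.</body></html>"""

parser = KeywordParser()
parser.feed(page)
print(parser.keywords)
```

A crawler that stores only these terms would index this page under the competitor names "brand b" and "brand c" even though the body of the page never mentions them.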
Another problem with conventional web crawlers is that they are designed to locate general information. They simply search for web pages in a random manner and index those web pages within the initial search parameters. These conventional web crawlers, however, are not optimized to locate a specific set or domain of information. Accordingly, the conventional web crawler is not efficient or effective when attempting to catalog or index specialized information.
In view of the foregoing, it can be appreciated that a substantial need exists for a web crawler that solves the above-discussed problems.
SUMMARY OF THE INVENTION
One embodiment of the invention comprises a method and apparatus to index network information. A network is searched for files of information relevant to people and resources in a particular field using a search list of weighted links to the files. The information in each file is parsed into content and additional links to additional files. The content is weighted and copied to memory (such as a database). A determination is made as to whether the additional links are relevant to the people and resources in the particular field. Those additional links that are relevant are weighted using a predetermined weighting algorithm. The relevant additional weighted links are copied to the search list. This process continues until an ending condition occurs.
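The process summarized above can be sketched as follows. This is a minimal illustration under stated assumptions and not the claimed implementation: the network is simulated by an in-memory dictionary, a link is judged relevant from its anchor text, the weighting is a simple fraction of field terms, and the ending condition is an empty search list or a page limit. All names (`NETWORK`, `FIELD_TERMS`, `weight_text`, `crawl`) are hypothetical.

```python
import heapq

# Hypothetical in-memory "network": URL -> (page text, [(link URL, anchor text)]).
NETWORK = {
    "http://example.org/seed": (
        "polymer chemistry researchers directory",
        [("http://example.org/lab", "polymer lab"),
         ("http://example.org/shop", "discount shoes")],
    ),
    "http://example.org/lab": (
        "polymer lab staff and equipment",
        [("http://example.org/seed", "researchers directory")],
    ),
    "http://example.org/shop": ("discount shoes on sale", []),
}

# Assumed vocabulary describing the particular field of interest.
FIELD_TERMS = {"polymer", "chemistry", "lab", "researchers"}

def weight_text(text):
    """Assumed weighting: fraction of words that belong to the field vocabulary."""
    words = text.lower().split()
    return sum(word in FIELD_TERMS for word in words) / len(words) if words else 0.0

def crawl(seeds, max_pages=10):
    """Follow the heaviest link first; return {url: (content, content weight)}."""
    # heapq is a min-heap, so weights are negated to pop the heaviest link first.
    search_list = [(-weight, url) for url, weight in seeds]
    heapq.heapify(search_list)
    seen = {url for url, _ in seeds}
    database = {}
    while search_list and len(database) < max_pages:  # assumed ending condition
        _, url = heapq.heappop(search_list)
        page = NETWORK.get(url)
        if page is None:
            continue  # file not reachable in this simulated network
        content, links = page  # parse into content and additional links
        database[url] = (content, weight_text(content))  # weight and store content
        for link_url, anchor in links:
            link_weight = weight_text(anchor)
            # Keep only unseen links judged relevant to the field.
            if link_url not in seen and link_weight > 0:
                seen.add(link_url)
                heapq.heappush(search_list, (-link_weight, link_url))
    return database

database = crawl([("http://example.org/seed", 1.0)])
for url, (content, weight) in database.items():
    print(url, weight)
```

In this sketch the link with anchor text "discount shoes" is never added to the search list, so the unrelated page is not fetched or stored. A priority queue is one reasonable way to hold a search list of weighted links; the summary does not prescribe a particular data structure.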
With these and other advantages and features of the invention that will become hereinafter apparent, the nature of the invention may be more clearly understood by reference to the following detailed description of the invention, the appended claims and the several drawings attached hereto.
Gabriel Kaigham J.
Indianer Evan M.
Lenhart Joel
Umbel Christopher M.
Metjahic Safet
Morgan, Lewis & Bockius LLP
Nguyen Merilyn
NineSigma, Inc.
Method and apparatus to retrieve information from a network