System and method for distributed web crawling

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C709S219000

active

07139747

ABSTRACT:
The present invention provides for the efficient downloading of data set addresses from among a plurality of host computers, using a plurality of web crawlers. Each web crawler identifies URLs in data sets downloaded by that web crawler, and identifies the host computer identifier within each such URL. The host computer identifier for each URL is mapped to the web crawler identifier of one of the web crawlers. If the URL is mapped to the web crawler identifier of a different web crawler, the URL is sent to that web crawler for processing, and otherwise the URL is processed by the web crawler that identified the URL. Each web crawler sends URLs to the other web crawlers for processing, and each web crawler receives URLs from the other web crawlers for processing. In a preferred embodiment, each web crawler processes only the URLs assigned to it, which are the URLs whose host identifier is mapped to the web crawler identifier for that web crawler. Each web crawler filters the URLs assigned to it by comparing them against a database of URLs already known by the web crawler and removing the already known URLs. If a URL is not already known to the web crawler, the data set corresponding to the URL is scheduled for downloading.
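
The routing and filtering steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how a host identifier is mapped to a web crawler identifier, so the SHA-1 hash taken modulo the number of crawlers is an assumption, as are the names `host_of`, `crawler_for`, `Crawler`, `outbox`, and `to_download`. An in-memory set stands in for the database of already known URLs, and forwarding is modeled as a per-peer list rather than a network send.

```python
import hashlib
from urllib.parse import urlsplit


def host_of(url):
    """Return the host component of a URL (empty string if it has none)."""
    return urlsplit(url).hostname or ""


def crawler_for(url, num_crawlers):
    """Map the URL's host to a crawler index.

    Assumption: a stable hash modulo the crawler count. The abstract only
    says that a mapping exists, not how it is computed.
    """
    digest = hashlib.sha1(host_of(url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


class Crawler:
    """Hypothetical single crawler in a group of num_crawlers peers."""

    def __init__(self, crawler_id, num_crawlers):
        self.crawler_id = crawler_id
        self.num_crawlers = num_crawlers
        self.known = set()      # stand-in for the database of known URLs
        self.to_download = []   # URLs scheduled for downloading
        self.outbox = {}        # peer crawler id -> URLs to forward

    def handle_extracted(self, url):
        """Route a URL found in a page this crawler downloaded."""
        owner = crawler_for(url, self.num_crawlers)
        if owner == self.crawler_id:
            self.accept(url)
        else:
            # The URL belongs to another crawler: queue it for that peer.
            self.outbox.setdefault(owner, []).append(url)

    def accept(self, url):
        """Filter an assigned URL against the known set, then schedule it."""
        if url in self.known:
            return
        self.known.add(url)
        self.to_download.append(url)
```

Because the mapping depends only on the host, every URL on a given host is handled by the same crawler, and each crawler only needs to track known URLs for its own hosts. A peer that receives forwarded URLs would pass each one to `accept`.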

REFERENCES:
patent: 5974455 (1999-10-01), Monier
patent: 6182085 (2001-01-01), Eichstaedt et al.
patent: 6199081 (2001-03-01), Meyerzon et al.
patent: 6263364 (2001-07-01), Najork et al.
patent: 6321265 (2001-11-01), Najork et al.
patent: 6351755 (2002-02-01), Najork et al.
patent: 6377984 (2002-04-01), Najork et al.
Heydon and Najork, Mercator: A Scalable, Extensible Web Crawler, World Wide Web 2, (Dec. 1999) 219-229.
Brin and Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, In Proceedings of the Seventh International World Wide Web Conference, (Apr. 1998) 107-117.
Burner, Crawling Towards Eternity: Building an Archive of the World Wide Web, Web Techniques Magazine, (May 1997) 2(5), Available website: http://www.webtechniques.com/archives/1997/05/burner.
