Scheduler for search engine crawler

Electrical computers and digital processing systems: virtual mac – Task management or control – Process scheduling

Reexamination Certificate

Details

C718S103000, C707S706000

active

08042112

ABSTRACT:
A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
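
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the in-memory `history` mapping (URL to a list of reachability results, oldest first, `True` meaning reachable), and the one-URL-per-line output format are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def schedule_next_crawl(
    starting_set: Iterable[str],
    history: Dict[str, Sequence[bool]],
    x: int,
    extra_filters: Sequence[Callable[[str], bool]] = (),
) -> List[str]:
    """Return the document identifiers to schedule for the next crawl.

    history[url] is a hypothetical record of past crawl results for url,
    oldest first; True means the document was reachable in that crawl.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    scheduled = []
    for url in starting_set:
        recent = list(history.get(url, ()))[-x:]
        # Drop the identifier only if it was unreachable in each of the
        # last x crawls; identifiers with fewer than x results are kept.
        if len(recent) == x and not any(recent):
            continue
        # Other filtering mechanisms: every filter must accept the URL.
        if all(keep(url) for keep in extra_filters):
            scheduled.append(url)
    return scheduled


def write_schedule(urls: Iterable[str], path: str) -> None:
    """Write the scheduled identifiers to an output file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

In a distributed arrangement like the one the abstract describes, each scheduler would run this kind of filter over its own segment of the known document identifiers. How identifiers are assigned to segments is not stated in the abstract, so it is left out of the sketch.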

