Systems and methods of handling internet spiders

Data processing: database and file management or data structures – Database and file access – Search engines

Reexamination Certificate

Details

C709S225000

active

07987173

ABSTRACT:
Aspects relate to identifying Internet spiders with an approach involving a plurality of instances of one or more URLs that reference resources available from a first domain. Instances of the URLs are distributed at other Internet domains. Spiders crawling those domains will activate the URL instances, resulting in requests for the resources referenced by the URLs. A generator of a number of requests for the same resource, arriving from a potential multitude of URL instances, can be categorized as a spider. Similarly, generating a number of requests for resources identified by different URLs can also be categorized as spider behavior. In some cases, the first domain may not have a browseable site infrastructure, such that a spider would not readily crawl it by following internal links. The URLs can refer to custom queries created by various users, who can place the URLs on their pages, such as on social networking sites.
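The two counting signals described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the requester key (for example a client IP address), the `tracked_urls` mapping from each distributed URL to the resource it references, the function name `classify_spiders`, and both threshold values are hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Mapping, Set


@dataclass(frozen=True)
class Request:
    requester: str  # hypothetical requester key, e.g. client IP or IP + User-Agent
    url: str        # URL that was activated (one of the distributed URL instances)


def classify_spiders(
    requests: Iterable[Request],
    tracked_urls: Mapping[str, str],
    same_resource_threshold: int = 5,
    distinct_url_threshold: int = 5,
) -> Set[str]:
    """Return requesters whose request pattern looks like spider behavior.

    tracked_urls (hypothetical) maps each distributed URL to the resource it
    references on the first domain. Requests for other URLs are ignored.
    """
    resource_hits = defaultdict(int)  # (requester, resource) -> request count
    urls_seen = defaultdict(set)      # requester -> distinct tracked URLs requested

    for req in requests:
        resource = tracked_urls.get(req.url)
        if resource is None:
            continue  # not one of the distributed URLs
        resource_hits[(req.requester, resource)] += 1
        urls_seen[req.requester].add(req.url)

    spiders: Set[str] = set()
    # Signal 1: many requests for the same resource (e.g. from many URL instances).
    for (requester, _resource), count in resource_hits.items():
        if count >= same_resource_threshold:
            spiders.add(requester)
    # Signal 2: requests for resources identified by many different URLs.
    for requester, urls in urls_seen.items():
        if len(urls) >= distinct_url_threshold:
            spiders.add(requester)
    return spiders
```

Because the first domain may offer no browseable link structure, requests for these URLs should mostly arrive through the distributed instances, which is what makes simple per-requester counts informative here. A real deployment would likely also bound the counts to a time window; that refinement is omitted from this sketch.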
