Data processing: database and file management or data structures – Database and file access – Search engines
Reexamination Certificate
2006-10-13
2010-11-02
Trujillo, James (Department: 2159)
active
07827166
ABSTRACT:
Techniques for identifying duplicate webpages are provided. In one technique, one or more parameters of a first unique URL are identified, where each of the one or more parameters does not substantially affect the content of the corresponding webpage. The first URL and subsequent URLs may be rewritten to drop each of the one or more parameters. Each of the subsequent URLs is compared to the first URL. If a subsequent URL is the same as the first URL, then the corresponding webpage of the subsequent URL is not accessed or crawled. In another technique, the parameters of multiple URLs are sorted, for example, alphabetically. If any URLs are the same, then the webpages of the duplicate URLs are not accessed or crawled.
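The two techniques in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `canonicalize` and `urls_to_crawl`, the example URLs, and the `IGNORED_PARAMS` set are all hypothetical, and the sketch simply assumes the content-neutral parameters are already known, since the abstract does not say how they are identified. For brevity it applies both rewrites (dropping parameters and sorting the rest alphabetically) in one pass, although the abstract presents them as separate techniques.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: parameters already identified as not affecting page content.
IGNORED_PARAMS = {"sessionid", "ref"}


def canonicalize(url, ignored=IGNORED_PARAMS):
    """Rewrite a URL: drop ignored parameters, then sort the rest by name."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    params.sort()  # alphabetical order makes parameter order irrelevant
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


def urls_to_crawl(urls, ignored=IGNORED_PARAMS):
    """Return only the URLs whose rewritten form has not been seen before."""
    seen = set()
    keep = []
    for url in urls:
        key = canonicalize(url, ignored)
        if key in seen:
            continue  # duplicate after rewriting: do not access or crawl
        seen.add(key)
        keep.append(url)
    return keep


if __name__ == "__main__":
    candidates = [
        "http://example.com/item?id=7&sessionid=abc",
        "http://example.com/item?sessionid=xyz&id=7",
        "http://example.com/item?id=8",
    ]
    for url in urls_to_crawl(candidates):
        print(url)
```

In this sketch the second candidate URL is skipped because, once `sessionid` is dropped, it rewrites to the same string as the first.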
REFERENCES:
patent: 7200677 (2007-04-01), Allen et al.
patent: 2003/0069803 (2003-04-01), Pollitt
patent: 2004/0172389 (2004-09-01), Galai et al.
patent: 2005/0081140 (2005-04-01), Allen et al.
patent: 2005/0216474 (2005-09-01), Wiener
patent: 2006/0026194 (2006-02-01), Bhushan et al.
patent: 2006/0070022 (2006-03-01), Ng et al.
patent: 2006/0129463 (2006-06-01), Zicherman
patent: 2006/0218143 (2006-09-01), Najork
patent: 2006/0248066 (2006-11-01), Brewer
patent: 2007/0106676 (2007-05-01), Allen et al.
patent: 2008/0140626 (2008-06-01), Wilson
patent: 2006215735 (2006-08-01), None
patent: WO 2004008340 (2004-01-01), None
Cho, Efficient Crawling through URL ordering, 1998, pp. 161-162.
Brain, How Web Servers and the Internet Work, 2001, pp. 1-8.
Brain, How CGI Scripting Works, 2001, pp. 1-11.
Lee, On URL Normalization, 2005, pp. 1076-1085.
Schonfeld, Do Not Crawl in the DUST: Different URLs with Similar Text, Extended Abstract, May 2006, pp. 1-2.
Schonfeld, Do Not Crawl in the DUST: Different URLs with Similar Text, Research Thesis, Feb. 2006, pp. 1-58.
Slawski, Solving Different URLs with Similar Text (DUST), Sep. 4, 2006, pp. 1-5.
Freitag, Google Sitemaps Protocol, 2005, pp. 1-5.
Bhattacharjee Arnabnil
Garg Priyank S.
Hickman Palermo & Truong & Becker LLP
Ledesma Daniel D.
Nicholes Christian A.
Phillips Albert
Trujillo James
Handling dynamic URLs in crawl for better coverage of unique...