Dynamic-content web crawling through traffic monitoring

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07143088

ABSTRACT:
A dynamic-content web crawler is disclosed. These New Crawlers (NCs) are located at points between the server and the user, and monitor content from those points, for example by proxying the web traffic or sniffing the traffic as it goes by. Web page content is recursively parsed into sub-components. Sub-components are fingerprinted with a cyclic redundancy check code or other lossy compression so that recurrence of a sub-component can be detected in subsequent pages. Those sub-components that persist in the web traffic, as measured by the frequency with which the NCs (6) observe them, are defined as having substantive content of interest to data-mining applications. When a substantive-content sub-component is added to or removed from a web page, the change is significant and is sent to a duplication filter (11), so that if multiple NCs (6) detect a change in a web page, only one announcement of the changed URL is broadcast to data-mining applications (8). The NC (6) identifies substantive-content sub-components that are repeatably part of the page pointed to by a URL. Provision is also made for limiting monitoring to pages having a flag authorizing discovery of the page by a monitor.
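
The pipeline in the abstract (recursive parsing, fingerprinting, persistence counting, change detection, duplicate suppression) can be illustrated with a minimal Python sketch. This is not the patented implementation. The class names `NewCrawler` and `DuplicationFilter`, the `persist_threshold` parameter, and the choice to split page text on blank lines and then on single newlines are all assumptions made for illustration; only the use of a CRC fingerprint comes from the abstract.

```python
import zlib
from collections import Counter

# Assumed parsing hierarchy: blank-line blocks first, then single lines.
DELIMITERS = ("\n\n", "\n")


def sub_components(text, level=0):
    """Recursively yield a piece of text and every sub-component nested in it."""
    text = text.strip()
    if not text:
        return
    yield text
    if level < len(DELIMITERS):
        for part in text.split(DELIMITERS[level]):
            yield from sub_components(part, level + 1)


def fingerprints(page_text):
    """CRC-32 fingerprint of every sub-component of the page (lossy by design)."""
    return {zlib.crc32(c.encode("utf-8")) for c in sub_components(page_text)}


class NewCrawler:
    """Hypothetical NC: watches pages go by and reports substantive changes."""

    def __init__(self, persist_threshold=2):
        self.persist_threshold = persist_threshold
        self.seen = Counter()   # fingerprint -> number of page views containing it
        self.substantive = {}   # url -> substantive fingerprints at last report

    def observe(self, url, page_text):
        """Return the new substantive fingerprint set if it changed, else None."""
        fps = fingerprints(page_text)
        self.seen.update(fps)
        # A sub-component counts as substantive once it has persisted in traffic.
        current = frozenset(
            fp for fp in fps if self.seen[fp] >= self.persist_threshold
        )
        previous = self.substantive.get(url)
        if previous is None:
            if current:
                self.substantive[url] = current  # first baseline, no announcement
            return None
        if current == previous:
            return None
        self.substantive[url] = current
        return current


class DuplicationFilter:
    """Hypothetical filter: lets each (URL, content state) through only once."""

    def __init__(self):
        self.announced = set()

    def announce(self, url, substantive_fps):
        key = (url, substantive_fps)
        if key in self.announced:
            return False  # another NC already reported this change
        self.announced.add(key)  # a real filter would also expire old entries
        return True
```

In this sketch, a rotating advertisement never reaches the persistence threshold and so never triggers an announcement, while a replaced headline does. Two `NewCrawler` instances that see the same headline change produce the same fingerprint set, so `DuplicationFilter.announce` returns `True` for the first and `False` for the second.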

REFERENCES:
patent: 5987471 (1999-11-01), Bodine et al.
patent: 5987515 (1999-11-01), Ratcliff et al.
patent: 6061682 (2000-05-01), Agrawal et al.
Yuan, X.; MacGregor, M.H.; Harms, J. "An efficient scheme to remove crawler traffic from the Internet." Proceedings of the Eleventh International Conference on Computer Communications and Networks, Oct. 14-16, 2002, pp. 90-95.
Ke Hu; Wing Shing Wong. "A probabilistic model for intelligent Web crawlers." Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC 2003), Nov. 3-6, 2003, pp. 278-282.
Ganesh, S.; Jayaraj, M.; Kalyan, V.; SrinivasaMurthy; Aghila, G. "Ontology-based Web crawler." Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), vol. 2, 2004, pp. 337-341.
