Reexamination Certificate
2006-11-28
2006-11-28
Jung, David (Department: 2134)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
07143088
ABSTRACT:
A dynamic-content web crawler is disclosed. These New Crawlers (NCs) are located at points between the server and the user, and monitor content from said points, for example by proxying the web traffic or sniffing the traffic as it goes by. Web page content is recursively parsed into sub-components. Sub-components are fingerprinted with a cyclic redundancy check code or other lossy compression so that recurrence of a sub-component can be detected in subsequent pages. Those sub-components which persist in the web traffic, as measured by the frequency with which NCs (6) observe them, are defined as having substantive content of interest to data-mining applications. Where a substantive content sub-component is added to or removed from a web page, the change is significant and is sent to a duplication filter (11), so that if multiple NCs (6) detect a change in a web page, only one announcement of the changed URL is broadcast to data-mining applications (8). The NC (6) identifies substantive content sub-components which are repeatably part of a page pointed to by a URL. Provision is also made for limiting monitoring to pages having a flag authorizing discovery of the page by a monitor.
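The pipeline described in the abstract (recursive parsing, CRC fingerprinting, persistence counting, change detection, duplicate suppression) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names subcomponents, Monitor, DuplicationFilter and persist_threshold are hypothetical, the split on blank lines and then single lines stands in for whatever parser a real NC would use, and treating "seen at least persist_threshold times" as persistence is an assumption. Only zlib.crc32 from the standard library is relied on.

```python
import zlib
from collections import defaultdict


def subcomponents(content, depth=0, max_depth=1):
    """Recursively split content: blank-line blocks first, then single lines.
    (Assumed stand-in for a real page parser.)"""
    separators = ["\n\n", "\n"]
    for part in content.split(separators[depth]):
        part = part.strip()
        if not part:
            continue
        yield part
        if depth < max_depth and separators[depth + 1] in part:
            yield from subcomponents(part, depth + 1, max_depth)


def fingerprint(component):
    """CRC-32 fingerprint of a sub-component (lossy, so collisions are possible)."""
    return zlib.crc32(component.encode("utf-8"))


class Monitor:
    """One hypothetical NC: tracks which fingerprints persist for each URL."""

    def __init__(self, persist_threshold=2):
        self.persist_threshold = persist_threshold
        self.seen_count = defaultdict(int)   # (url, fingerprint) -> times observed
        self.substantive = defaultdict(set)  # url -> persistent fingerprints

    def observe(self, url, content):
        """Record one observed page; return True if its substantive set changed."""
        fps = {fingerprint(c) for c in subcomponents(content)}
        for fp in fps:
            self.seen_count[(url, fp)] += 1
        persistent = {fp for fp in fps
                      if self.seen_count[(url, fp)] >= self.persist_threshold}
        previous = self.substantive[url]
        added = persistent - previous   # newly persistent sub-components
        removed = previous - fps        # substantive sub-components now missing
        self.substantive[url] = (previous - removed) | added
        return bool(added or removed)

    def state(self, url):
        """Hashable snapshot of the substantive fingerprints for a URL."""
        return frozenset(self.substantive[url])


class DuplicationFilter:
    """Passes only the first announcement of a given (URL, page state) pair."""

    def __init__(self):
        self.announced = set()

    def announce(self, url, state):
        key = (url, state)
        if key in self.announced:
            return False
        self.announced.add(key)
        return True
```

In this sketch a sub-component that appears only once (a rotating advertisement, say) never reaches the threshold and so never triggers a change, while several monitors that see the same change produce the same state and therefore only one announcement.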
REFERENCES:
patent: 5987471 (1999-11-01), Bodine et al.
patent: 5987515 (1999-11-01), Ratcliff et al.
patent: 6061682 (2000-05-01), Agrawal et al.
Yuan, X.; MacGregor, M.H.; Harms, J., "An efficient scheme to remove crawler traffic from the Internet," Proceedings, Eleventh International Conference on Computer Communications and Networks, Oct. 14-16, 2002, pp. 90-95.
Ke Hu; Wing Shing Wong, "A probabilistic model for intelligent Web crawlers," Proceedings, 27th Annual International Computer Software and Applications Conference (COMPSAC 2003), Nov. 3-6, 2003, pp. 278-282.
Ganesh, S.; Jayaraj, M.; Kalyan, V.; SrinivasaMurthy; Aghila, G., "Ontology-based Web crawler," Proceedings, International Conference on Information Technology: Coding and Computing (ITCC 2004), vol. 2, 2004, pp. 337-341.
Green Jacob
Schultz John
Jung David
The Johns Hopkins University
Whitham Curtis Christofferson & Cook PC