Joint optimization of wrapper generation and template detection

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

07660804

ABSTRACT:
A method and system for generating wrappers for hierarchically organized documents by jointly optimizing template detection and wrapper generation is provided. A wrapper generation system generates a wrapper for documents with similar templates by identifying a cluster of document trees and generating a wrapper tree for the cluster. A wrapper tree defines the wrapper for documents that match the template of the cluster. The wrapper generation system clusters document trees by generating a wrapper tree for the cluster based on an initial document tree. The wrapper generation system then repeatedly determines whether any other document tree matches or nearly matches the wrapper tree for the cluster and, if so, adds the document tree to the cluster and adjusts the wrapper tree as appropriate so that all the document trees, including the newly added one, match the wrapper tree.
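
The clustering loop described in the abstract can be illustrated with a minimal sketch. This is not the patented method. The tree representation (`Node`), the mismatch measure (`distance`), the threshold (`max_distance`), and the use of `difflib.SequenceMatcher` to align child tags are all assumptions made for illustration. Here a document "nearly matches" a wrapper tree when few nodes differ, and "adjusting" the wrapper tree means marking the differing nodes as optional. The alignment is heuristic, so the sketch does not guarantee a perfect match for every possible input.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Node:
    """A tree node; `optional` is only meaningful in wrapper trees (hypothetical)."""
    tag: str
    children: list = field(default_factory=list)
    optional: bool = False


def size(node):
    """Number of nodes in a document subtree."""
    return 1 + sum(size(c) for c in node.children)


def required_size(node):
    """Number of nodes a document must supply to satisfy a wrapper subtree."""
    if node.optional:
        return 0
    return 1 + sum(required_size(c) for c in node.children)


def to_wrapper(doc, optional=False):
    """Copy a document tree into a wrapper tree."""
    return Node(doc.tag, [to_wrapper(c) for c in doc.children], optional)


def align(wrapper, doc):
    """Align child tag sequences; yields difflib opcodes."""
    a = [c.tag for c in wrapper.children]
    b = [c.tag for c in doc.children]
    return SequenceMatcher(None, a, b, autojunk=False).get_opcodes()


def distance(wrapper, doc):
    """Assumed mismatch measure: nodes missing from or unexplained by the wrapper."""
    if wrapper.tag != doc.tag:
        return required_size(wrapper) + size(doc)
    cost = 0
    for op, i1, i2, j1, j2 in align(wrapper, doc):
        if op == "equal":
            pairs = zip(wrapper.children[i1:i2], doc.children[j1:j2])
            cost += sum(distance(w, d) for w, d in pairs)
        else:  # 'delete', 'insert', or 'replace'
            cost += sum(required_size(w) for w in wrapper.children[i1:i2])
            cost += sum(size(d) for d in doc.children[j1:j2])
    return cost


def merge(wrapper, doc):
    """Adjust the wrapper so `doc` also matches; assumes equal root tags."""
    children = []
    for op, i1, i2, j1, j2 in align(wrapper, doc):
        if op == "equal":
            pairs = zip(wrapper.children[i1:i2], doc.children[j1:j2])
            children += [merge(w, d) for w, d in pairs]
        else:
            # Wrapper-only nodes become optional; doc-only nodes are added as optional.
            children += [Node(w.tag, w.children, True) for w in wrapper.children[i1:i2]]
            children += [to_wrapper(d, optional=True) for d in doc.children[j1:j2]]
    return Node(wrapper.tag, children, wrapper.optional)


def cluster_documents(docs, max_distance=2):
    """Seed a cluster from one document, then repeatedly absorb near matches."""
    clusters, remaining = [], list(docs)
    while remaining:
        seed, remaining = remaining[0], remaining[1:]
        wrapper, members = to_wrapper(seed), [seed]
        changed = True
        while changed:  # re-scan, since an adjusted wrapper may admit more documents
            changed, still_out = False, []
            for doc in remaining:
                if wrapper.tag == doc.tag and distance(wrapper, doc) <= max_distance:
                    wrapper = merge(wrapper, doc)
                    members.append(doc)
                    changed = True
                else:
                    still_out.append(doc)
            remaining = still_out
        clusters.append((wrapper, members))
    return clusters


# Two article-like pages and one table-like page (made-up example data).
doc1 = Node("html", [Node("body", [Node("h1"), Node("p")])])
doc2 = Node("html", [Node("body", [Node("h1"), Node("p"), Node("p")])])
doc3 = Node("html", [Node("table", [Node("tr"), Node("tr"), Node("tr")])])
clusters = cluster_documents([doc1, doc2, doc3], max_distance=2)
```

In this example the first two pages end up in one cluster whose wrapper tree carries an optional second `p` node, while the table-like page seeds its own cluster. A production system would likely use a proper tree-matching measure and richer generalization (for instance, repetition of nodes); the optional-node rule is only the simplest adjustment that keeps earlier members matching.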

