Data processing: presentation processing of document – operator i – Presentation processing of document – Structured document
Reexamination Certificate
2006-01-20
2010-02-23
Hong, Stephen S (Department: 2178)
C707SE17107
active
07669119
ABSTRACT:
An extraction-rule generation and training system uses information obtained from multiple markup language documents (e.g., web pages) of similar structure to generate an extraction rule for extracting datapoints from markup language documents. By using information extracted from multiple documents of similar structure, including information about correlations between such documents, the system produces data extraction rules that extract datapoints more reliably. Where the structures of two or more documents are not sufficiently similar, the system maintains separate extraction rules for the same datapoint and applies these rules in combination to a given markup language document to extract the datapoint.
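The abstract's core idea can be illustrated with a small Python sketch of multi-document rule generation. This is not the patented method. Several pieces are illustrative assumptions and are not taken from the source:

- the path-based document representation;
- grouping training examples by path length as a crude stand-in for "sufficiently similar" structure;
- replacing disagreeing path steps with wildcards, as a simple form of cross-document correlation;
- majority-vote combination of the separate rules.

All function names (`generalize`, `train_rules`, `matches`, `extract`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a document is flattened into a mapping from
# node paths (sequences of "tag[index]" steps from the root) to text content.
Step = str
Path = Tuple[Step, ...]
Document = Dict[Path, str]
# A rule is a path pattern in which None acts as a wildcard step.
Rule = Tuple[Optional[Step], ...]


def generalize(paths: List[Path]) -> Rule:
    """Merge equal-length example paths: keep steps on which all examples
    agree and replace disagreeing steps with a wildcard (None)."""
    return tuple(
        steps[0] if all(s == steps[0] for s in steps) else None
        for steps in zip(*paths)
    )


def train_rules(examples: List[Path]) -> List[Rule]:
    """Group example paths by a crude structural signature (path length here)
    and generate one rule per group, so dissimilar groups keep separate rules."""
    groups: Dict[int, List[Path]] = {}
    for path in examples:
        groups.setdefault(len(path), []).append(path)
    return [generalize(group) for group in groups.values()]


def matches(rule: Rule, path: Path) -> bool:
    """A path matches a rule if it has the same length and agrees on every
    non-wildcard step."""
    return len(rule) == len(path) and all(
        r is None or r == p for r, p in zip(rule, path)
    )


def extract(rules: List[Rule], doc: Document) -> Optional[str]:
    """Apply every rule to the document and combine their candidate values
    by majority vote; return None if no rule matches."""
    votes = Counter(
        text for rule in rules for path, text in doc.items() if matches(rule, path)
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training_paths = [
        ("html[0]", "body[0]", "div[2]", "span[0]"),
        ("html[0]", "body[0]", "div[3]", "span[0]"),
        ("html[0]", "body[0]", "table[0]", "tr[1]", "td[1]"),
    ]
    rules = train_rules(training_paths)
    page = {
        ("html[0]", "body[0]", "h1[0]"): "Widget",
        ("html[0]", "body[0]", "div[5]", "span[0]"): "$19.99",
    }
    print(extract(rules, page))  # -> $19.99
```

In this example, the two `div`-based training paths agree on every step except the third, so they collapse into one rule with a wildcard there. The `table`-based path has a different structure and keeps its own rule. Both rules are then tried against a new page, and their candidates are pooled. A real system would need a much richer similarity measure than path length.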
REFERENCES:
patent: 6606625 (2003-08-01), Muslea et al.
patent: 6678681 (2004-01-01), Brin
patent: 6714941 (2004-03-01), Lerman et al.
patent: 6851089 (2005-02-01), Erickson et al.
patent: 6920609 (2005-07-01), Manber et al.
patent: 7505984 (2009-03-01), Nevill-Manning et al.
patent: 7593845 (2009-09-01), Ramsey
patent: 2002/0091688 (2002-07-01), Decary et al.
patent: 2002/0143659 (2002-10-01), Keezer et al.
patent: 2003/0177192 (2003-09-01), Umeki et al.
patent: 2008/0046441 (2008-02-01), Wen et al.
patent: 2008/0114800 (2008-05-01), Gazen et al.
patent: 2009/0125529 (2009-05-01), Vydiswaran et al.
Hogue, Andrew W. “Tree Pattern Inference and Matching for Wrapper Induction on the World Wide Web.” Massachusetts Institute of Technology, Jul. 20, 2004. http://dspace.mit.edu/handle/1721.1/28406 (unprotected version retrieved from personal web site of author, secondthought.org).
Chuang, Shui-Lung et al. “Tree-Structured Template Generation for Web Pages.” IEEE Computer Society, Sep. 2004.
Zhai, Yanhong et al. “Web Data Extraction Based on Partial Tree Alignment.” ACM, May 2005.
Han, Hui et al. “Rule-based Word Clustering for Document Metadata Extraction.” ACM, Mar. 2005.
Chidlovskii, Boris. “Automatic Repairing of Web Wrappers by Combining Redundant Views.” IEEE Computer Society, 2002.
Liu, Ling et al. “XWRAP: An XML-enabled Wrapper Construction System for Web Information Sources.” IEEE, Mar. 2000.
Meng, Xiaofeng et al. “A Supervised Visual Wrapper Generator for Web-Data Extraction.” IEEE Computer Society, 2003.
Raposo, Juan et al. “Automatic Wrapper Maintenance for Semi-Structured Web Sources Using Results from Previous Queries.” Association for Computing Machinery, 2005.
Hogue, A. and Karger, D. “Thresher: automating the unwrapping of semantic content from the World Wide Web.” Proceedings of the 14th International Conference on World Wide Web, pp. 86-95. ACM Press, May 2005.
Meng, X. et al. “Schema-guided wrapper maintenance for web-data extraction.” Proceedings of the 5th ACM International Workshop on Web Information and Data Management, pp. 1-8. ACM Press, 2003.
Zhao, Ying and Karypis, George. “Evaluation of hierarchical clustering algorithms for document datasets.” Proceedings of the 11th International Conference on Information and Knowledge Management, pp. 515-524. ACM Press, 2002.
Zamir, O., Etzioni, O., Madani, O. and Karp, R. “Fast and Intuitive Clustering of Web Documents.” Department of Computer Science & Engineering, American Association for Artificial Intelligence, 4 pages, 1997.
Jaenicke August A.
Orelind Greger J.
Alexa Internet
Knobbe Martens Olson & Bear LLP
Schallhorn Tyler J