Data processing: presentation processing of document – operator i – Presentation processing of document – Structured document
Reexamination Certificate
2002-05-28
2009-08-25
Hong, Stephen S (Department: 2178)
active
07581170
ABSTRACT:
A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML, are described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3]; that is, an actual pattern describes the set of all targets specified by any of its filters. A method and system for extracting relevant elements from Web pages by interpreting and executing a previously defined wrapper program of the above form on an input Web page [9-14] and producing as output the extracted elements represented in a suitable data structure. A method and system for automatically translating said output into XML format, by exploiting the hierarchical structure of the patterns and by using pattern names as XML tags, are also described.
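The two mechanisms named in the abstract, disjunctive filters and XML output that mirrors the pattern hierarchy, can be illustrated with a minimal Python sketch. It is not the patented implementation: the names Pattern, regex_filter, extract and to_xml, the "document" root tag, and the sample page are all hypothetical, and regular expressions stand in for whatever filter language the wrapper program actually uses.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Span = Tuple[int, int]                  # (start, end) offsets of one target
Filter = Callable[[str], List[Span]]    # a filter selects target spans


def regex_filter(expr: str) -> Filter:
    """Hypothetical filter: every match of capture group 1 is a target."""
    compiled = re.compile(expr, re.S)
    return lambda source: [m.span(1) for m in compiled.finditer(source)]


@dataclass
class Pattern:
    name: str                           # reused as the XML tag name
    filters: List[Filter]               # combined disjunctively
    children: List["Pattern"] = field(default_factory=list)

    def targets(self, source: str) -> List[str]:
        """Union of the targets of all filters, in document order."""
        spans = set()
        for f in self.filters:
            spans.update(f(source))     # a target found by any filter counts
        return [source[s:e] for s, e in sorted(spans)]


def extract(pattern: Pattern, source: str, parent: ET.Element) -> None:
    """Emit one element per target; child patterns search inside it."""
    for target in pattern.targets(source):
        node = ET.SubElement(parent, pattern.name)
        if pattern.children:
            for child in pattern.children:
                extract(child, target, node)
        else:
            node.text = target.strip()


def to_xml(root_pattern: Pattern, page: str) -> str:
    root = ET.Element("document")       # assumed root tag
    extract(root_pattern, page, root)
    return ET.tostring(root, encoding="unicode")


# Made-up sample page: the second price sits in a <th>, so one filter
# alone would miss it.
PAGE = """<table>
<tr><td class="title">Lamp</td><td class="price">12 EUR</td></tr>
<tr><td class="title">Desk</td><th class="price">80 EUR</th></tr>
</table>"""

price = Pattern("price", [
    regex_filter(r'<td class="price">(.*?)</td>'),
    regex_filter(r'<th class="price">(.*?)</th>'),
])
title = Pattern("title", [regex_filter(r'<td class="title">(.*?)</td>')])
record = Pattern("record", [regex_filter(r"<tr>(.*?)</tr>")], [title, price])

print(to_xml(record, PAGE))
```

In this sketch the price pattern has two filters, and a price is extracted if either one matches it, which is the disjunctive behaviour the abstract describes. Nesting title and price under record is what yields nested record elements in the output, with the pattern names serving as tags.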
REFERENCES:
patent: 5826258 (1998-10-01), Gupta et al.
patent: 5841895 (1998-11-01), Huffman
patent: 5860071 (1999-01-01), Ball et al.
patent: 5898836 (1999-04-01), Freivald et al.
patent: 5913214 (1999-06-01), Madnick et al.
patent: 5966516 (1999-10-01), De Palma et al.
patent: 5983268 (1999-11-01), Freivald et al.
patent: 6081804 (2000-06-01), Smith
patent: 6102969 (2000-08-01), Christianson et al.
patent: 6128655 (2000-10-01), Fields et al.
patent: 2002/0169771 (2002-11-01), Melmon et al.
TracerLock™ http://web.archive.org/web/20010504003134/http://peacefire.org/tracer/lock/.
Mind-it; printed Mar. 25, 2003 http://web.archive.org/web/20010602164634/www.netmind.com/index.shtml.
XWrap; screenshots.
Daniel F. Savarese OROMatcher 1.1 1997-2000 http://web.archive.org/web/20010603013706/http://www.savarese.org/oro/software/OR....
WebQL; http://web.archive.org/web/20010517194758/http://caesius.com/.
X-Fetch Suite—XML enabling and integrating information systems; Republica Corp, 2002 http://www.x-fetch.com/xhtml/summary.html.
WisoSoftCom InfoScanner MobileScanner Internet Data Mining for Portal http://web.archive.org/web/20010604082928/www.wisosoft.com/en/home.html.
Kapow Technologies; 2001 SAL Holding, Internet Ventures Scandinavia, Impress Corp. http://web.archive.org/web/20010517005034/http://www.kapowtech.com.
Enabling e-Business with Business Process Integration; Orsus Solutions, Ltd. 2000, Sunnyvale, CA 94088 pp. 1-12.
Unlocking the Internet for Mobile Access/iGlue/Wireless for Integrating Web to Wireless Business Processes; Orsus Solutions, Ltd. 2000, Sunnyvale, CA 94088 pp. 1-15.
Michael Stonebraker and Joseph M. Hellerstein; Content Integration for E-Business; Hayward, CA 94545, pp. 1-9.
Gerald Huck, Peter Fankhauser, Karl Aberer and Erich Neuhold; Jedi: Extracting and Synthesizing Information from the Web; 64293 Darmstadt, Germany, pp. 1-10.
Wolfgang May and Georg Lausen; Information Extraction from the Web. Mar. 2000, 79110 Freiburg, Germany, pp. 1-48.
Giansalvatore Mecca and Paolo Atzeni; Cut and Paste, pp. 1-29.
David Konopnicki and Oded Shmueli; Information Gathering in the World-Wide Web: The W3QL Query Language and the W3QS System, Haifa 32000 Israel, pp. 1-41.
Seung-Jin Lim and Yiu-Kai Ng; WebView: A Tool for Retrieving Internal Structures and Extracting Information from HTML Documents, Provo UT 84602, pp. 1-19.
Fred Douglis, Thomas Ball, Yih-Farn Chen and Eleftherios Koutsofios; The AT&T Internet Difference Engine: Tracking and Viewing Changes on the Web, Jan. 1998, pp. 1-29.
Ling Liu, Calton Pu, and Wei Tang; Continual Queries for Internet Scale Event-Driven Information Delivery; Portland OR, 97291, pp. 1-30.
Nicholas Kushmerick, Daniel S. Weld and Robert Doorenbos; Wrapper Induction for Information Extraction, IJCAI-97, Seattle WA, 98195, pp. 1-7.
Chun-Nan Hsu and Ming-Tzung Dung; Generating Finite-State Transducers for Semi-Structured Data Extraction from the Web, Information Systems vol. 23 No. 8, pp. 521-538, 1998, Great Britain.
Ion Muslea, Steven Minton and Craig A. Knoblock; Hierarchical Wrapper Induction for Semistructured Information Sources, Kluwer Academic Publishers, Netherlands, Sep. 10, 1999 pp. 1-28.
Brad Adelberg; NoDoSe - A tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents, Evanston IL, 60201 pp. 1-25.
Stephen W. Liddle, Douglas M. Campbell and Chad Crawford; Automatically Extracting Structure and Data from Business Reports, Provo UT, 84602 pp. 1-23.
Scott B. Huffman; Learning information extraction patterns from examples, Menlo Park CA, 94025 Feb. 22, 1995, pp. 1-15.
D.W. Embley and L. Xu; Locating and Reconfiguring Records in Unstructured Multiple-Record Web Documents, Provo UT, 84602 pp. 1-20.
Hasan Davulcu, Guizhen Yang, Michael Kifer and I.V. Ramakrishnan; Computational Aspects of Resilient Data Extraction from Semistructured Sources, Stony Brook NY, 11794 Feb. 3, 2002, pp. 1-17.
Hasan Davulcu, Guizhen Yang, Michael Kifer and I.V. Ramakrishnan; Design and Implementation of the Physical Layer in WebBases: The XRover™ Experience, J. Lloyd et al. (Eds.): LNAI 1861, pp. 1094-1105, 2000. © Springer-Verlag Berlin Heidelberg 2000.
Naveen Ashish and Craig A. Knoblock; Semi-automatic Wrapper Generation for Internet Information Sources, Marina del Rey CA, 90292 pp. 1-10.
Ling Liu, Calton Pu and Wei Han; XWRAP: An XML-enabled Wrapper Construction System for Web Information Sources, Atlanta GA 30332 pp. 1-11.
Arnaud Sahuguet and Fabien Azavant; Building Intelligent Web Applications Using Lightweight Wrappers, Barrault Paris, 75634 Cedex 13, France, Jul. 11, 2000 pp. 1-37.
Berthier Ribeiro-Neto, Alberto H.F. Laender and Altigran S. Da Silva; Extracting Semi-Structured Data Through Examples, 31270-901 Belo Horizonte MG Brazil, pp. 1-8.
Jean-Robert Gruser, Louiqa Raschid, Maria Ester Vidal and Laura Bright; Wrapper Generation for Web Accessible Data Sources, College Park MD, 20742 pp. 1-10.
Robert Baumgartner, Sergio Flesca and Georg Gottlob; Visual Web Information Extraction with Lixto, Roma Italy, 2001 pp. 1-10.
Robert Baumgartner, Sergio Flesca and Georg Gottlob; The Elog Web Extraction Language, pp. 1-13.
J. Hammer, H. Garcia-Molina, J. Cho, R Aranha and A. Crespo; Extracting Semistructured Information from the Web, Stanford, CA 94305 pp. 1-8.
Alberto O. Mendelzon and George A. Mihaila; Querying the World Wide Web, Mar. 20, 1996, pp. 1-23.
Baumgartner Robert
Gottlob Georg
Herzog Marcus
Flesca Sergio
Hong Stephen S
Lixto Software GmbH
Stork Kyle R
Sughrue & Mion, PLLC
Visual and interactive wrapper generation, automated...