Inverted indices in information extraction to improve...

Data processing: database and file management or data structures – Database and file access – Record – file – and data search and comparisons

Reexamination Certificate

Details

C707S803000

active

08010544

ABSTRACT:
A method is provided for information extraction from among a multiplicity of documents, each having a corresponding document object model (DOM), the method comprising: computing signatures associated with nodes of a multiplicity of DOMs corresponding to the multiplicity of documents; producing an index that associates each computed signature with each document whose DOM has one or more nodes corresponding to that signature; annotating one or more nodes of a DOM that corresponds to at least one selected document, wherein the one or more annotated nodes respectively correspond to one or more respective signatures included in the index; and matching the signatures that correspond to the annotated nodes with signatures in the index to determine which documents from the multiplicity of documents have one or more DOM nodes that correspond to one or more of the annotated nodes.
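
The steps named in the abstract (compute node signatures, build an index from signature to documents, annotate nodes, match against the index) can be illustrated with a minimal sketch. The abstract does not say how a signature is computed, so the root-to-node tag path used here is purely an assumption, and the names SignatureCollector, node_signatures, build_index and match_annotations are hypothetical helpers, not anything taken from the patent.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SignatureCollector(HTMLParser):
    """Collects one signature per element node.

    ASSUMPTION: a signature is the root-to-node tag path, e.g.
    "html/body/div/span". The patent abstract does not define it.
    """

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, root first
        self.signatures = set()  # distinct signatures seen in this document

    def handle_starttag(self, tag, attrs):
        self.signatures.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def node_signatures(html):
    """Return the set of signatures for all element nodes of one document."""
    collector = SignatureCollector()
    collector.feed(html)
    collector.close()
    return collector.signatures


def build_index(docs):
    """Inverted index: signature -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for sig in node_signatures(html):
            index[sig].add(doc_id)
    return index


def match_annotations(index, annotated_signatures):
    """For each annotated signature, look up the documents that share it."""
    return {sig: set(index.get(sig, set())) for sig in annotated_signatures}


# Toy corpus; in practice the annotated signatures would come from nodes a
# person (or tool) labeled in one selected document.
docs = {
    "a": "<html><body><div><span>Price</span></div></body></html>",
    "b": "<html><body><div><span>Cost</span></div><p>x</p></body></html>",
    "c": "<html><body><p>only text</p></body></html>",
}
index = build_index(docs)
hits = match_annotations(index, ["html/body/div/span"])
print(sorted(hits["html/body/div/span"]))  # ['a', 'b']
```

The point of the inverted index in this sketch is that matching an annotated node costs one dictionary lookup per signature rather than a scan of every document's DOM. A real system would likely use a richer signature than a bare tag path.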

REFERENCES:
patent: 6233575 (2001-05-01), Agrawal et al.
patent: 7401071 (2008-07-01), Hattori et al.
patent: 7478100 (2009-01-01), Murthy et al.
patent: 7657555 (2010-02-01), Rorex et al.
patent: 7721192 (2010-05-01), Milic-Frayling et al.
patent: 7765236 (2010-07-01), Zhai et al.
patent: 7783642 (2010-08-01), Feng et al.
patent: 7917480 (2011-03-01), Dean et al.
patent: 2005/0055334 (2005-03-01), Krishnamurthy
patent: 2008/0098300 (2008-04-01), Corrales et al.
Vydiswaran, V.G. Vinod, et al., U.S. Appl. No. 11/938,736, filed Nov. 12, 2007, entitled “Extracting Information Based on Document Structure and Characteristics of Attributes,” not yet published.
Chitrapura, Krishna, et al., U.S. Appl. No. 11/838,351, filed Aug. 14, 2007, entitled “Method for Organizing Structurally Similar Web Pages from a Web Site,” not yet published.
Le Hégaret, Philippe, et al., “What is the Document Object Model?,” http://www.w3.org/TR/DOM-Level-2-Core/introduction.html, dated Nov. 13, 2000.
Kushmerick, Nicholas, et al., “Wrapper Induction for Information Extraction,” Intl. Joint Conference on Artificial Intelligence (IJCAI) 1997.
Hong, Mingcai, et al., “Semantic Annotation using Horizontal and Vertical Contexts,” First Asian Semantic Web Conference, Beijing, China, Sep. 3-7, 2006, pp. 58-64.
Mukherjee, Saikat, et al., “Automatic Discovery of Semantic Structures in HTML Documents,” Proceedings of the Seventh International Conference on Document Analysis and Recognition 2003.
Reeve, Lawrence, et al., “Survey of Semantic Annotation Platforms,” 20th Annual ACM Symposium on Applied Computing, Santa Fe, New Mexico, Mar. 13-17, 2005.
Lin, QingFeng, et al., “A Machine Learning Framework for Automatically Annotating Web Pages with Simple HTML Ontology Extension (SHOE),” International Conference on Intelligent Agents, Web Technologies and Internet Commerce (IAWTIC), Las Vegas, Nevada, 2001.
Senellart, P., et al., “Using domain Knowledge to Extract Wrappers for Tree-Structured Documents,” Ranked XML Querying Dagstuhl Seminar, Wadern, Germany, Mar. 13, 2008.
