System and method for automatically detecting and extracting...

Data processing: presentation processing of document – operator i – Presentation processing of document – Structured document

Reexamination Certificate


Details

C707S999104, C707SE17123

Reexamination Certificate

active

08051372

ABSTRACT:
A system and method for automatically detecting and extracting semantically significant text from an HTML document associated with a plurality of HTML documents are disclosed. The method may include receiving an HTML document, parsing the HTML document into a parse tree, segmenting the parse tree into one or more segments of one or more unique paths, processing the one or more segments based at least on the HTML document, and extracting one or more processed segments from at least the HTML document based on a predetermined number.
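
The steps named in the abstract (parse, segment by unique path, process, extract up to a predetermined number) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not say how segments are processed, so the sketch assumes a simple stand-in score, the total length of the text found under each root-to-node tag path. The names PathSegmenter, extract_top_segments, and limit are hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never take a closing tag, so they are kept off the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Tags whose text content is not shown to the reader.
SKIP_TAGS = {"script", "style"}


class PathSegmenter(HTMLParser):
    """Groups text by the root-to-node tag path under which it appears."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # currently open tags, root first
        self.segments = defaultdict(list)  # tag path -> list of text pieces

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())      # collapse runs of whitespace
        if text and not (SKIP_TAGS & set(self.stack)):
            self.segments["/".join(self.stack)].append(text)


def extract_top_segments(html, limit=2):
    """Return up to `limit` (path, texts) pairs, longest total text first.

    Ranking by text length is an assumption made for illustration only;
    `limit` plays the role of the abstract's "predetermined number".
    """
    parser = PathSegmenter()
    parser.feed(html)
    parser.close()
    ranked = sorted(parser.segments.items(),
                    key=lambda item: sum(len(t) for t in item[1]),
                    reverse=True)
    return ranked[:limit]
```

Calling extract_top_segments(page, limit=1) on a page whose body text sits in paragraphs and whose navigation sits in a short list would, under this length-based assumption, return the paragraph path first. A scheme that compares paths across the plurality of HTML documents mentioned in the abstract would need a different scoring step.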

REFERENCES:
patent: 6631373 (2003-10-01), Otani et al.
patent: 6965900 (2005-11-01), Srinivasa et al.
patent: 2003/0063134 (2003-04-01), Lord et al.
patent: 2003/0115188 (2003-06-01), Srinivasa et al.
patent: 2004/0029085 (2004-02-01), Hu et al.
patent: 2005/0038785 (2005-02-01), Agrawal et al.
patent: 2005/0066269 (2005-03-01), Wang et al.
patent: 2005/0192983 (2005-09-01), Hattori et al.
patent: 2005/0216443 (2005-09-01), Morton et al.
patent: 2006/0047649 (2006-03-01), Liang
patent: 2010/0312728 (2010-12-01), Feng et al.
Information Extraction from HTML: Application of a General Machine Learning Approach, By Dayne Freitag, From: AAAI-98 Proceedings, 1998, pp. 1-7, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
IEPAD: Information Extraction Based on Pattern Discovery, By Chia-Hui Chang, Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320, pp. 681-688.
Penn, Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices, Lucent Bell Labs, Murray Hill, NJ 07974, pp. 1074-1078.
Mukherjee et al., Automatic Discovery of Semantic Structures in HTML Documents, State University of New York at Stony Brook, Stony Brook, NY, published 2003, pp. 1-5.
Penn et al., Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices, Lucent Bell Labs, Language Modeling Research, Murray Hill, NJ 07974, USA, published 2001, pp. 1074-1078.


Profile ID: LFUS-PAI-O-4310504
