Data processing: presentation processing of document – operator i – Presentation processing of document – Structured document
Reexamination Certificate
2007-04-12
2011-11-01
Hong, Stephen S. (Department: 2178)
C707S999104, C707SE17123
active
08051372
ABSTRACT:
A system and method for automatically detecting and extracting semantically significant text from an HTML document associated with a plurality of HTML documents is disclosed. The method may include receiving an HTML document, parsing the HTML document into a parse tree, segmenting the parse tree into one or more segments of one or more unique paths, processing the one or more segments based at least on the HTML document, and extracting one or more processed segments from the HTML document based on a predetermined number.
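The steps named in the abstract (parse, segment by unique path, process, extract a predetermined number) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how segments are processed, so ranking by total text length is an assumption made here for illustration, and the names `PathSegmenter`, `extract_segments`, and `limit` are hypothetical. Only the standard library `html.parser` module is used.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is never reader-visible content.
SKIP_TAGS = {"script", "style"}


class PathSegmenter(HTMLParser):
    """Groups text nodes by their root-to-node tag path (one segment per unique path)."""

    def __init__(self):
        super().__init__()
        self.stack = []                     # open tags from the root to the current node
        self.segments = defaultdict(list)   # path string -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())       # normalize whitespace
        if text and not SKIP_TAGS.intersection(self.stack):
            self.segments["/".join(self.stack)].append(text)


def extract_segments(html, limit):
    """Return up to `limit` (path, texts) pairs, longest total text first.

    Scoring by text length is an assumption of this sketch, not something
    the abstract specifies.
    """
    parser = PathSegmenter()
    parser.feed(html)
    parser.close()
    ranked = sorted(parser.segments.items(),
                    key=lambda item: sum(len(t) for t in item[1]),
                    reverse=True)
    return ranked[:limit]
```

Calling `extract_segments(page_source, 3)` would return the three paths carrying the most text, with `3` standing in for the "predetermined number" in the abstract. A fuller implementation would presumably compare segments across the plurality of related HTML documents the abstract mentions, which this sketch does not attempt.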
REFERENCES:
patent: 6631373 (2003-10-01), Otani et al.
patent: 6965900 (2005-11-01), Srinivasa et al.
patent: 2003/0063134 (2003-04-01), Lord et al.
patent: 2003/0115188 (2003-06-01), Srinivasa et al.
patent: 2004/0029085 (2004-02-01), Hu et al.
patent: 2005/0038785 (2005-02-01), Agrawal et al.
patent: 2005/0066269 (2005-03-01), Wang et al.
patent: 2005/0192983 (2005-09-01), Hattori et al.
patent: 2005/0216443 (2005-09-01), Morton et al.
patent: 2006/0047649 (2006-03-01), Liang
patent: 2010/0312728 (2010-12-01), Feng et al.
Information Extraction from HTML: Application of a General Machine Learning Approach, By Dayne Freitag, From: AAAI-98 Proceedings, 1998, pp. 1-7, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
IEPAD: Information Extraction Based on Pattern Discovery, By Chia-Hui Chang, Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320, pp. 681-688.
Mukherjee et al., Automatic Discovery of Semantic Structures in HTML Documents, State University of New York at Stony Brook, Stony Brook, NY, published 2003, pp. 1-5.
Penn et al., Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices, Lucent Bell Labs, Language Modeling Research, Murray Hill, NJ 07974, USA, published 2001, pp. 1074-1078.
Hong Stephen S.
Hunton & Williams LLP
Nazar Ahamed I
The New York Times Company