Data processing: presentation processing of document – operator i – Presentation processing of document – Hypermedia
Reexamination Certificate
2011-07-26
Rutledge, Amelia (Department: 2176)
active
07987417
ABSTRACT:
An improved system and method are provided for detecting a web page template. A web page template detector may be provided for performing page-level template detection on a web page. In general, a web page template classifier may be trained using automatically generated training data and then applied to web pages to identify web page templates. A web page template may be detected by classifying segments of a web page as template structures, assigning classification scores to those segments, and then smoothing the assigned scores. Generalized isotonic regression may be applied to smooth the scores associated with the nodes of a hierarchy by minimizing an optimization function using dynamic programming.
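The smoothing step described at the end of the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes an L1 (absolute error) cost, a monotonicity constraint under which a parent's score never exceeds any child's score, and candidate output values restricted to the scores already observed. The function name `isotonic_smooth_tree` and the dictionary-based tree layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def isotonic_smooth_tree(
    children: Dict[str, List[str]],
    scores: Dict[str, float],
    root: str,
) -> Dict[str, float]:
    """Smooth node scores on a tree so that parent <= child (assumed
    constraint), minimizing the sum of absolute changes by dynamic
    programming over a finite set of candidate values."""
    # Assumption: candidate values are the distinct observed scores.
    levels = sorted(set(scores.values()))
    m = len(levels)

    # Pre-order traversal; its reverse visits children before parents.
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # cost[node][k]: best cost of node's subtree if node takes levels[k].
    cost: Dict[str, List[float]] = {}
    for node in reversed(order):
        c = [abs(scores[node] - v) for v in levels]
        for child in children.get(node, []):
            # Suffix minimum: the child may take any level >= the parent's.
            best = float("inf")
            for k in range(m - 1, -1, -1):
                best = min(best, cost[child][k])
                c[k] += best
        cost[node] = c

    # Backtrack from the root, choosing the cheapest feasible level.
    smoothed: Dict[str, float] = {}
    root_k = min(range(m), key=lambda k: cost[root][k])
    stack2 = [(root, root_k)]
    while stack2:
        node, k = stack2.pop()
        smoothed[node] = levels[k]
        for child in children.get(node, []):
            child_k = min(range(k, m), key=lambda j: cost[child][j])
            stack2.append((child, child_k))
    return smoothed


# Hypothetical three-node hierarchy with raw classifier scores.
tree = {"root": ["a", "b"]}
raw = {"root": 0.9, "a": 0.2, "b": 0.8}
smoothed = isotonic_smooth_tree(tree, raw, "root")
```

In this toy input the root's raw score exceeds that of child `a`, so the smoothed scores must move at least one of them; the dynamic program picks the cheapest way to do so. Scores that already satisfy the constraint are returned unchanged.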
REFERENCES:
patent: 6256629 (2001-07-01), Sproat et al.
patent: 2004/0006452 (2004-01-01), Gluhovsky
patent: 2007/0009167 (2007-01-01), Dance et al.
patent: 2007/0255707 (2007-11-01), Tresser et al.
Niculescu-Mizil, et al., “Predicting Good Probabilities With Supervised Learning”, appearing in Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005, p. 625-632.
Angelov, et al., “Weighted Isotonic Regression under the L1 Norm”, SODA 2006, Jan. 2006, p. 783-791.
Yi et al., “Eliminating Noisy Information in Web Pages for Data Mining”, SIGKDD '03, Aug. 2003, copyright ACM, p. 1-10.
Z. Bar-Yossef and S. Rajagopalan, “Template detection via data mining and its applications,” in Proc. 11th WWW, pp. 580-591, 2002.
D. Gibson, K. Punera, and A. Tomkins, “The volume and evolution of web page templates,” in Proc. 14th WWW (Special Interest Tracks and Posters), pp. 830-839, May 2005.
L. Yi, B. Liu, and X. Li, “Eliminating Noisy Information in web pages for data mining,” In Proc. 9th KDD, pp. 296-305, 2003.
L. Yi and B. Liu, “Web page cleaning for web mining through feature weighting,” In Proc. 18th IJCAI, pp. 43-50, 2003.
K. Vieira, A. Silva, N. Pinto, E. Moura, J. Cavalcanti, and J. Freire, “A fast and robust method for web page template detection and removal,” In Proc. 15th CIKM, pp. 256-267, 2006.
H.Y. Kao, J.M. Ho, and M.S. Chen, “WISDOM: Web intrapage informative structure mining based on document object model,” TKDE, 17(5):614-627, 2005.
S. Debnath, P. Mitra, N. Pal, and C.L. Giles, “Automatic Identification of Informative Sections of Web Pages,” TKDE, 17(9):1233-1246, 2005.
H.Y. Kao, M.S. Chen, S.H. Lin, and J.M. Ho, “Entropy-based link analysis for mining web informative structures,” in Proc. 11th CIKM, pp. 574-581, 2002.
R. Song, H. Liu, J.R. Wen, and W.Y. Ma, “Learning block importance models for web pages,” In Proc. 13th WWW, pp. 203-211, 2004.
B. Davison, “Recognizing nepotistic links on the web,” In AAAI-2000 Workshop on Artificial Intelligence for Web Search, pp. 23-28, 2000.
N. Kushmerick, “Learning to remove internet advertisement,” In Proc. 3rd Agents, pp. 175-181, 1999.
INVENTORS:
Chakrabarti, Deepayan
Punera, Kunal
Ravikumar, Shanmugasundaram
ATTORNEY, AGENT, OR FIRM:
Buchenhorner Patent Law
ASSIGNEE:
Yahoo! Inc.
TITLE:
System and method for detecting a web page template