Stacked generalization learning for document annotation

Data processing: artificial intelligence – Machine learning

Reexamination Certificate

Details

C706S020000

active

07890438

ABSTRACT:
A document annotation method includes modeling the data elements of an input document, and the dependencies between those data elements, as a dependency network. Static features are defined for at least some of the data elements; each static feature expresses a relationship between a characteristic of a data element and its label. Dynamic features are also defined; each links a data element with its own label and with the label of a second data element. Parameters of a collective probabilistic model for the document are then learned, each expressing the conditional probability that a first data element should be labeled with information derived from the label of a neighbor data element linked to it by a dynamic feature. The learning decomposes a globally trained model into a set of local learning models, each of which uses static features to estimate the labels of neighbor elements for at least one of the data elements.
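The stacking idea in the abstract (local models trained on static features estimate neighbor labels, and those estimates feed the dynamic features of a second model) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented method: the nearest-centroid base learner, the encoding of dynamic features as counts of estimated neighbor labels, the reading-order `chain_neighbors` dependency structure, the k-fold split, and every name in the code are hypothetical choices made for brevity. Following Wolpert's stacked generalization, the training-time neighbor estimates come from cross-validation, so the second-level model sees the same kind of imperfect estimates it will receive at annotation time.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Hypothetical stand-in local learner: predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            sums[label] = [s + v for s, v in zip(acc, x)]
            counts[label] += 1
        self.centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda lab: dist(x, self.centroids[lab])) for x in X]


def chain_neighbors(n):
    """Assumed dependency network: each element is linked to its reading-order predecessor and successor."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def dynamic_features(estimates, neighbors, labels):
    """For each element, count how many of its neighbors carry each (estimated) label."""
    feats = []
    for nbrs in neighbors:
        counts = [0.0] * len(labels)
        for j in nbrs:
            counts[labels.index(estimates[j])] += 1.0
        feats.append(counts)
    return feats


def cross_val_estimates(X, y, k):
    """Stacking step: each element's label estimate comes from a model that never saw that element."""
    est = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = NearestCentroid().fit([X[i] for i in train], [y[i] for i in train])
        for i, label in zip(test, model.predict([X[i] for i in test])):
            est[i] = label
    return est


class StackedAnnotator:
    """Hypothetical two-level stacked learner over static and dynamic features."""

    def __init__(self, labels, k=3):
        self.labels, self.k = labels, k

    def fit(self, X, y, neighbors):
        est = cross_val_estimates(X, y, self.k)
        self.base = NearestCentroid().fit(X, y)  # level 0: static features only
        Z = [x + d for x, d in zip(X, dynamic_features(est, neighbors, self.labels))]
        self.stacked = NearestCentroid().fit(Z, y)  # level 1: static + dynamic features
        return self

    def predict(self, X, neighbors):
        est = self.base.predict(X)  # estimate neighbor labels from static features
        Z = [x + d for x, d in zip(X, dynamic_features(est, neighbors, self.labels))]
        return self.stacked.predict(Z)


if __name__ == "__main__":
    # Hypothetical toy data: one static feature (font size); large fonts are titles.
    sizes = [20, 10, 11, 10, 19, 11, 10, 21, 10, 11]
    y = ["title", "body", "body", "body", "title", "body", "body", "title", "body", "body"]
    X = [[float(s)] for s in sizes]
    model = StackedAnnotator(["title", "body"]).fit(X, y, chain_neighbors(len(X)))
    X_new = [[18.0], [10.0], [12.0], [11.0]]
    print(model.predict(X_new, chain_neighbors(len(X_new))))
```

On this toy data the cross-validated estimates reproduce the training labels, and the stacked model labels the four new elements `title, body, body, body`. A real system would more likely use a probabilistic local model (for example a maximum entropy classifier), so that the conditional probabilities described in the abstract are available rather than hard labels.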

REFERENCES:
patent: 7165216 (2007-01-01), Chidlovskii et al.
patent: 7730396 (2010-06-01), Chidlovskii et al.
patent: 7756800 (2010-07-01), Chidlovskii
patent: 2004/0111253 (2004-06-01), Luo et al.
patent: 2004/0205482 (2004-10-01), Basu et al.
patent: 2006/0101058 (2006-05-01), Chidlovskii
patent: 2006/0112190 (2006-05-01), Hulten et al.
patent: 2006/0212142 (2006-09-01), Madani et al.
I. A. Letia, A. Groza, Automating the Dispute Resolution in a Task Dependency Network, Proc. IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 365-371, 2005, DOI: 10.1109/IAT.2005.47.
C.-H. Lin, M.-C. Tsao, Power Quality Detection with Classification Enhancible Wavelet-Probabilistic Network in a Power System, IEE Proceedings: Generation, Transmission and Distribution, vol. 152, no. 6, pp. 969-976, 2005, DOI: 10.1049/ip-gtd:20045177.
W.-H. Shum, K.-S. Leung, M.-L. Wong, Learning Acyclic Decision Trees with Functional Dependency Network and MDL Genetic Programming, Proc. International Multi-Conference on Computing in the Global Information Technology (ICCGI '06), p. 25, 2006, DOI: 10.1109/ICCGI.2006.46.
W.-M. Lin, C.-H. Lin, Z.-C. Sun, Adaptive Multiple Fault Detection and Alarm Processing for Loop System with Probabilistic Network, IEEE Transactions on Power Delivery, vol. 19, no. 1, pp. 64-69, 2004, DOI: 10.1109/TPWRD.2003.820203.
A. L. Berger, S. D. Pietra, V. J. D. Pietra, A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 22(1):39-71, 1996.
Y. Altun, D. McAllester, M. Belkin, Maximum Margin Semi-Supervised Learning for Structured Variables, Proc. NIPS, 2005.
B. Chidlovskii, J. Fuselier, L. Lecerf, ALDAI: Active Learning Documents Annotation Interface, Proc. ACM Symposium on Document Engineering, 2006.
B. Chidlovskii, L. Lecerf, Stacked Dependency Networks for Layout Document Structuring, SIGIR Information Retrieval and Graphical Model Workshop, Amsterdam, The Netherlands, Jul. 2007 (Abstract).
U. Brefeld, T. Scheffer, Semi-Supervised Learning for Structured Output Variables, Proc. ICML, pp. 145-152, 2006.
D. H. Wolpert, Stacked Generalization, Neural Networks, 5(2):241-259, 1992.
U.S. Appl. No. 10/986,490, filed Nov. 10, 2004, Chidlovskii.
U.S. Appl. No. 11/032,814, filed Jan. 10, 2005, Dejean, et al.
U.S. Appl. No. 11/032,817, filed Jan. 10, 2005, Dejean, et al.
U.S. Appl. No. 11/116,100, filed Apr. 27, 2005, Dejean, et al.
U.S. Appl. No. 11/137,566, filed May 26, 2005, Meunier.
U.S. Appl. No. 11/156,776, filed Jun. 20, 2005, Chidlovskii, et al.
U.S. Appl. No. 11/170,542, filed Jun. 29, 2005, Chidlovskii, et al.
U.S. Appl. No. 11/222,881, filed Sep. 9, 2005, Bergholz.
D. H. Wolpert, W. G. Macready, Combining Stacking with Bagging to Improve a Learning Algorithm, Santa Fe Institute Technical Report 96-03-123, 1996.
G. Qu, S. Hariri, M. Yousif, A New Dependency and Correlation Analysis for Features, Proc. ACM KDD, 2005.
J. Lafferty, A. McCallum, F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML '01: Proc. 18th Int'l Conf. on Machine Learning, New York, ACM Press, 2001.
J. Neville, D. Jensen, Collective Classification with Relational Dependency Networks, Proc. ACM KDD, 2003.
J. Neville, D. Jensen, Dependency Networks for Relational Data, Proc. IEEE Data Mining, 2004.
J. P. Chanod, B. Chidlovskii, H. Dejean, O. Fambon, J. Fuselier, T. Jacquin, J. L. Meunier, From Legacy Documents to XML: A Conversion Framework, Proc. European Conf. Digital Libraries, pp. 92-103, 2005.
J. Ruppenhofer, M. Ellsworth, M. R. L. Petruck, C. R. Johnson, J. Scheffczyk, FrameNet II: Extended Theory and Practice, Jun. 6, 2006, http://framenet.icsi.berkeley.edu/book/book.html.
L. Lecerf, B. Chidlovskii, Document Annotation by Active Learning Techniques, Proc. ACM Symposium on Document Engineering, 2006.
L. R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE, 77(2):257-286, 1989.
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, L. K. Saul, An Introduction to Variational Methods for Graphical Models, Machine Learning, 37(2):183-233, 1999.
P. Melville, R. J. Mooney, Diverse Ensembles for Active Learning, ICML '04: Proc. 21st Int'l Conf. on Machine Learning, New York, p. 74, ACM Press, 2004.
D. Roth, K. Small, Margin-Based Active Learning for Structured Output Spaces, Proc. ECML, 2006.
W. W. Cohen, V. R. Carvalho, Stacked Sequential Learning, Proc. IJCAI, 2005.
http://crfpp.sourceforge.net/, Oct. 15, 2007.
http://chasen.org/~taku/software/, Oct. 24, 2007.
http://www.vikef.net/, Oct. 15, 2007.
Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), Oct. 2007 (Abstract only).
Kou, et al. “Stacked Graphical Models for Efficient Inference in Markov Random Fields,” in Proceedings of the 2007 SIAM International Conference on Data Mining, Apr. 2007.
Heckerman, et al. “Dependency Networks for Inference, Collaborative Filtering, and Data Visualization,” in Journal of Machine Learning Research, 1:49-75, 2000.
Neville, et al. “Relational Dependency Networks,” Journal of Machine Learning Research, 8:653-692, 2007.
Feng, et al. “Exploring the Use of Conditional Random Field Models and HMMs for Historical Handwritten Document Recognition,” Proceedings of the Second International Conference on Document Image Analysis for Libraries, 2006.
Jensen, et al. “Blocking Gibbs Sampling in Very Large Probabilistic Expert Systems,” International Journal of Human Computer Studies: Special Issue on Real-World Applications of Uncertain Reasoning, Oct. 1993.
Kopec, et al. “Document Image Decoding Using Markov Source Models,” IEEE Trans. Pattern Anal. Mach. Intell., 16(6):602-617, 1994.
Liang, et al. “Efficient Geometric Algorithms for Parsing in Two Dimensions,” Int. Conference on Document Analysis and Recognition, 2005.
Mao, et al. “Document Structure Analysis Algorithms: A Literature Survey,” Proc. SPIE Electronic Imaging, vol. 5010, 197-203, 2003.
Shetty, et al. “Segmentation and Labeling of Documents using Conditional Random Fields,” Proc. Document Recognition and Retrieval IV, Proceedings of SPIE, pp. 6500U-1-11, Feb. 2007.
