Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2004-04-14
2009-12-01
Pham, Khanh B (Department: 2166)
C707S793000
active
07627567
ABSTRACT:
A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
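As a rough illustration of the idea in the abstract, the Python sketch below builds a per-attribute model of beginning, middle and ending tokens from a small reference table and then picks the highest-scoring split of an input string. It is a simplified sketch under stated assumptions, not the patented method: the function names (tokenize, build_model, span_log_prob, segment), the toy reference table, the add-alpha smoothing and the rule that a one-token value counts as both beginning and ending token are all hypothetical choices made for this example. It also assumes attributes appear in a fixed order, so it does not model the token insertions or misordering that the abstract mentions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def build_model(values):
    """Count tokens by position (begin / middle / end) for one attribute."""
    model = {"begin": Counter(), "middle": Counter(), "end": Counter(),
             "null": 0, "total": 0}
    for value in values:
        tokens = tokenize(value)
        model["total"] += 1
        if not tokens:
            model["null"] += 1  # empty attribute value: the "null token" case
            continue
        model["begin"][tokens[0]] += 1
        model["end"][tokens[-1]] += 1
        for tok in tokens[1:-1]:
            model["middle"][tok] += 1
    return model


def span_log_prob(model, span, vocab_size, alpha=0.1):
    """Log score of a token span under one attribute model (add-alpha smoothing)."""
    def p(counter, tok):
        return (counter[tok] + alpha) / (sum(counter.values()) + alpha * vocab_size)

    if not span:
        return math.log((model["null"] + alpha) / (model["total"] + 2 * alpha))
    score = math.log(p(model["begin"], span[0])) + math.log(p(model["end"], span[-1]))
    for tok in span[1:-1]:
        score += math.log(p(model["middle"], tok))
    return score


def segment(text, attributes, models, vocab_size):
    """Find the best split of the tokens into consecutive attribute spans."""
    tokens = tokenize(text)
    n, k = len(tokens), len(attributes)
    neg_inf = float("-inf")
    # best[i][j]: best score for giving the first j tokens to the first i attributes
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for i in range(1, k + 1):
        model = models[attributes[i - 1]]
        for j in range(n + 1):
            for m in range(j + 1):
                if best[i - 1][m] == neg_inf:
                    continue
                score = best[i - 1][m] + span_log_prob(model, tokens[m:j], vocab_size)
                if score > best[i][j]:
                    best[i][j] = score
                    back[i][j] = m
    # Walk the back-pointers from the last attribute to the first.
    result = {}
    j = n
    for i in range(k, 0, -1):
        m = back[i][j]
        result[attributes[i - 1]] = " ".join(tokens[m:j])
        j = m
    return result


# Toy reference table (made-up clean records, one tuple per row).
ATTRIBUTES = ["name", "city", "state"]
REFERENCE = [
    ("john smith", "seattle", "wa"),
    ("mary jones", "portland", "or"),
    ("jane doe", "new york", "ny"),
]
MODELS = {attr: build_model([row[i] for row in REFERENCE])
          for i, attr in enumerate(ATTRIBUTES)}
# Distinct reference tokens plus one slot for unseen tokens.
VOCAB_SIZE = len({t for row in REFERENCE for v in row for t in tokenize(v)}) + 1

print(segment("Mary Smith New York WA", ATTRIBUTES, MODELS, VOCAB_SIZE))
```

With this toy table, the input "Mary Smith New York WA" is split into the name "mary smith", the city "new york" and the state "wa", because every token then lands in a position where the reference data has already seen it. The dynamic program over split points stands in for the "most probable segmentation" step; a fuller treatment would use a proper state model per attribute rather than three position counters.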
REFERENCES:
patent: 5095432 (1992-03-01), Reed
patent: 2006/0235811 (2006-10-01), Fairweather
Borkar et al., Automatic segmentation of text into structured records, 2001 http://www.it.iitb.ac.in/~sunita/papers/sigmod01.pdf.
Rie Kubota Ando, “Mostly-Unsupervised Statistical Segmentation of Japanese Sequences”, Feb. 11, 2002, pp. 1-2 http://www.cs.cornell.edu/home/Ilee/papers/segmentjnle.pdf.
E. Marsh and D. Perzanowski. MUC-7 Evaluation of IE Technology: Overview of Results, Proceedings of the 7th Message Understanding Conference (MUC-7). Morgan Kaufmann, Apr. 29, 1998.
B. Adelberg. NoDoSE—A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents. SIGMOD 1998, Seattle, WA, USA.
A. Arasu and H. Garcia-Molina. Extracting Structured Data from Web Pages. SIGMOD 2003, Jun. 9-12, 2003, San Diego, CA.
R. Baumgartner, S. Flesca, and G. Gottlob. Visual Web Information Extraction with Lixto. Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
J. Bilmes. What HMMs can do. UWEE Technical report, UWEETR-2002-2003, 2002.
M. E. Califf and R. J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. In Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp. 6-11, Menlo Park, CA, 1998. AAAI Press.
S. F. Chen and J. Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. In Proceedings of the 34th Annual Meeting of the ACL, pp. 310-318, Jun. 1996.
M. Collins and Y. Singer. Unsupervised Models for Named Entity Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1999, pp. 100-110.
V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards Automatic Data Extraction From Large Web Sites. Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
J. Droppo, L. Deng, and A. Acero. Evaluation of the Splice on the Aurora 2 and 3 Tasks. Microsoft Research, pp. 29-32.
S. Fine, Y. Singer, and N. Tishby. The Hierarchical Hidden Markov Model: Analysis and Applications. Machine Learning, 32(1):41-62, 1998.
D. Freitag and A. McCallum. Information Extraction with HMM Structures Learned by Stochastic Optimization. In AAAI/IAAI, pp. 584-589, Copyright 2001.
R. Grishman. Information Extraction: Techniques and Challenges. In Information Extraction (International Summer School SCIE-97). Springer-Verlag, 1997.
M. A. Hernandez and S. J. Stolfo. Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem. Data Mining and Knowledge Discovery, 2(1):9-37, 1998. © 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
C. A. Knoblock, K. Lerman, S. Minton, and I. Muslea. Accurately and Reliably Extracting Data From the Web: A Machine Learning Approach. IEEE Data Engineering Bulletin, 23(4):33-41, 2000. Copyright 1999 IEEE.
M. Lapata. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Jul. 2003, pp. 545-552.
A. F. Martin and M. A. Przybocki. NIST 2003 Language Recognition Evaluation. In Eurospeech 2003, 2003.
A. Mikheev, M. Moens, and C. Grover. Named Entity Recognition Without Gazetteers. In Proceedings of EACL, 1999.
L. R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, vol. 77 No. 2, Feb. 1989. pp. 257-286.
K. Seymore, A. McCallum, and R. Rosenfeld. Learning Hidden Markov Model Structure for Information Extraction. In AAAI 99 Workshop on Machine Learning for Information Extraction, 1999.
Agichtein Yevgeny
Ganti Venkatesh
Theodore Vassilakis
Johnson Johnese
Microsoft Corporation
Pham Khanh B