Segmentation of strings into structured records

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07627567

ABSTRACT:
A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle, and an ending token topology for that attribute. A null token accounts for an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
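The flow described in the abstract can be illustrated with a small Python sketch. It is not the patented method: it swaps the per-attribute state models for simple token counts by position (begin, middle, end), and it omits the state copying that handles token insertions and misordering. The function names, the add-one smoothing, the fixed null penalty, and the assumption that attributes appear in a fixed order are all hypothetical choices made for this illustration.

```python
import math
from collections import Counter

POSITIONS = ("begin", "middle", "end")

def tokenize(text):
    # Assumption: lowercase, treat commas as whitespace, split on whitespace.
    return text.lower().replace(",", " ").split()

def roles(n):
    # Position label for each of n tokens; a lone token is labeled "begin".
    if n <= 1:
        return ["begin"] * n
    return ["begin"] + ["middle"] * (n - 2) + ["end"]

def build_models(rows, attributes):
    # One begin/middle/end token-count model per attribute (reference column).
    models = {a: {p: Counter() for p in POSITIONS} for a in attributes}
    vocab = set()
    for row in rows:
        for attr, value in zip(attributes, row):
            toks = tokenize(value)
            vocab.update(toks)
            for tok, role in zip(toks, roles(len(toks))):
                models[attr][role][tok] += 1
    return models, len(vocab)

def segment_logp(models, vocab_size, attr, toks, alpha=1.0):
    # Log-probability of a token run under one attribute model,
    # with add-alpha smoothing so unseen tokens keep nonzero probability.
    total = 0.0
    for tok, role in zip(toks, roles(len(toks))):
        counts = models[attr][role]
        denom = sum(counts.values()) + alpha * (vocab_size + 1)
        total += math.log((counts[tok] + alpha) / denom)
    return total

def segment(text, models, vocab_size, attributes, null_logp=math.log(0.1)):
    # Dynamic program over (tokens consumed, attributes used).
    # An attribute may take zero tokens at a fixed "null" cost.
    toks = tokenize(text)
    n, k = len(toks), len(attributes)
    neg_inf = float("-inf")
    best = [[neg_inf] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        attr = attributes[j - 1]
        for i in range(n + 1):
            for start in range(i + 1):
                if best[start][j - 1] == neg_inf:
                    continue
                piece = toks[start:i]
                cost = (segment_logp(models, vocab_size, attr, piece)
                        if piece else null_logp)
                score = best[start][j - 1] + cost
                if score > best[i][j]:
                    best[i][j], back[i][j] = score, start
    # Walk the back-pointers from the last attribute to the first.
    result, i = {}, n
    for j in range(k, 0, -1):
        start = back[i][j]
        result[attributes[j - 1]] = " ".join(toks[start:i])
        i = start
    return result

# Usage with a made-up reference table.
attributes = ["name", "city", "state"]
reference = [
    ("Acme Tool Company", "Seattle", "WA"),
    ("Northwind Traders Inc", "Portland", "OR"),
    ("Contoso Tool Company", "Spokane", "WA"),
]
models, vocab_size = build_models(reference, attributes)
print(segment("Fabrikam Tool Company, Seattle WA", models, vocab_size, attributes))
```

In this sketch the unseen token "fabrikam" is still assigned to the name attribute, because the tokens that follow it match the middle and end positions learned from the reference names. A fuller model would also need to handle attributes that appear out of order.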

REFERENCES:
patent: 5095432 (1992-03-01), Reed
patent: 2006/0235811 (2006-10-01), Fairweather
Borkar et al., Automatic segmentation of text into structured records, 2001 http://www.it.iitb.ac.in/~sunita/papers/sigmod01.pdf.
Rie Kubota Ando, “Mostly-Unsupervised Statistical Segmentation of Japanese Sequences”, Feb. 11, 2002, pp. 1-2 http://www.cs.cornell.edu/home/Ilee/papers/segmentjnle.pdf.
E. Marsh and D. Perzanowski. MUC-7 Evaluation of IE Technology: Overview of Results, Proceedings of the 7th Message Understanding Conference (MUC-7). Morgan Kaufmann, Apr. 29, 1998.
B. Adelberg. NoDoSE—A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents. SIGMOD 1998, Seattle, WA, USA.
A. Arasu and H. Garcia-Molina. Extracting Structured Data from Web Pages. SIGMOD 2003, Jun. 9-12, 2003, San Diego, CA.
R. Baumgartner, S. Flesca, and G. Gottlob. Visual Web Information Extraction with Lixto. Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
J. Bilmes. What HMMs can do. UWEE Technical report, UWEETR-2002-2003, 2002.
M. E. Califf and R. J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. In Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp. 6-11, Menlo Park, CA, 1998. AAAI Press.
S. F. Chen and J. Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. In Proceedings of the 34th Annual Meeting of the ACL, pp. 310-318, Jun. 1996.
M. Collins and Y. Singer. Unsupervised Models for Named Entity Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1999, pp. 100-110.
V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards Automatic Data Extraction From Large Web Sites. Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
J. Droppo, L. Deng, and A. Acero. Evaluation of the Splice on the Aurora 2 and 3 Tasks. Microsoft Research, pp. 29-32.
S. Fine, Y. Singer, and N. Tishby. The Hierarchical Hidden Markov Model: Analysis and Applications. Machine Learning, 32(1):41-62, 1998.
D. Freitag and A. McCallum. Information Extraction with HMM Structures Learned by Stochastic Optimization. In AAAI/IAAI, pp. 584-589, Copyright 2001.
R. Grishman. Information Extraction: Techniques and Challenges. In Information Extraction (International Summer School SCIE-97). Springer-Verlag, 1997.
M. A. Hernandez and S. J. Stolfo. Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem. Data Mining and Knowledge Discovery, 2(1):9-37, 1998. © 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
C. A. Knoblock, K. Lerman, S. Minton, and I. Muslea. Accurately and Reliably Extracting Data From the Web: A Machine Learning Approach. IEEE Data Engineering Bulletin, 23(4):33-41, 2000. Copyright 1999 IEEE.
M. Lapata. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Jul. 2003, pp. 545-552.
A. F. Martin and M. A. Przybocki. NIST 2003 Language Recognition Evaluation. In Eurospeech 2003, 2003.
A. Mikheev, M. Moens, and C. Grover. Named Entity Recognition Without Gazetteers. In Proceedings of EACL, 1999.
L. R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, vol. 77 No. 2, Feb. 1989. pp. 257-286.
K. Seymore, A. McCallum, and R. Rosenfeld. Learning Hidden Markov Model Structure for Information Extraction. In AAAI 99 Workshop on Machine Learning for Information Extraction, 1999.
