Reexamination Certificate
2003-05-30
2009-02-17
Knepper, David D (Department: 2626)
Data processing: speech signal processing, linguistics, language
Linguistics
Multilingual or national language support
C704S257000, C704S009000
active
07493251
ABSTRACT:
A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
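The abstract's two probabilistic models can be read as a source-channel decomposition: choose the entity-type sequence C that maximizes P(C) · P(S | C) for the character sequence S. The following is a minimal sketch of that idea, not the patented implementation. It assumes a bigram model over entity types and a per-type lookup table for character strings; the names `TRANSITION`, `EMISSION`, `MAX_LEN`, and `segment`, the three entity types, and every probability are hypothetical values chosen for illustration. The sketch does not cover the separate organization-name step mentioned in the abstract.

```python
import math

# Hypothetical toy models; every number here is made up for illustration.
# TRANSITION[prev][cur] stands in for P(cur entity type | prev entity type).
TRANSITION = {
    "<s>":      {"WORD": 0.6, "PERSON": 0.3, "LOCATION": 0.1},
    "WORD":     {"WORD": 0.5, "PERSON": 0.2, "LOCATION": 0.3},
    "PERSON":   {"WORD": 0.8, "PERSON": 0.1, "LOCATION": 0.1},
    "LOCATION": {"WORD": 0.8, "PERSON": 0.1, "LOCATION": 0.1},
}
# EMISSION[type][string] stands in for P(character string | entity type).
EMISSION = {
    "WORD":     {"an": 0.2, "like": 0.2, "likes": 0.3, "me": 0.1, "ro": 0.01},
    "PERSON":   {"ann": 0.5},
    "LOCATION": {"rome": 0.4},
}
MAX_LEN = 5  # longest candidate segment considered (assumption)


def segment(text):
    """Return the most likely [(substring, entity_type), ...] or None."""
    n = len(text)
    # best[i][t] = (log probability, start of last segment, previous type)
    # for the best analysis of text[:i] whose last segment has type t.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            piece = text[start:end]
            for etype, table in EMISSION.items():
                p_emit = table.get(piece)
                if p_emit is None:
                    continue  # this type cannot generate this string
                for prev, (score, _, _) in best[start].items():
                    cand = (score
                            + math.log(TRANSITION[prev][etype])
                            + math.log(p_emit))
                    if etype not in best[end] or cand > best[end][etype][0]:
                        best[end][etype] = (cand, start, prev)
    if not best[n]:
        return None  # no segmentation is possible under the toy models
    # Backtrack from the best-scoring final entity type.
    etype = max(best[n], key=lambda t: best[n][t][0])
    end, result = n, []
    while end > 0:
        _, start, prev = best[end][etype]
        result.append((text[start:end], etype))
        end, etype = start, prev
    return result[::-1]


print(segment("annlikesrome"))
```

Because each chosen entity spans a contiguous run of characters, recovering the best entity-type sequence also fixes the segmentation boundaries, which is the point the abstract makes. A real system would replace the lookup tables with trained models and smoothing.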
REFERENCES:
patent: 5576954 (1996-11-01), Driscoll
patent: 6052657 (2000-04-01), Yamron et al.
patent: 6311152 (2001-10-01), Bai et al.
patent: 6374210 (2002-04-01), Chu
patent: 7124080 (2006-10-01), Chen et al.
Cheng, Kwok-Shing, Gilbert H. Yong and Kam-Fai Wong, 1999, “A study on word-based and integral-bit Chinese text compression algorithms,” JASIS, 50(3):218-228.
Chien, Lee-Feng, 1997, “PAT-tree-based keyword extraction for Chinese information retrieval,” In: SIGIR97, 27-31.
Dai, Yubin, Christopher S. G. Khoo and Teck Ee Loh, 1999, “A new statistical formula for Chinese word segmentation incorporating contextual information,” SIGIR99, 82-89.
Gao, Jianfeng, Joshua Goodman, Mingjing Li and Kai-Fu Lee, 2002, “Toward a Unified Approach to Statistical Language Modeling for Chinese,” ACM TALIP, 1(1):3-33.
Lin, Ming-Yu, Tung-Hui Chiang and Keh-Yi Su, 1993, “A preliminary study on unknown word problem in Chinese word segmentation,” ROCLING 6, 119-141.
Sproat, Richard and Chilin Shih, 2002, “Corpus-Based Methods in Chinese Morphology and Phonology,” In: COLING 2002.
Sproat, Richard, Chilin Shih, William Gale and Nancy Chang, 1996, “A Stochastic Finite-State Word-Segmentation Algorithm for Chinese,” Computational Linguistics, 22(3):377-404.
Sun, Jian, Jianfeng Gao, Lei Zhang, Ming Zhou and Chang-Ning Huang, 2002, “Chinese Named Entity Identification Using Class-Based Language Model,” In: COLING 2002.
Teahan, W.J., Yingying Wen, Rodger McNab and Ian Witten, 2000, “A Compression-Based Algorithm for Chinese Word Segmentation,” Computational Linguistics, 26(3):375-393.
Wu, Zimin and Gwyneth Tseng, 1993, “Chinese text segmentation for text retrieval: achievements and problems,” JASIS, 44(9):532-542.
Katz, S.M., 1987, “Estimation of probabilities from sparse data for the language model component of a speech recognizer,” IEEE ASSP, 35(3):400-401.
Table of Contents of the Coling 2002 Conference Aug. 24-Sep. 1, 2002, http://portal.acm.org, 18 pages, 2008.
Coling 2002 Conference, http://www.coling2002.sinica.edu.tw/, 2/26/2008, 1 page.
Gao Jianfeng
Huang Chang-Ning
Li Mu
Sun Jian
Zhang Lei
Knepper David D
Magee Thomas M.
Microsoft Corporation
Westman, Champlin & Kelly, P. A.
Using source-channel models for word segmentation