Data processing: artificial intelligence – Machine learning
Reexamination Certificate
2006-05-31
2009-02-17
Holmes, Michael B (Department: 2129)
active
07493293
ABSTRACT:
A document (or multiple documents) is analyzed to identify entities of interest within that document. This is accomplished by constructing n-gram or bi-gram models that correspond to different kinds of text entities, such as chemistry-related words and generic English words. The models can be constructed from training text selected to reflect a particular kind of text entity. The document is tokenized, and the tokens are run against the models to determine, for each token, which kind of text entity is most likely to be associated with that token. The entities of interest in the document can then be annotated accordingly.
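The tokenize-and-score flow the abstract describes can be sketched as follows. This is a minimal illustration, not the patented implementation. The abstract does not say how the models are built, so the character-level bi-grams, the add-one smoothing, the tokenizer pattern, the tiny training word lists, and every name here (CharBigramModel, tokenize, annotate) are assumptions made for the example.

```python
import math
import re
from collections import Counter


class CharBigramModel:
    """Character bi-gram model for one kind of text entity.

    Assumption of this sketch: the abstract does not specify
    character-level modeling or any particular smoothing.
    """

    def __init__(self, training_words):
        self.pair_counts = Counter()     # counts of (previous char, current char)
        self.context_counts = Counter()  # counts of the previous char alone
        self.alphabet = set()            # symbols that can follow a context
        for word in training_words:
            padded = "^" + word.lower() + "$"  # start and end markers
            self.alphabet.update(padded[1:])
            for prev, cur in zip(padded, padded[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, token, vocab_size):
        """Log-probability of a token under add-one smoothing."""
        padded = "^" + token.lower() + "$"
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.pair_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + vocab_size
            total += math.log(num / den)
        return total


def tokenize(text):
    """Split text into word-like tokens (hypothetical tokenizer)."""
    return re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)


def annotate(text, models):
    """Label each token with the entity kind whose model scores it highest."""
    # Shared vocabulary size so that all models are smoothed the same way.
    vocab_size = len(set().union(*(m.alphabet for m in models.values()))) or 1
    labeled = []
    for token in tokenize(text):
        best = max(models, key=lambda kind: models[kind].log_prob(token, vocab_size))
        labeled.append((token, best))
    return labeled


if __name__ == "__main__":
    # Toy training text; a real system would use far larger corpora.
    models = {
        "chemical": CharBigramModel(
            ["benzene", "toluene", "ethanol", "methanol", "acetone", "hexane"]
        ),
        "english": CharBigramModel(
            ["the", "was", "with", "then", "mixed", "heated", "slowly", "added"]
        ),
    }
    for token, kind in annotate("The propanol was heated slowly", models):
        print(token, kind)
```

Each model is trained only on text of its own kind, and a token is assigned to whichever model gives it the highest smoothed log-probability. With training lists this small the labels are not reliable; the sketch only shows the shape of the comparison.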
REFERENCES:
patent: 5418951 (1995-05-01), Damashek
patent: 5752051 (1998-05-01), Cohen
patent: 5845049 (1998-12-01), Wu
patent: 5949961 (1999-09-01), Sharman
patent: 5970453 (1999-10-01), Sharman
patent: 5983180 (1999-11-01), Robinson
patent: 6047251 (2000-04-01), Pon et al.
patent: 6098035 (2000-08-01), Yamamoto et al.
patent: 6167398 (2000-12-01), Wyard et al.
patent: 6169969 (2001-01-01), Cohen
patent: 6178396 (2001-01-01), Ushioda
patent: 6311152 (2001-10-01), Bai et al.
patent: 6314399 (2001-11-01), Deligne et al.
patent: 6415248 (2002-07-01), Bangalore et al.
patent: 6574597 (2003-06-01), Mohri et al.
patent: 6636636 (2003-10-01), Takasu
patent: 6785651 (2004-08-01), Wang
patent: 6865528 (2005-03-01), Huang et al.
patent: 7013264 (2006-03-01), Dolan et al.
patent: 7013265 (2006-03-01), Huang et al.
patent: 7016830 (2006-03-01), Huang et al.
patent: 7031908 (2006-04-01), Huang et al.
patent: 7046847 (2006-05-01), Hurst et al.
patent: 7050964 (2006-05-01), Menezes et al.
patent: 7113903 (2006-09-01), Riccardi et al.
patent: 7129932 (2006-10-01), Klarlund et al.
patent: 7143091 (2006-11-01), Charnock et al.
patent: 7171350 (2007-01-01), Lin et al.
patent: 7200559 (2007-04-01), Wang
patent: 7206735 (2007-04-01), Menezes et al.
patent: 7260568 (2007-08-01), Zhang et al.
patent: 7286978 (2007-10-01), Huang et al.
patent: 7321854 (2008-01-01), Sharma et al.
patent: 7340388 (2008-03-01), Soricut et al.
patent: 7343624 (2008-03-01), Rihn et al.
patent: 7346507 (2008-03-01), Natarajan et al.
patent: 7373291 (2008-05-01), Garst
patent: 7398211 (2008-07-01), Wang
patent: 2002/0099536 (2002-07-01), Borcher et al.
patent: 2004/0042667 (2004-03-01), Lee et al.
patent: 2004/0044952 (2004-03-01), Jiang et al.
patent: 2004/0143574 (2004-07-01), Nakamura et al.
Investigating linguistic knowledge in a maximum entropy token-based language model Jia Cui; Yi Su; Hall, K.; Jelinek, F.; Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Dec. 9-13, 2007 pp. 171-176 Digital Object Identifier 10.1109/ASRU.2007.4430104.
A state-space method for language modeling Siivola, V.; Honkela, A.; Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on Nov. 30-Dec. 3, 2003 pp. 548-553 Digital Object Identifier 10.1109/ASRU.2003.1318499.
Improving letter-to-sound conversion performance with automatically generated new words Jia-Li You; Yi-Ning Chen; Soong, F.K.; Jin-Lin Wang; Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on Mar. 31, 2008-Apr. 4, 2008 pp. 4653-4656 Digital Object Identifier 10.1109/ICASSP.2008.4518694.
The use of a linguistically motivated language model in conversational speech recognition Wen Wang; Stolcke, A.; Harper, M.P.; Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on vol. 1, May 17-21, 2004 pp. I-261-4 vol. 1 Digital Object Identifier 10.1109/ICASSP.2004.1325972.
Japanese document recognition based on interpolated n-gram model of character Mori, H.; Aso, H.; Makino, S.; Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on vol. 1, Aug. 14-16, 1995 pp. 274-277 vol. 1 Digital Object Identifier 10.1109/ICDAR.1995.598993.
A multispan language modeling framework for large vocabulary speech recognition Bellegarda, J.R.; Speech and Audio Processing, IEEE Transactions on vol. 6, Issue 5, Sep. 1998 pp. 456-467 Digital Object Identifier 10.1109/89.709671.
Chinese Keyword Extraction Based on N-Gram and Word Co-occurrence Hui Jiao; Qian Liu; Hui-bo Jia; Computational Intelligence and Security Workshops, 2007. CISW 2007. International Conference on Dec. 15-19, 2007 pp. 152-155 Digital Object Identifier 10.1109/CISW.2007.4425468.
A multi-level text mining method to extract biological relationships Palakal, M.; Stephens, M.; Mukhopadhyay, S.; Raje, R.; Rhodes, S.; Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society Aug. 14-16, 2002 pp. 97-108 Digital Object Identifier 10.1109/CSB.2002.1039333.
Exploiting latent semantic information in statistical language modeling Bellegarda, J.R.; Proceedings of the IEEE vol. 88, Issue 8, Aug. 2000 pp. 1279-1296 Digital Object Identifier 10.1109/5.880084.
Association pattern language modeling Jen-Tzung Chien; Audio, Speech, and Language Processing, IEEE Transactions on vol. 14, Issue 5, Sep. 2006 pp. 1719-1728 Digital Object Identifier 10.1109/TSA.2005.858551.
Text Classification Improved through Automatically Extracted Sequences Dou Shen; Jian-Tao Sun; Qiang Yang; Hui Zhao; Zheng Chen; Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on Apr. 3-7, 2006 pp. 121-121 Digital Object Identifier 10.1109/ICDE.2006.158.
Statistical language models for on-line handwritten sentence recognition Quiniou, S.; Anquetil, E.; Carbonnel, S.; Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on Aug. 29-Sep. 1, 2005 pp. 516-520 vol. 1 Digital Object Identifier 10.1109/ICDAR.2005.220.
U. Bandara et al., “Fast Algorithm for evaluating word sequence statistics in large text corpora by small computers”, IBM Technical Disclosure Bulletin, vol. 32, No. 10B, Mar. 1990, pp. 268-270.
R. Kubota, “Lessening Index file for full text search”, IBM Technical Disclosure Bulletin, vol. 38, No. 11, Nov. 1995, p. 321.
Kanungo Tapas
Rhodes James J.
Holmes Michael B
International Business Machines Corporation
Johnson Daniel E.
Profile ID: LFUS-PAI-O-4065862