Reexamination Certificate
2011-05-03
2011-05-03
Dorvil, Richemond (Department: 2626)
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
C704S001000, C704S010000
active
07937263
ABSTRACT:
The present invention pertains to a system and method for the tokenization of text. The tokenizer may include a featurizer configured to receive input text and convert the input text into tokens. According to one aspect of the invention, each token may include only one type of character, the characters selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier. The classifier may be configured to receive the tokens from the featurizer and to use a preclassifier to determine whether each token may be input into a predetermined classification model. If a token passes the preclassifier, the token is classified using the predetermined classification model. The tokenizer may also include a finalizer, which may be configured to receive the tokens and produce a final output.
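The three stages named in the abstract (featurizer, classifier with preclassifier, finalizer) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the preclassifier rule, the classification model, or the finalizer logic, so the names `Token`, `featurize`, `preclassify`, `model`, `classify`, `finalize`, and `tokenize` are hypothetical. The gate that passes only a "." attached to both neighbours is an assumption, and the decimal-point rule is a hand-written stand-in for a predetermined classification model.

```python
import re
from dataclasses import dataclass

# One alternative per character type, so every match holds a single type only.
TOKEN_PATTERN = re.compile(
    r"(?P<letters>[A-Za-z]+)|(?P<numbers>[0-9]+)|(?P<punctuation>[^\sA-Za-z0-9]+)"
)


@dataclass
class Token:
    text: str
    kind: str            # "letters", "numbers", or "punctuation"
    space_before: bool   # True if whitespace preceded the token in the input
    label: str = "keep"  # set by the classifier: "keep" or "join"


def featurize(text):
    """Featurizer: split the input into single-character-type tokens."""
    tokens = []
    for m in TOKEN_PATTERN.finditer(text):
        space_before = m.start() > 0 and text[m.start() - 1].isspace()
        tokens.append(Token(m.group(), m.lastgroup, space_before))
    return tokens


def preclassify(tokens, i):
    """Assumed gate: only a '.' attached to both neighbours goes to the model."""
    if tokens[i].text != "." or i == 0 or i == len(tokens) - 1:
        return False
    return not tokens[i].space_before and not tokens[i + 1].space_before


def model(tokens, i):
    """Stand-in for the classification model: '.' between numbers is a decimal point."""
    if tokens[i - 1].kind == "numbers" and tokens[i + 1].kind == "numbers":
        return "join"
    return "keep"


def classify(tokens):
    """Classifier: label only the tokens that pass the preclassifier."""
    for i, tok in enumerate(tokens):
        if preclassify(tokens, i):
            tok.label = model(tokens, i)
    return tokens


def finalize(tokens):
    """Finalizer: merge each 'join' token with its left and right neighbours."""
    out = []
    glue_next = False
    for tok in tokens:
        if out and (tok.label == "join" or glue_next):
            out[-1] += tok.text
        else:
            out.append(tok.text)
        glue_next = tok.label == "join"
    return out


def tokenize(text):
    return finalize(classify(featurize(text)))


print(tokenize("Pi is 3.14."))  # ['Pi', 'is', '3.14', '.']
```

In this sketch the preclassifier keeps most tokens away from the model, so the more expensive classification step runs only on ambiguous cases. A trained classifier could replace `model` without changing the other stages.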
REFERENCES:
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5555343 (1996-09-01), Luther
patent: 6327561 (2001-12-01), Smith et al.
patent: 6996529 (2006-02-01), Minnis
patent: 2002/0022956 (2002-02-01), Ukrainczyk et al.
patent: 2004/0148170 (2004-07-01), Acero et al.
patent: 2005/0065776 (2005-03-01), Coden et al.
patent: 2006/0116862 (2006-06-01), Carrier et al.
T. Strzalkowski and R. Brandow, A Natural Language Correction Model for Continuous Speech Recognition, Proceedings of the Fifth Workshop on Very Large Corpora, pp. 168-177, Aug. 1997; http://acl.ldc.upenn.edu/W/W97/W97-0117.pdf.
M. Rayner et al., Combining Knowledge Sources to Reorder N-Best Speech Hypothesis List, Proceedings DARPA Speech and Natural Language Workshop, 1994; http://acl.ldc.upenn.edu/H/H94/H94-1040.pdf.
M. Ostendorf et al., Integration of Diverse Recognition Methodologies through Reevaluation of N-Best Sentence Hypotheses, Proceedings of DARPA Speech and Natural Language Workshop, 1991; http://acl.ldc.upenn.edu/H/H91/H91-1013.pdf.
L. Norton et al., Recent Improvements and Benchmark Results for the Paramax ATIS System, Proceedings of DARPA Workshop on Speech and Natural Language, 1992; http://acl.ldc.upenn.edu/H/H92/H92-1017.pdf.
L. Hirschman, The Roles of Language Processing in a Spoken Language Interface, Voice Communication Between Humans and Machines, National Academy of Sciences, 1994, pp. 217-237; http://www.pnas.org/cgi/reprint/92/22/9970.
R. C. Moore, Integration of Speech with Natural Language Processing, Voice Communication Between Humans and Machines, National Academy of Sciences, 1994, pp. 254-271; http://www.pnas.org/cgi/reprint/92/22/9983.
J. Kupiec, Probabilistic Models of Short and Long Distance Word Dependencies in Running Text, Proceedings of DARPA Speech and Natural Language Workshop, 1992, pp. 290-295; http://acl.ldc.upenn.edu/H/H89/H89-1054.pdf.
H. Murveit and R. Moore, Integrating Natural Language Constraints into HMM-Based Speech Recognition, IEEE, 1990, pp. 573-576.
G. Maltese and F. Mancini, An Automatic Technique To Include Grammatical and Morphological Information in a Trigram-Based Statistical Language Model, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992, pp. 157-160.
R. Schwartz et al., On Using Written Language Training Data for Spoken Language Modelling, Proceedings of Human Language Technology Workshop, Morgan Kaufmann Publishers, Inc., 1994, pp. 94-98; http://acl.ldc.upenn.edu/H/H94/H94-1016.pdf.
E. K. Ringger and J. F. Allen, Error Correction via a Post-Processor for Continuous Speech Recognition, In Proc. of ICASSP-96, IEEE-96, 1996.
A. V. Aho, R. Sethi, and J. D. Ullman, Compilers, Addison-Wesley Publ. Co., 1986, 1988, Ch. 3, pp. 83-158.
G. Grefenstette and P. Tapanainen, What Is a Word, What Is a Sentence? Problems of Tokenization, 3rd Conference on Computational Lexicography and Text Research, COMPLEX '94, Budapest, Jul. 7-10, 1994.
M. D. Riley, Some Applications of Tree-Based Modelling To Speech and Language Indexing, Proceedings of the DARPA Speech and Natural Language Workshop, Morgan Kaufmann, 1989, pp. 339-352.
D. D. Palmer, Satz - An Adaptive Sentence Segmentation System, M.S. Thesis and UC-Berkeley Technical Report UCB/CSD 94/846, University of California, Berkeley, Computer Science Division, 1994.
J. C. Reynar and A. Ratnaparkhi, A Maximum Entropy Approach to Identifying Sentence Boundaries, Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington D.C., 1997, pp. 16-19.
D. D. Palmer and M. A. Hearst, Adaptive Multilingual Sentence Boundary Disambiguation, Computational Linguistics, 23(2), 1997.
A. Mikheev, Tagging Sentence Boundaries, NAACL 2000 (Seattle), ACL, Apr. 2000, pp. 264-271.
H. Schmid, Unsupervised Learning of Period Disambiguation for Tokenisation, Internal Report, IMS, University of Stuttgart, Apr. 2000.
D. Yarowsky, Homograph Disambiguation In Text-To-Speech Synthesis, Progress in Speech Synthesis, J. van Santen, R. Sproat, J. Olive, and J. Hirschberg (eds.), Springer-Verlag, 1996, pp. 159-175.
D. Yarowsky, Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994, pp. 88-95.
D. Yarowsky, A Comparison of Corpus-Based Techniques for Restoring Accents in Spanish and French Text, Proceedings, 2nd Annual Workshop on Very Large Corpora, Kyoto, 1994, pp. 19-32.
R. Sproat, Multilingual Text Analysis for Text-To-Speech Synthesis, ECAI Workshop on Extended Finite-State Models of Language, Aug. 1996.
R. Sproat, A. W. Black, S. Chen, S. Kumar, M. Ostendorf, and C. Richards, Normalization of Non-Standard Words: WS '99 Final Report, Sep. 13, 1999, pp. 1-78, In Computer Speech and Language, 15(3), 2001, pp. 287-333.
Daelemans, Walter et al., TiMBL: Tilburg Memory-Based Learner Reference Guide, Nov. 26, 2003, pp. 1-51, Tilburg University and CNTS Research Group, University of Antwerp.
Carrier Jill
Carus Alwin B.
Cote William F.
Del La Femina Kathryn
Dowd John
Dictaphone Corporation
Dorvil Richemond
Godbold Douglas C
Wolf Greenfield & Sacks P.C.