System and method for tokenization of text using classifier...

: 2011-05-03
: 2011-05-03
: Dorvil, Richemond (Department: 2626)
: Data processing: speech signal processing, linguistics, language
: Linguistics
: Natural language

: C704S001000, C704S010000
: Reexamination Certificate
: active
: 07937263
ABSTRACT:
The present invention pertains to a system and method for the tokenization of text. The tokenizer may include a featurizer. The featurizer may be configured to receive input text and convert the input text into tokens. According to one aspect of the invention, each token may include only one type of character, the characters selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier. The classifier may be configured to receive the tokens from the featurizer and to use a preclassifier to determine whether each token may be input into a predetermined classification model. If a token passes the preclassifier, then the token is classified using the predetermined classification model. Additionally, according to one aspect of the invention, the tokenizer may also include a finalizer. The finalizer may be configured to receive the tokens and to produce a final output.
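
The pipeline described in the abstract (featurizer, preclassifier, classification model, finalizer) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `Token`, `featurize`, `preclassify`, `classify`, `finalize`, and `tokenize` are hypothetical, whitespace is assumed to act only as a separator, the preclassifier is assumed to pass only periods that have a neighbor on each side, and a hand-written rule stands in for the predetermined classification model, which in practice would presumably be a trained classifier.

```python
import re
from dataclasses import dataclass

# Each token holds exactly one character type: letters, numbers, or a single
# punctuation mark. Whitespace only separates tokens (an assumption).
_TOKEN_RE = re.compile(
    r"(?P<letters>[A-Za-z]+)|(?P<numbers>[0-9]+)|(?P<punctuation>[^A-Za-z0-9\s])"
)

@dataclass
class Token:
    text: str
    start: int           # character offset in the input text
    kind: str            # "letters", "numbers", or "punctuation"
    merge: bool = False  # set by the classifier: glue this token to both neighbors

def featurize(text):
    """Convert input text into single-character-type tokens."""
    return [Token(m.group(), m.start(), m.lastgroup) for m in _TOKEN_RE.finditer(text)]

def preclassify(tokens, i):
    """Hypothetical gate: only periods with a token on each side reach the model."""
    return tokens[i].text == "." and 0 < i < len(tokens) - 1

def classify(tokens, i):
    """Stand-in for the classification model: a period directly between two
    number tokens (no whitespace) is treated as a decimal point."""
    prev, cur, nxt = tokens[i - 1], tokens[i], tokens[i + 1]
    adjacent = prev.start + len(prev.text) == cur.start and cur.start + 1 == nxt.start
    return adjacent and prev.kind == "numbers" and nxt.kind == "numbers"

def finalize(tokens):
    """Produce the final output, merging tokens as the classifier decided."""
    out, glue_next = [], False
    for t in tokens:
        if out and (t.merge or glue_next):
            out[-1] += t.text
        else:
            out.append(t.text)
        glue_next = t.merge
    return out

def tokenize(text):
    tokens = featurize(text)
    for i in range(len(tokens)):
        if preclassify(tokens, i):
            tokens[i].merge = classify(tokens, i)
    return finalize(tokens)
```

In this sketch, `tokenize("Dose 2.5 mg.")` yields `["Dose", "2.5", "mg", "."]`: the first period passes the preclassifier and is merged by the stand-in rule, while the sentence-final period has no right-hand neighbor, never reaches the model, and is emitted as its own token.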

REFERENCES:
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5555343 (1996-09-01), Luther
patent: 6327561 (2001-12-01), Smith et al.
patent: 6996529 (2006-02-01), Minnis
patent: 2002/0022956 (2002-02-01), Ukrainczyk et al.
patent: 2004/0148170 (2004-07-01), Acero et al.
patent: 2005/0065776 (2005-03-01), Coden et al.
patent: 2006/0116862 (2006-06-01), Carrier et al.
T. Strzalkowski and R. Brandow, A Natural Language Correction Model for Continuous Speech Recognition, Proceedings of the Fifth Workshop on Very Large Corpora, pp. 168-177, Aug. 1997; http://acl.idc.upenn.edu/W/W97/W97-0117.pdf.
M. Rayner et al., Combining Knowledge Sources to Reorder N-Best Speech Hypothesis List, Proceedings DARPA Speech and Natural Language Workshop, 1994; http://acl.ldc.upenn.edu/H/H94/H94-1040.pdf.
M. Ostendorf et al., Integration of Diverse Recognition Methodologies through Reevaluation of N-Best Sentence Hypotheses, Proceedings of DARPA and Natural Language Workshop, 1991; http://acl.ldc.upenn.edu/H/H91/H91-1013.pdf.
L. Norton et al., Recent Improvements and Benchmark Results for the Paramax ATIS System, Proceedings of DARPA Workshop on Speech and Natural Language, 1992; http://acl.ldc.upenn.edu/H/H92/H92-1017.pdf.
L. Hirschman, The Roles of Language Processing in a Spoken Language Interface, Voice Communication Between Humans and Machines, National Academy of Sciences, 1994, pp. 217-237; http://www.pnas.org/cgi/reprint/92/22/9970.
R. C. Moore, Integration of Speech with Natural Language Processing, Voice Communication Between Humans and Machines, National Academy of Sciences, 1994, pp. 254-271; http://www.pnas.org/cgi/reprint/92/22/9983.
J. Kupiec, Probabilistic Models of Short and Long Distance Word Dependencies in Running Text, Proceedings of DARPA Speech and Natural Language Workshop, 1992, pp. 290-295; http://acl.ldc.upenn.edu/H/H89/H89-1054.pdf.
H. Murveit and R. Moore, Integrating Natural Language Constraints into HMM-Based Speech Recognition, IEEE, 1990, pp. 573-576.
G. Maltese and F. Mancini, An Automatic Technique To Include Grammatical and Morphological Information in a Trigram-Based Statistical Language Model, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992, pp. 157-160.
R. Schwartz et al., On Using Written Language Training Data for Spoken Language Modelling, Proceedings of Human Language Technology Workshop, Morgan Kaufmann Publishers, Inc., 1994, pp. 94-98; http://acl.ldc.upenn.edu/H/H94/H94-1016.pdf.
E. K. Ringger and J. F. Allen, Error Correction via a Post-Processor for Continuous Speech Recognition, In Proc. of ICASSP-96, IEEE-96, 1996.
A. V. Aho, R. Sethi, and J. D. Ullman, Compilers, Addison-Wesley Publ. Co., 1986, 1988, Ch. 3, pp. 83-158.
G. Grefenstette and P. Tapanainen, What Is a Word, What Is a Sentence? Problems of Tokenization, 3rd Conference on Computational Lexicography and Text Research, COMPLEX '94, Budapest, Jul. 7-10, 1994.
M. D. Riley, Some Applications of Tree-Based Modelling To Speech and Language Indexing, Proceedings of the DARPA Speech and Natural Language Workshop, Morgan Kaufmann, 1989, pp. 339-352.
D. D. Palmer, Satz - An Adaptive Sentence Segmentation System, M.S. Thesis and UC-Berkeley Technical Report UCB/CSD 94/846, University of California, Berkeley, Computer Science Division, 1994.
J. C. Reynar and A. Ratnaparkhi, A Maximum Entropy Approach to Identifying Sentence Boundaries, Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington D.C., 1997, pp. 16-19.
D. D. Palmer and M. A. Hearst, Adaptive Multilingual Sentence Boundary Disambiguation, Computational Linguistics, 23(2), 1997.
A. Mikheev, Tagging Sentence Boundaries, NAACL 2000 (Seattle), ACL, Apr. 2000, pp. 264-271.
H. Schmid, Unsupervised Learning of Period Disambiguation for Tokenisation, Internal Report, IMS, University of Stuttgart, Apr. 2000.
D. Yarowsky, Homograph Disambiguation In Text-To-Speech Synthesis, Progress in Speech Synthesis, J. van Santen, R. Sproat, J. Olive, and J. Hirschberg (eds.), Springer-Verlag, 1996, pp. 159-175.
D. Yarowsky, Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994, pp. 88-95.
D. Yarowsky, A Comparison of Corpus-Based Techniques for Restoring Accents in Spanish and French Text, Proceedings, 2nd Annual Workshop on Very Large Corpora, Kyoto, 1994, pp. 19-32.
R. Sproat, Multilingual Text Analysis for Text-To-Speech Synthesis, ECAI Workshop on Extended Finite-State Models of Language, Aug. 1996.
R. Sproat, A. W. Black, S. Chen, S. Kumar, M. Ostendorf, and C. Richards, Normalization of Non-Standard Words: WS '99 Final Report, Sep. 13, 1999, pp. 1-78, In Computer Speech and Language, 15(3), 2001, pp. 287-333.
Daelemans, Walter et al., TiMBL: Tilburg Memory-Based Learner Reference Guide, Nov. 26, 2003, pp. 1-51, Tilburg University and CNTS Research Group, University of Antwerp.

Affiliated with

Carrier Jill

Inventor

Carus Alwin B.

Inventor

Cote William F.

Inventor

Del La Femina Kathryn

Inventor

Dowd John

Inventor

Also associated with

Dictaphone Corporation

Corporate Assignee

Dorvil Richemond

Examiner

Godbold Douglas C

Examiner

Wolf Greenfield & Sacks P.C.

Law Firm
