Hybrid text segmentation using N-grams and lexical information


Reexamination Certificate

Date: 2011-03-29
Date: 2011-03-29
Examiner: McFadden, Susan (Department: 2626)
Class: Data processing: speech signal processing, linguistics, language
Subclass: Linguistics
Subclass: Natural language

Document type: Reexamination Certificate
Status: active
Patent number: 07917353
ABSTRACT:
A hybrid n-gram/lexical analysis tokenization system including a lexicon and a hybrid tokenizer operative to perform both N-gram tokenization of a text and lexical analysis tokenization of a text using the lexicon, and to construct either of an index and a classifier from the results of both of the N-gram tokenization and the lexical analysis tokenization, where the hybrid tokenizer is implemented in at least one of computer hardware and computer software and is embodied within a computer-readable medium.
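
The abstract describes running both N-gram tokenization and lexicon-based tokenization over the same text and building an index from the combined results. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the patented implementation: the function names (`ngram_tokens`, `lexical_tokens`, `build_hybrid_index`), the use of character n-grams, the greedy longest-match lookup, and the sample lexicon and documents are all hypothetical choices made for this example.

```python
from collections import defaultdict


def ngram_tokens(text, n=3):
    """Return overlapping character n-grams of `text` (whitespace removed)."""
    chars = "".join(text.split())
    if len(chars) < n:
        return [chars] if chars else []
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def lexical_tokens(text, lexicon):
    """Greedy longest-match segmentation; only lexicon words are emitted.

    Assumption: longest-match is one simple form of lexical analysis,
    chosen here for brevity. The source does not specify the method.
    """
    chars = "".join(text.split())
    max_len = max((len(word) for word in lexicon), default=0)
    tokens, i = [], 0
    while i < len(chars):
        for size in range(min(max_len, len(chars) - i), 0, -1):
            candidate = chars[i:i + size]
            if candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
        else:
            i += 1  # no lexicon word starts here; skip one character
    return tokens


def build_hybrid_index(docs, lexicon, n=3):
    """Build an inverted index from the union of both token streams."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in ngram_tokens(text, n) + lexical_tokens(text, lexicon):
            index[token].add(doc_id)
    return index


if __name__ == "__main__":
    # Hypothetical lexicon and unsegmented documents.
    lexicon = {"new", "york", "newyork"}
    docs = {1: "newyork", 2: "yorkshire"}
    index = build_hybrid_index(docs, lexicon)
    for token in sorted(index):
        print(token, sorted(index[token]))
```

In this sketch the n-gram stream lets a query match substrings that the lexicon does not cover, while the lexical stream contributes whole-word entries, so a lookup can use either kind of token.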

REFERENCES:
patent: 6131082 (2000-10-01), Hargrave et al.
patent: 6173252 (2001-01-01), Qiu et al.
patent: 6311152 (2001-10-01), Bai et al.
patent: 7627562 (2009-12-01), Kacmarcik et al.
patent: 2005/0060150 (2005-03-01), Li et al.
Song et al. “Voting between Dictionary-based and Subword Tagging Models for Chinese Word Segmentation,” Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Jul. 2006, pp. 126-129.
Emerson. “Segmenting Chinese in Unicode,” Proc. of the 16th International Unicode Conference, Amsterdam, Mar. 2000, pp. 1-10.
Wu, Dekai, et al. “Improving Chinese Tokenization with Linguistic Filters on Statistical Lexical Acquisition,” Article, pp. 180-181.
Banerjee, Satanjeev, et al. “The Design, Implementation, and Use of the Ngram Statistics Package,” Article, 2003, pp. 370-381, Copyright Springer-Verlag Berlin Heidelberg.
Beeferman, Doug, et al., “Text Segmentation Using Exponential Models”, pp. 35-46.
Cettolo, Mauro, et al., “Text Segmentation Criteria for Statistical Machine Translation”.
Chen, Berlin, et al. “A Discriminative HMM/N-Gram-Based Retrieval Approach for Mandarin Spoken Documents” Jun. 2004, pp. 128-145, vol. 3 No. 2. Published by ACM, Inc. in New York, NY.
Cowie, Jim, et al, “Improving Robust Domain Independent Summarization” pp. 171-177.
Fung, Pascale, et al. “Statistical Augmentation of a Chinese Machine-Readable Dictionary” Jun. 7, 1994, pp. 1-17.
Hackett, Paul G. et al., “Comparison of Word-Based and Syllable-Based Retrieval for Tibetan”, Copyright by ACM, 2 pages.
Li, Yuk-Chi, et al. “Query Expansion using Phonetic Confusion for Chinese Spoken Document Retrieval” pp. 89-93.
McNamee, Paul, “Experiments in the Retrieval of Unsegmented Japanese Text at the NTCIR-2 Workshop”.
McNamee, Paul, “Why You Should Use N-grams for Multilingual Information Retrieval”, 1 page, UMBC EBIQUITY Research Group Website Article, Oct. 18, 2006.
Nagata, Masaaki, “A Self-Organizing Japanese Word Segmenter using Heuristic Word Identification and Re-estimation”, pp. 203-215.
Sproat, Richard, et al., “A Stochastic Finite-State Word-Segmentation Algorithm for Chinese”, Copyright 1996 Association for Computational Linguistics, PA, USA, vol. 22, No. 3.
Thanopoulos, Aristomenis, et al., “Text Tokenization for Knowledge-free Automatic Extraction of Lexical Similarities” TALN 2003, Batz-sur-Mer, Jun. 11-14, 2003.
Ratnaparkhi, Adwait, “Treebank Tokenization”, Article from www.cis.upenn.edu/~treebank/tokenization.html.
Wiki, Collab, “TextNSP”, article from www.topicalizer.com/bwilmsmann/wiki/index.php/TextNSP, pp. 1-17.

Affiliated with

Dayan Yigal Shai

Inventor

Magdalen Josemina Marcella

Inventor

Mazel Victoria

Inventor

Also associated with

Doubet Marcia L.

Attorney

International Business Machines Corporation

Corporate Assignee

McFadden Susan

Examiner
