Hybrid text segmentation using N-grams and lexical information

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

active

07917353

ABSTRACT:
A hybrid n-gram/lexical analysis tokenization system including a lexicon and a hybrid tokenizer operative to perform both N-gram tokenization of a text and lexical analysis tokenization of a text using the lexicon, and to construct either of an index and a classifier from the results of both of the N-gram tokenization and the lexical analysis tokenization, where the hybrid tokenizer is implemented in at least one of computer hardware and computer software and is embodied within a computer-readable medium.
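
As a rough illustration of the idea in the abstract, the sketch below tokenizes each text twice, once into overlapping character N-grams and once by lexicon lookup, and places both token streams in a single inverted index. It is not the patented implementation: the function names (`ngram_tokens`, `lexical_tokens`, `build_hybrid_index`), the greedy longest-match lookup, the bigram default, and the sample lexicon and texts are all assumptions chosen for the example. The classifier alternative mentioned in the abstract is not shown.

```python
from collections import defaultdict

def ngram_tokens(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

def lexical_tokens(text, lexicon):
    """Greedy longest-match lookup (an assumed strategy); emits lexicon hits only."""
    max_len = max((len(w) for w in lexicon), default=0)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            i += 1  # no lexicon entry starts here; skip one character
    return tokens

def build_hybrid_index(docs, lexicon, n=2):
    """Build one inverted index from both tokenizations, keyed by (kind, token)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in ngram_tokens(text, n):
            index[("ngram", tok)].add(doc_id)
        for tok in lexical_tokens(text, lexicon):
            index[("lex", tok)].add(doc_id)
    return index

# Hypothetical sample data: unsegmented text and a tiny lexicon.
lexicon = {"new", "newyork", "york", "times", "shire"}
docs = {1: "newyorktimes", 2: "yorkshire"}
index = build_hybrid_index(docs, lexicon)

print(sorted(index[("lex", "york")]))   # [2]
print(sorted(index[("ngram", "or")]))   # [1, 2]
```

In this sample the lexical pass segments document 1 as "newyork" and "times", so a lexical lookup of "york" finds only document 2, while the bigram "or" still reaches both documents. Keeping both kinds of entries in one index lets a search fall back on N-grams where the lexicon segmentation hides a match.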

REFERENCES:
patent: 6131082 (2000-10-01), Hargrave et al.
patent: 6173252 (2001-01-01), Qiu et al.
patent: 6311152 (2001-10-01), Bai et al.
patent: 7627562 (2009-12-01), Kacmarcik et al.
patent: 2005/0060150 (2005-03-01), Li et al.
Song et al. “Voting between Dictionary-based and Subword Tagging Models for Chinese Word Segmentation,” Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Jul. 2006, pp. 126-129.
Emerson. “Segmenting Chinese in Unicode,” Proc. of the 16th International Unicode Conference, Amsterdam, Mar. 2000, pp. 1-10.
Wu, Dekai, et al., “Improving Chinese Tokenization with Linguistic Filters on Statistical Lexical Acquisition”, Article, pp. 180-181.
Banerjee, Satanjeev, et al., “The Design, Implementation, and Use of the Ngram Statistics Package”, Article, 2003, pp. 370-381, Copyright Springer-Verlag Berlin Heidelberg.
Beeferman, Doug, et al., “Text Segmentation Using Exponential Models”, pp. 35-46.
Cettolo, Mauro, et al., “Text Segmentation Criteria for Statistical Machine Translation”.
Chen, Berlin, et al., “A Discriminative HMM/N-Gram-Based Retrieval Approach for Mandarin Spoken Documents”, Jun. 2004, pp. 128-145, vol. 3, No. 2. Published by ACM, Inc. in New York, NY.
Cowie, Jim, et al., “Improving Robust Domain Independent Summarization”, pp. 171-177.
Fung, Pascale, et al. “Statistical Augmentation of a Chinese Machine-Readable Dictionary” Jun. 7, 1994, pp. 1-17.
Hackett, Paul G. et al., “Comparison of Word-Based and Syllable-Based Retrieval for Tibetan”, Copyright by ACM, 2 pages.
Li, Yuk-Chi, et al. “Query Expansion using Phonetic Confusion for Chinese Spoken Document Retrieval” pp. 89-93.
McNamee, Paul, “Experiments in the Retrieval of Unsegmented Japanese Text at the NTCIR-2 Workshop”.
McNamee, Paul, “Why You Should Use N-grams for Multilingual Information Retrieval”, 1 page, UMBC EBIQUITY Research Group Website Article, Oct. 18, 2006.
Nagata, Masaaki, “A Self-Organizing Japanese Word Segmenter using Heuristic Word Identification and Re-estimation”, pp. 203-215.
Sproat, Richard, et al., “A Stochastic Finite-State Word-Segmentation Algorithm for Chinese”, Copyright 1996 Association for Computational Linguistics, PA, USA, vol. 22, No. 3.
Thanopoulos, Aristomenis, et al., “Text Tokenization for Knowledge-free Automatic Extraction of Lexical Similarities”, TALN 2003, Batz-sur-Mer, Jun. 11-14, 2003.
Ratnaparkhi, Adwait, “Treebank Tokenization”, Article from www.cis.upenn.edu/~treebank/tokenization.html.
Wiki, Collab, “TextNSP”, article from www.topicalizer.com/bwilmsmann/wiki/index.php/TextNSP, pp. 1-17.
