Automatic language identification using both N-gram and word inf

Filed: 1998-12-23
Issued: 2000-12-26
Examiner: Isen, Forester W.
Classification: Data processing: speech signal processing, linguistics, language; Linguistics; Natural language
Class codes: 704/10, G06F 17/27
Type: Patent
Status: active
Number: 061673692
ABSTRACT:
The predominant language of a sample text is automatically identified using probability data that include N-gram probability data for at least one language and word probability data for at least one language. The N-gram probability data of a language indicate, for each N-gram, the probability that it occurs if the language is predominant. Similarly, the word probability data of a language indicate, for each word, the probability that it occurs if the language is predominant. The probability data are used to automatically obtain sample probability data for at least two languages. The sample probability data include N-gram probability information for at least one language and word probability information for at least one language. The sample probability data are used to automatically obtain language identifying data identifying the language whose sample probability data indicate the highest probability. The N-grams can be trigrams, while the words can be short words of no more than five characters. Some languages can have both trigram and word probabilities, while some can have only trigram probabilities.
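
The scoring idea in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the patented method: the probability tables are tiny made-up values, and the floor probability for unseen items, the use of mean log probabilities, and the simple averaging of trigram and word scores are all assumptions that the abstract does not specify. The names `TRIGRAM_PROBS`, `WORD_PROBS`, `FLOOR`, and `identify_language` are hypothetical.

```python
import math
import re

# Toy, made-up probability tables for illustration only (not from the patent).
TRIGRAM_PROBS = {
    "english": {"the": 0.030, "he ": 0.025, " th": 0.028, "ing": 0.015, "and": 0.012},
    "french": {"les": 0.020, " le": 0.025, "es ": 0.030, "ent": 0.018, " de": 0.027},
    "dutch": {"een": 0.025, " de": 0.030, "de ": 0.028, "en ": 0.035, "het": 0.020},
}
# Word tables exist only for some languages; "dutch" has trigram data only.
WORD_PROBS = {
    "english": {"the": 0.060, "and": 0.030, "of": 0.035, "is": 0.015},
    "french": {"le": 0.030, "les": 0.025, "de": 0.050, "et": 0.028},
}
FLOOR = 1e-5       # assumed probability for trigrams/words missing from a table
MAX_WORD_LEN = 5   # "short words of no more than five characters"


def trigrams(text):
    """Return all character trigrams of the whitespace-normalized, padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def short_words(text):
    """Return the words of at most MAX_WORD_LEN letters."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if len(w) <= MAX_WORD_LEN]


def mean_log_prob(items, table):
    """Average log probability of the items, using FLOOR for unseen ones."""
    if not items:
        return math.log(FLOOR)
    return sum(math.log(table.get(item, FLOOR)) for item in items) / len(items)


def score(text, lang):
    """Sample score for one language: trigram score, averaged with the
    word score when the language has word probability data (an assumption)."""
    trigram_score = mean_log_prob(trigrams(text), TRIGRAM_PROBS[lang])
    word_table = WORD_PROBS.get(lang)
    if word_table is None:
        return trigram_score
    word_score = mean_log_prob(short_words(text), word_table)
    return (trigram_score + word_score) / 2


def identify_language(text):
    """Return the language whose sample score is highest."""
    return max(TRIGRAM_PROBS, key=lambda lang: score(text, lang))


if __name__ == "__main__":
    for sample in ("the end of the thing",
                   "les enfants et les parents de la ville",
                   "het huis en de tuin van een man"):
        print(sample, "->", identify_language(sample))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and taking the mean rather than the sum keeps trigram and word scores on a comparable scale regardless of sample length. A real system would need tables trained on large corpora and a more careful way to compare languages that have word data with those that do not.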

REFERENCES:
patent: 4610025 (1986-09-01), Blum et al.
patent: 4773009 (1988-09-01), Kucera et al.
patent: 4829580 (1989-05-01), Church
patent: 4930077 (1990-05-01), Fan
patent: 5062143 (1991-10-01), Schmitt
patent: 5182708 (1993-01-01), Ejiri
patent: 5251131 (1993-10-01), Masand et al.
patent: 5371807 (1994-12-01), Register et al.
patent: 5377280 (1994-12-01), Nakayama
patent: 5392419 (1995-02-01), Walton
patent: 5418951 (1995-05-01), Damashek
patent: 5548507 (1996-08-01), Martino et al.
patent: 5913185 (1999-06-01), Martino et al.
BEESLEY, KENNETH R. "Language Identifier: A Computer Program for Automatic Natural-Language Identification of On-Line Text," In the Proceedings of the 29th Annual Conference of the American Translators Association, 1988.
CAVNAR, WILLIAM B. ET AL. "N-Gram-Based Text Categorization," In Symposium on Document Analysis and Information Retrieval, 1994.
DUNNING, TED "Statistical Identification of Language," CLR Tech Report (MCCS-94-273), 1994.
GREFENSTETTE, GREGORY "Comparing Two Language Identification Schemes," In Proceedings of 3rd International Conference on Statistical Analysis of Textual Data (JADT 1995), Rome, Italy; December, 1995, vol. II, pp. 263-268.
IntelliScope® Language Recognizer, Inso Corporation, 1997.
SIBUN, PENELOPE ET AL. "Language Determination: Natural Language Processing from Scanned Document Images," Proceedings of the 4th Conference on Applied Natural Language Processing, Stuttgart, Germany; 1994, pp. 15-21.

Affiliated with

Schulze Bruno M.

Inventor

Also associated with

Edouard Patrick N.

Examiner

Isen Forester W.

Examiner

Xerox Company

Corporate Assignee
