Identifying language and character set of data representing text

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Patent

Details

704/9, 707/536, G06F 17/30

active

06157905

ABSTRACT:
The present invention provides a facility for identifying the unknown language of text represented by a series of data values in accordance with a character set that associates character glyphs with particular data values. The facility first generates a characterization that characterizes the series of data values in terms of the occurrence of particular data values in the series of data values. For each of a plurality of languages, the facility then retrieves a model that models the language in terms of the statistical occurrence of particular data values in representative samples of text in that language. The facility then compares the retrieved models to the generated characterization of the series of data values, and identifies as the distinguished language the language whose model compares most favorably to the generated characterization of the series of data values.
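
The three steps in the abstract (characterize the data, retrieve a per-language model, pick the model that compares most favorably) can be illustrated with a minimal sketch. The abstract does not say which statistics or which comparison the facility uses, so everything specific here is an assumption made for illustration: the characterization is taken to be counts of byte trigrams, the comparison is cosine similarity, and the names `characterize`, `similarity`, `identify_language`, `SAMPLES` and `MODELS` are hypothetical. The toy samples are far too small for real use.

```python
from collections import Counter
from math import sqrt
from typing import Mapping


def characterize(data: bytes, n: int = 3) -> Counter:
    """Count each run of n consecutive byte values (an assumed statistic)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def similarity(profile: Counter, model: Counter) -> float:
    """Cosine similarity between two count vectors (an assumed comparison)."""
    dot = sum(count * model[gram] for gram, count in profile.items())
    norm = sqrt(sum(c * c for c in profile.values())) * sqrt(
        sum(c * c for c in model.values())
    )
    return dot / norm if norm else 0.0


def identify_language(data: bytes, models: Mapping[str, Counter]) -> str:
    """Return the language whose model is most similar to the data's profile."""
    profile = characterize(data)
    return max(models, key=lambda lang: similarity(profile, models[lang]))


# Toy "representative samples", encoded in one assumed character set (Latin-1).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then "
               "the cat sat on the mat with the other animals",
    "german": "der schnelle braune fuchs springt über den faulen hund und "
              "dann sitzt die katze auf der matte mit den anderen tieren",
}

# One model per language, built with the same statistic as the characterization.
MODELS = {lang: characterize(text.encode("latin-1")) for lang, text in SAMPLES.items()}

guess = identify_language("the dog and the cat".encode("latin-1"), MODELS)
```

Because the models are built from bytes rather than decoded characters, the same approach could in principle be keyed on (language, character set) pairs, which is the combination named in the patent title; that extension is not shown here.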

REFERENCES:
patent: 5261009 (1993-11-01), Bokser
patent: 5418951 (1995-05-01), Damashek
patent: 5428707 (1995-06-01), Gould et al.
patent: 5477451 (1995-12-01), Brown et al.
patent: 5510981 (1996-04-01), Berger et al.
patent: 5592667 (1997-01-01), Bugajski
patent: 5594809 (1997-01-01), Kopec et al.
patent: 5608622 (1997-03-01), Church
patent: 5752227 (1998-05-01), Lyberg
patent: 5761687 (1998-06-01), Hon et al.
patent: 5768603 (1998-06-01), Brown et al.
patent: 5774588 (1998-06-01), Li
patent: 5805832 (1998-09-01), Brown et al.
patent: 5878390 (1999-03-01), Kawai et al.
patent: 5883986 (1999-03-01), Kopec et al.
patent: 5982933 (1999-11-01), Yoshii et al.
patent: 6070140 (2000-05-01), Tran
patent: 6073098 (2000-06-01), Buchsbaum et al.
Hayes, Brian, "Computer Recreations: A progress report on the fine art of turning literature into drivel." Scientific American, vol. 249, pp. 18-28, Nov., 1983.
Kikui et al., "Cross-Lingual Information Retrieval on the WWW," ECAI96, 12th European Conference on Artificial Intelligence, MULSAIC96 Workshop, 1996, pp. 1-6.
Kikui, G., "Identifying the Coding System and Language of On-line Documents on the Internet," Sixteenth International Conference of Computational Linguistics (Coling), Aug. 1996, pp. 652-657.

Profile ID: LFUS-PAI-O-970081
