Patent
1997-12-11
2000-12-05
Ho, Ruay Lian
Data processing: speech signal processing, linguistics, language
Linguistics
Translation machine
704/9, 707/536, G06F 17/30
active
06157905
ABSTRACT:
The present invention provides a facility for identifying the unknown language of text represented by a series of data values in accordance with a character set that associates character glyphs with particular data values. The facility first generates a characterization that characterizes the series of data values in terms of the occurrence of particular data values in the series of data values. For each of a plurality of languages, the facility then retrieves a model that models the language in terms of the statistical occurrence of particular data values in representative samples of text in that language. The facility then compares the retrieved models to the generated characterization of the series of data values, and identifies as the distinguished language the language whose model compares most favorably to the generated characterization of the series of data values.
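The three steps in the abstract (characterize the input, retrieve per-language models, pick the best match) can be sketched roughly as follows. This is only an illustration, not the patented method: the abstract does not say how occurrences are counted or how models are compared, so the use of byte bigrams, cosine similarity, the Latin-1 encoding, the tiny training samples, and all function names here are assumptions made for the example.

```python
from collections import Counter
import math


def characterize(data: bytes, n: int = 2) -> Counter:
    """Count how often each n-byte sequence occurs in the data values."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def normalize(counts: Counter) -> dict:
    """Scale counts to a unit-length vector so input size does not dominate."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: c / norm for k, c in counts.items()} if norm else {}


def build_model(sample: bytes, n: int = 2) -> dict:
    """Build a language model from a representative sample (hypothetical helper)."""
    return normalize(characterize(sample, n))


def score(model: dict, profile: dict) -> float:
    """Cosine similarity between two unit-length frequency vectors."""
    return sum(w * model.get(k, 0.0) for k, w in profile.items())


def identify_language(data: bytes, models: dict, n: int = 2) -> str:
    """Return the language whose model compares most favorably to the input."""
    profile = normalize(characterize(data, n))
    return max(models, key=lambda lang: score(models[lang], profile))


# Toy samples, far too small for real use; assumed to be Latin-1 encoded.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the other one",
    "german": "der schnelle braune fuchs springt auf den faulen hund und dann der andere",
}
MODELS = {lang: build_model(text.encode("latin-1")) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(identify_language("the dog and the fox".encode("latin-1"), MODELS))
    print(identify_language("der hund und der fuchs".encode("latin-1"), MODELS))
```

Because the models are built over raw data values rather than decoded characters, the same approach could in principle keep one model per language and character set pair, which is what the patent title suggests; that extension is not shown here.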
REFERENCES:
patent: 5261009 (1993-11-01), Bokser
patent: 5418951 (1995-05-01), Damashek
patent: 5428707 (1995-06-01), Gould et al.
patent: 5477451 (1995-12-01), Brown et al.
patent: 5510981 (1996-04-01), Berger et al.
patent: 5592667 (1997-01-01), Bugajski
patent: 5594809 (1997-01-01), Kopec et al.
patent: 5608622 (1997-03-01), Church
patent: 5752227 (1998-05-01), Lyberg
patent: 5761687 (1998-06-01), Hon et al.
patent: 5768603 (1998-06-01), Brown et al.
patent: 5774588 (1998-06-01), Li
patent: 5805832 (1998-09-01), Brown et al.
patent: 5878390 (1999-03-01), Kawai et al.
patent: 5883986 (1999-03-01), Kopec et al.
patent: 5982933 (1999-11-01), Yoshii et al.
patent: 6070140 (2000-05-01), Tran
patent: 6073098 (2000-06-01), Buchsbaum et al.
Hayes, Brian, "Computer Recreations: A progress report on the fine art of turning literature into drivel." Scientific American, vol. 249, pp. 18-28, Nov., 1983.
Kikui et al., "Cross-Lingual Information Retrieval on the WWW," ECAI96, 12th European Conference on Artificial Intelligence, MULSAIC96 Workshop, 1996, pp. 1-6.
Kikui, G., "Identifying the Coding System and Language of On-line Documents on the Internet," Sixteenth International Conference of Computational Linguistics (Coling), Aug. 1996, pp. 652-657.
Ho Ruay Lian
Microsoft Corporation
Identifying language and character set of data representing text
Profile ID: LFUS-PAI-O-970081