Fast text character set recognition

Data processing: speech signal processing – linguistics – language – Linguistics – Multilingual or national language support

Reexamination Certificate

Details

C704S009000

active

07865355

ABSTRACT:
Methods and apparatus, including computer program products, for identifying a language corresponding to a string of data include receiving a data string and dividing the data string into coded character sequences for each of a plurality of languages. A length of one or more coded character sequences varies among different languages for coded character sequences having a particular number of characters. The coded character sequences are analyzed to calculate, for each of the plurality of languages, a probability that the data string corresponds to the language. The calculated probabilities are compared among the languages, and a language is identified as corresponding to the data string based on the comparison.
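
The steps named in the abstract (divide the data into coded character sequences per language, score each language, compare the scores) can be illustrated with a minimal Python sketch. This is not the patented method. The class `LanguageModel`, the functions `byte_sequences` and `identify`, the add-one smoothing, the per-sequence averaging of log probabilities, and the tiny training samples are all assumptions made for illustration. The one idea taken from the abstract is that a sequence of a fixed number of characters occupies a different number of bytes depending on the language's encoding (here, 1 byte per character for English and 2 for Japanese in UTF-16).

```python
import math
from collections import Counter


def byte_sequences(data: bytes, width: int) -> list[bytes]:
    """Return every overlapping run of `width` bytes in `data`."""
    return [data[i:i + width] for i in range(len(data) - width + 1)]


class LanguageModel:
    """Hypothetical per-language table of coded character sequence counts."""

    def __init__(self, name: str, bytes_per_char: int, sample: bytes,
                 chars_per_sequence: int = 2) -> None:
        self.name = name
        # Same number of characters, different byte length per language.
        self.width = bytes_per_char * chars_per_sequence
        self.counts = Counter(byte_sequences(sample, self.width))
        self.total = sum(self.counts.values())
        # One extra slot so unseen sequences get a small nonzero probability.
        self.vocab = len(self.counts) + 1

    def score(self, data: bytes) -> float:
        """Average add-one-smoothed log probability per sequence (assumed)."""
        seqs = byte_sequences(data, self.width)
        if not seqs:
            return float("-inf")
        log_prob = sum(
            math.log((self.counts[s] + 1) / (self.total + self.vocab))
            for s in seqs
        )
        # Averaging keeps scores comparable across different sequence widths.
        return log_prob / len(seqs)


def identify(data: bytes, models: list[LanguageModel]) -> str:
    """Compare the per-language scores and return the best-scoring name."""
    return max(models, key=lambda m: m.score(data)).name


# Toy training samples; a real system would use large corpora.
models = [
    LanguageModel("english", 1,
                  b"the quick brown fox jumps over the lazy dog"),
    LanguageModel("japanese", 2,
                  "これは日本語の文章です。日本語のテキストを識別します。"
                  .encode("utf-16-be")),
]

print(identify(b"the lazy dog", models))
print(identify("日本語のテキスト".encode("utf-16-be"), models))
```

With samples this small the scores are only indicative; the sketch is meant to show the data flow, not to suggest realistic accuracy.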

REFERENCES:
patent: 5548507 (1996-08-01), Martino et al.
patent: 6125362 (2000-09-01), Elworthy
patent: 6157905 (2000-12-01), Powell
patent: 6167369 (2000-12-01), Schulze
patent: 6539118 (2003-03-01), Murray et al.
patent: 7359851 (2008-04-01), Tong et al.
patent: 2003/0009324 (2003-01-01), Alpha
Suzuki et al, “A language and character set determination method based on N-gram statistics”, 2002, ACM Press, vol. 1 issue 3, p. 269-278.
Li et al, “A Composite Approach to Language/Encoding Detection”, 2001, Proc. of the 19th International Unicode Conference, pp. 1-14.
“Language Identification Tools”, [on-line], [retrieved Mar. 15, 2005] Retrieved from the Internet <URL: http://odur.let.rug.nl/~vannoord/TextCat/competitors.html>.
“libTextCat”, [on-line], [retrieved Apr. 5, 2005] Retrieved from the Internet <URL: http://software.wise-guys.nl/libtextcat/>.
“TextCat”, [on-line], [retrieved Mar. 15, 2005] Retrieved from the Internet <URL: http://odur.let.rug.nl/~vannoord/TextCat/index.html>.
Cavnar et al., “N-Gram-Based Text Categorization”, in Proceedings of Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, Apr. 11-13, 1994.
Ha et al., “Extension of Zipf's Law to Words and Phrases”, Proceedings of 19th International Conference on Computational Linguistics (COLING'2002), pp. 315-320, 2002.
“rali—Applied Research in Computational Linguistics” [on-line], [retrieved Apr. 5, 2005] Retrieved from the Internet <URL: http://rali.iro.umontreal.ca/>.
“CA: Language Identifier”, Xerox Research Centre Europe, [on-line], [retrieved Apr. 1, 2005] Retrieved from the Internet <URL: http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser>.
Beesley, “Language Identifier: A Computer Program for Automatic Natural-Language Identification of On-line Text”, Languages at Crossroads: Proceedings of the 29th Annual Conference of the American Translators Association, pp. 1-21, 1988.
Grefenstette, “Comparing Two Language Identification Schemes”, JADT, 3rd International Conference on Statistical Analysis of Textual Data, pp. 1-6, 1995.
“Rosette Language Identifier”, Basis Technology product information bulletin, [on-line], [retrieved Apr. 3, 2005] Retrieved from the Internet <URL:http://www.basistech.com/language%2Didentification/>.
“Basis Technology Adds Support for Middle Eastern Languages to Rosette Language Identifier”, Basis Technology press release, [on-line], [retrieved May 14, 2005] Retrieved from the Internet <URL: http://www.basistech.com/press%2Dreleases/2002/middle-eastern-support-043002.html>.
“Basis Technology Unveils Encoding and Language Identifier—Verity Selects Euclid to Identify Multilingual Data”, Basis Technology press release [on-line], [retrieved May 14, 2005] Retrieved from the Internet <URL: http://www.basistech.com/press%2dreleases/2000/euclid-release.html> 2000.
Dunning, “Statistical Identification of Language”, Computing Research Laboratory, New Mexico State University, pp. 1-29, 1994.
“Lextek Language Identifier SDK”, Lextek International product bulletin [on-line], [retrieved Apr. 5, 2005] Retrieved from the Internet <URL: http://www.lextek.com/langid/>.
Ćavar, “Language ID Examples”, [on-line], [retrieved Apr. 5, 2005] Retrieved from the Internet <URL: http://jones.ling.indiana.edu/~dcavar/tools/lid/index.html> last update Dec. 2003.
“Language Identification”, PetaMem product demo, [on-line], [retrieved Apr. 5, 2005] Retrieved from the Internet <URL: http://nlp.petamem.com/langident.cgi> last update—Feb. 2005.
“LangWitch, the language identifier”, Morphologic product demo page [on-line], [retrieved Apr. 5, 2005], Retrieved from the Internet <URL:http://www.morphologic.hu/order/langwitch.asp?user=EN>.
“Languid: a statistical language identifier”, product demo [on-line], [retrieved Apr. 5, 2005] Retrieved from the Internet <URL: http://languid.cantbedone.org/> 2004.
