Patent
Filed: 1997-09-15
Issued: 2000-04-04
Isen, Forester W.
Data processing: speech signal processing, linguistics, language
Linguistics
704/9; 382/229; G06F 17/28; G06K 9/72
active
060472510
ABSTRACT:
The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
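The two-pass decision described in the abstract (identify the zone's language first, then each region's, falling back on the zone's language when a region's decision is weak) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. The toy dictionaries, the per-word confidence (1 if the token is in a language's dictionary, 0 otherwise), the averaging of those values into a language confidence factor, and the `LOW_CONFIDENCE` threshold are all hypothetical. Segmentation into zones, regions and word tokens is assumed to have already happened, so each zone arrives as a list of regions, each a list of word strings.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionaries; a real system would use full lexicons.
DICTIONARIES: Dict[str, Set[str]] = {
    "english": {"the", "language", "of", "and", "document", "is"},
    "french": {"le", "la", "langue", "de", "et", "document", "est"},
    "german": {"die", "der", "sprache", "und", "ist", "dokument"},
}

# Hypothetical cut-off below which a region's decision counts as "low confidence".
LOW_CONFIDENCE = 0.5


def language_confidences(words: List[str]) -> Dict[str, float]:
    """Confidence per language = fraction of word tokens found in its dictionary."""
    tokens = [w.lower() for w in words]
    if not tokens:
        return {lang: 0.0 for lang in DICTIONARIES}
    return {
        lang: sum(t in vocab for t in tokens) / len(tokens)
        for lang, vocab in DICTIONARIES.items()
    }


def best_language(words: List[str]) -> Tuple[str, float]:
    """Return the language with the highest confidence factor and that factor.

    Ties go to the language listed first in DICTIONARIES.
    """
    scores = language_confidences(words)
    lang = max(scores, key=lambda name: scores[name])
    return lang, scores[lang]


def identify_zone(regions: List[List[str]]) -> Tuple[str, List[str]]:
    """Label a zone from all of its words, then label each region.

    A region whose best confidence is below LOW_CONFIDENCE takes the
    zone's language instead of its own weak decision.
    """
    zone_words = [w for region in regions for w in region]
    zone_lang, _ = best_language(zone_words)
    region_langs = []
    for region in regions:
        lang, conf = best_language(region)
        region_langs.append(lang if conf >= LOW_CONFIDENCE else zone_lang)
    return zone_lang, region_langs
```

For example, a zone whose regions are `["The", "language", "of", "the", "document"]`, `["la", "langue", "est"]` and `["Xerox", "1997"]` is labeled English overall (5 of 10 tokens match the English dictionary versus 4 for French); the second region is labeled French on its own evidence, and the third, which matches no dictionary, inherits the zone's language. Replacing a low-confidence decision outright is only the simplest reading of "employed to assist"; combining region and zone scores with weights would be an equally plausible variant.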
INVENTORS:
Bokser, Mindy R.
Choy, Kenneth Chan
Kanungo, Tapas
Pon, Leonard K.
Yang, Jun
ASSIGNEE:
Caere Corporation
EXAMINERS:
Edouard, Patrick N.
Isen, Forester W.