Automatic language identification system for multilingual optica

Data processing: speech signal processing – linguistics – language – Linguistics

Patent

Details

704/9, 382/229, G06F 17/28, G06K 9/72

active

060472510

ABSTRACT:
The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
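The two-stage decision described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the toy dictionaries, the `LOW_CONFIDENCE` threshold, the function names, and the choice of "fraction of words found in the dictionary" as the confidence factor are all assumptions made for illustration, and segmentation of the document image into zones and regions is taken as already done.

```python
from typing import Dict, List, Set, Tuple

# Toy word lists (assumption); a real system would load full dictionaries.
DICTIONARIES: Dict[str, Set[str]] = {
    "en": {"the", "language", "of", "is", "and", "document"},
    "fr": {"le", "la", "langue", "est", "et", "document"},
}

# Hypothetical cut-off below which a region-level decision is not trusted.
LOW_CONFIDENCE = 0.5


def confidence(words: List[str], dictionary: Set[str]) -> float:
    """Fraction of word tokens found in the dictionary (assumed measure)."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.lower() in dictionary)
    return hits / len(words)


def best_language(words: List[str]) -> Tuple[str, float]:
    """Return the candidate language with the highest confidence factor."""
    scores = {lang: confidence(words, d) for lang, d in DICTIONARIES.items()}
    lang = max(scores, key=lambda k: scores[k])
    return lang, scores[lang]


def identify_zone(regions: List[List[str]]) -> Tuple[str, List[str]]:
    """Identify the zone language first, then each region's language.

    A region whose best confidence is low falls back to the zone language.
    """
    zone_words = [w for region in regions for w in region]
    zone_lang, _ = best_language(zone_words)

    region_langs = []
    for region in regions:
        lang, conf = best_language(region)
        region_langs.append(lang if conf >= LOW_CONFIDENCE else zone_lang)
    return zone_lang, region_langs


if __name__ == "__main__":
    zone = [
        ["la", "langue", "est", "le", "document"],  # clearly French
        ["the", "language"],                        # clearly English
        ["xyzzy"],                                  # unknown word, low confidence
    ]
    print(identify_zone(zone))
```

In this sketch the third region matches no dictionary, so its own determination has low confidence and it inherits the language already chosen for the zone, which mirrors the fallback step in the abstract.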

REFERENCES:
patent: 3988715 (1976-10-01), Mullan et al.
patent: 4829580 (1989-05-01), Church
patent: 5062143 (1991-10-01), Schmitt
patent: 5182708 (1993-01-01), Ejiri
patent: 5371807 (1994-12-01), Register et al.
patent: 5418951 (1995-05-01), Damashek
patent: 5548507 (1996-08-01), Martino et al.
Anigbogu, J.C. et al., "Application of Hidden Markov Models to Multifont Text Recognition", 1st Int. Conference on Document Analysis and Recognition, Sep. 30-Oct. 2, 1991, Los Alamitos, CA, US, pp. 785-793.
Spitz, A. Lawrence et al., "Palace: A Multilingual Document Recognition System", Fuji Xerox Palo Alto Laboratory, Palo Alto, CA 94304, USA, pp. 16-36.
"QuickFrame Enhances Processing of Tough 'Real World' Forms", Mitek QuickFrame, Mitek Systems, Inc., pp. 1-4.
Lee, Dar-Shyang et al., "Language Identification in Complex, Unoriented, and Degraded Document Images", Proc. of IAPR Workshop on Document Analysis Systems, 1996, pp. 76-98.
Spitz, A. Lawrence, "Script and Language Determination from Document Images", Proc. of Symp. on Document Analysis and Information Retrieval, pp. 229-235.
Sibun, Penelope et al., "Language Identification: Examining the Issues", The Institute for the Learning Sciences, Northwestern University, Dept. of Computer and Information Science, University of Pennsylvania, pp. 125-135.
