Compound word breaker and spell checker

Data processing: speech signal processing, linguistics, language – Linguistics – Dictionary building, modification, or prioritization

Reexamination Certificate

Details

C704S001000, C704S009000, C715S254000, C715S255000, C715S258000, C715S259000

Reexamination Certificate

active

07447627

ABSTRACT:
A method of determining the component words of a compound word is disclosed. The method first compares the word with the entries of a lexicon. If the word is not found in the lexicon, the method analyzes it character by character. After each character, the method looks in the lexicon for entries that match the characters selected so far. If a match is found, it is added to a hypothesis trace in a lattice. The method then checks whether the remaining characters form a valid entry in the lexicon and whether that entry is allowed to be a final segment.
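
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the lexicon contents, the `can_be_final` flag stored per entry, the `split_compound` function name, and the use of a list of partial segmentations as the "lattice" are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical lexicon: each entry maps to a flag saying whether the word
# is allowed to be the final segment of a compound (assumed data).
LEXICON: Dict[str, bool] = {
    "foot": True,
    "ball": True,
    "player": True,
    "sun": False,   # assumed, for illustration only, not to be allowed last
    "flower": True,
}


def split_compound(word: str, lexicon: Dict[str, bool]) -> List[List[str]]:
    """Return every segmentation of `word` into lexicon entries whose
    last segment is allowed to be final."""
    # Step 1: a word that is itself a lexicon entry needs no splitting.
    if word in lexicon:
        return [[word]]

    # lattice[i] holds the hypothesis traces (lists of segments) that
    # exactly cover the first i characters of the word.
    lattice: List[List[List[str]]] = [[] for _ in range(len(word) + 1)]
    lattice[0] = [[]]  # one empty trace covers zero characters
    results: List[List[str]] = []

    # Step 2: advance one character at a time.
    for end in range(1, len(word)):
        # Step 3: extend existing traces with any lexicon entry ending here.
        for start in range(end):
            segment = word[start:end]
            if lattice[start] and segment in lexicon:
                for trace in lattice[start]:
                    lattice[end].append(trace + [segment])

        # Step 4: check whether the remaining characters form a lexicon
        # entry that is allowed to be a final segment.
        rest = word[end:]
        if lattice[end] and lexicon.get(rest, False):
            for trace in lattice[end]:
                results.append(trace + [rest])

    return results
```

With this assumed lexicon, `split_compound("football", LEXICON)` returns `[["foot", "ball"]]`, while `split_compound("flowersun", LEXICON)` returns an empty list because "sun" is flagged as not allowed in final position.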

REFERENCES:
patent: 4384329 (1983-05-01), Rosenbaum et al.
patent: 4672571 (1987-06-01), Bass et al.
patent: 4688192 (1987-08-01), Yoshimura et al.
patent: 4701851 (1987-10-01), Bass et al.
patent: 4703425 (1987-10-01), Muraki
patent: 4724523 (1988-02-01), Kucera
patent: 4736296 (1988-04-01), Katayama et al.
patent: 4771385 (1988-09-01), Egami et al.
patent: 4868750 (1989-09-01), Kucera et al.
patent: 4887212 (1989-12-01), Zamora et al.
patent: 4969097 (1990-11-01), Levin
patent: 4991135 (1991-02-01), Yoshimura et al.
patent: 5056021 (1991-10-01), Ausborn
patent: 5289376 (1994-02-01), Yokogawa
patent: 5611076 (1997-03-01), Durflinger et al.
patent: 5642522 (1997-06-01), Zaenen et al.
patent: 5708829 (1998-01-01), Kadashevich et al.
patent: 5715468 (1998-02-01), Budzinski
patent: 5761688 (1998-06-01), Morishita
patent: 5799268 (1998-08-01), Boguraev
patent: 5867812 (1999-02-01), Sassano
patent: 5995922 (1999-11-01), Penteroudakis et al.
patent: 5995992 (1999-11-01), Eckard
patent: 6021409 (2000-02-01), Burrows
patent: 6035268 (2000-03-01), Carus et al.
patent: 6081774 (2000-06-01), De Hita et al.
patent: 6138087 (2000-10-01), Budzinski
patent: 6233553 (2001-05-01), Contolini et al.
patent: 6278967 (2001-08-01), Akers et al.
patent: 6278968 (2001-08-01), Franz et al.
patent: 6298321 (2001-10-01), Karlov et al.
patent: 6393389 (2002-05-01), Chanod et al.
patent: 6424983 (2002-07-01), Schabes et al.
patent: 6675169 (2004-01-01), Bennett et al.
patent: 6735559 (2004-05-01), Takazawa
patent: 6760695 (2004-07-01), Kuno et al.
patent: 6792418 (2004-09-01), Binnig et al.
patent: 6965858 (2005-11-01), Kempe
patent: 2003/0187649 (2003-10-01), Logan et al.
patent: 2003/0204392 (2003-10-01), Finnigan
patent: 2005/0091031 (2005-04-01), Powell
patent: 2005/0091033 (2005-04-01), Valdes
patent: WO 99/50829 (1999-10-01), None
Branco et al., A., “Tokenization of Portuguese: resolving the hard cases”, Technical Report TR-2003-4, Department of Informatics, University of Lisbon, Mar. 2003.
Rayner et al., M., “Adapting the Core Language Engine to French and Spanish”, Technical Report CRC-061, Proceedings NLP-IA, Moncton, New Brunswick, May 10, 1995.
Bleam, T., “(Non-) Parallels between Double Object Constructions and Spanish IO Clitic-doubling”, Paper presented at LSRL 31, Chicago, Apr. 22, 2001.
Sánchez León, F., “Spanish tagset for the CRATER project”, CRATER Internal Document, Mar. 7, 1994.
Habash, N., “Matador: A Large-Scale Spanish-English GHMT System”, http://www.amtaweb.org/summit/MTSummit/FinalPapers/84-Habash-final.pdf, at least by Feb. 4, 2004.
Goñi et al., J., “A framework for lexical representation”, AI95: Fifteenth International Conference. Language Engineering, Montpellier, France, pp. 243-252, Jun. 1995.
Marimon, M., “Integrating Shallow Linguistic Processing into a Unification-based Spanish Grammar”, presented at COLING 2002: The 17th International Conference on Computational Linguistics, Aug. 29, 2002.
Palomar et al., M., “An Algorithm for Anaphora Resolution in Spanish Texts”, Computational Linguistics, vol. 27, No. 4, pp. 546-567, Mar. 2001.
Bozsahin et al., H., “A Categorial Framework for Composition in Multiple Linguistic Domains”, In Proceedings of the 4th International Conference on Cognitive Science of NLP, Dublin, CSNLP'95, Jul. 1995.
Giguet et al., E., “From Part of Speech Tagging to Memory-based Deep Syntactic Analysis”, In Proceedings of the International Workshop on Parsing Technologies (IWPT'97), pp. 77-88, MIT, Boston, Massachusetts, USA, Sep. 17-20, 1997.
Verity Internationalization, “Enabling E-business in Multiple Languages”, Jun. 2001.
Zweigenbaum et al., P., “Towards a Unified Medical Lexicon for French”, In Studies in Health Technology and Informatics. Medical Informatics Europe; The New Navigators: from Professionals to Patients, vol. 95, pp. 415-420, 2003.
Hedlund et al., T., “Utaclir @ CLEF 2001—Effects of Compound Splitting and N-Gram Techniques”, CLEF 2001 LNCS 2406, pp. 118-136, 2002.
Koehn et al., P., “Empirical Methods for Compound Splitting”, EACL 2003, 11th Conference of the European Chapter of the Association for Computational Linguistics 2003, pp. 187-194, http://www.isi.edu/~koehn/publications/compound2003.pdf.
Crabbe et al., B., “Lexical Classes for Structuring the Lexicon of a Tag”, Proceedings of the European Summer School on Logic, Language and Information 2003, http://www.loria.fr/~crabbe/doc/lexclasses.pdf.
Chang et al., E., “Induction of Classification from Lexicon Expansion: Assigning Domain Tags to WordNet Entries”, Proceedings of the First International WordNet Conference, pp. 155-164, Jan. 2002.
Bodik et al., P., “Formation of a Common Spatial Lexicon and its Change in a Community of Moving Agents”, Frontiers in AI: Proceedings of Scandinavian Conference on Artificial Intelligence - SCAI 2003, http://www.cs.berkeley.edu/~bodikp/publications/scai03.pdf.
