Method to compress linguistic structures

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C704S009000

Reexamination Certificate

active

06535886

ABSTRACT:

FIELD OF THE INVENTION
The present invention is related to data structure compression. More specifically, the present invention relates to the compression of linguistic data structures for natural language translation systems.
BACKGROUND OF THE INVENTION
Natural language translation systems process and manage thousands of words, phrases, and sentences. To process and manage such vast amounts of data, linguistic data structures are used. These linguistic data structures not only store words, phrases, and sentences, but may also store associated qualifiers in order to process and manage the data more efficiently. Consequently, the large number and large size of linguistic data structures require large amounts of memory. Thus, a goal of natural language translation systems is to reduce the amount of memory for storing the linguistic data structures during language translation.
Conventional data compression techniques, however, are not well suited for compressing linguistic data structures for natural language translation systems. For example, a common data compression technique is the Lempel-Ziv (LZ) method. The LZ method exchanges recurring substrings automatically in straight text with references to the substrings according to a longest-match algorithm. Although the LZ method provides a comparatively high compression ratio, the LZ method is not well suited for natural language translation systems because natural language translation systems require fast and random access to any compressed text to perform language translation. In order for the LZ method to access the compressed text rapidly and randomly, the text must be entirely decompressed, which results in a performance penalty. Because natural language translation systems require fast and random memory access, it is not suitable to use the LZ method for compressing linguistic data structures.
Another conventional data compression technique is the dictionary method. The dictionary method references and stores redundant tokens in a separate dictionary. The tokens are chosen by human interaction. In this technique, the data may then be compressed by exchanging each instance of the token with a reference to the dictionary. The dictionary method requires human interaction in determining which substrings are to be referenced with tokens. For natural language translation systems, requiring human interaction is not feasible for compressing linguistic data structures because of the large amounts of data involved. Thus, what is required is a method to compress recurring segments within a data structure with an index to the segment while allowing fast and random access to the data structure.
SUMMARY OF THE INVENTION
A method and system for reducing the amount of memory used while allowing fast and random access to linguistic structures are described. In one embodiment, at least one segment within a data structure is identified. Each identified segment is counted to determine a number of occurrences of the identified segment within the data structure. Also, if the number occurrences is greater than one, the segment is saved in a recurring data structure and the segment is replaced in the data structure with an index corresponding to the segment in the recurring data structure.


REFERENCES:
patent: 4814746 (1989-03-01), Miller et al.
patent: 4905138 (1990-02-01), Bourne
patent: 4974191 (1990-11-01), Amirghodsi et al.
patent: 5010345 (1991-04-01), Nagy
patent: 5033087 (1991-07-01), Bahl et al.
patent: 5058137 (1991-10-01), Shah
patent: 5068789 (1991-11-01), van Vliembergen
patent: 5083268 (1992-01-01), Hemphill et al.
patent: 5088038 (1992-02-01), Tanaka et al.
patent: 5095432 (1992-03-01), Reed
patent: 5111398 (1992-05-01), Nunberg et al.
patent: 5155484 (1992-10-01), Chambers, IV
patent: 5323155 (1994-06-01), Iyer et al.
patent: 5418717 (1995-05-01), Su et al.
patent: 5426583 (1995-06-01), Uribe-Echebarria Diaz De Mendibil
patent: 5485373 (1996-01-01), Davis et al.
patent: 5510981 (1996-04-01), Berger et al.
patent: 5528491 (1996-06-01), Kuno et al.
patent: 5535120 (1996-07-01), Chong et al.
patent: 5561421 (1996-10-01), Smith et al.
patent: 5659737 (1997-08-01), Matsuda
patent: 5659765 (1997-08-01), Nii
patent: 5677835 (1997-10-01), Carbonell et al.
patent: 5680511 (1997-10-01), Baker et al.
patent: 5680601 (1997-10-01), Rust
patent: 5754847 (1998-05-01), Kaplan et al.
patent: 5768603 (1998-06-01), Brown et al.
patent: 5799268 (1998-08-01), Boguraev
patent: 5806021 (1998-09-01), Chen et al.
patent: 5864788 (1999-01-01), Kutsumi
patent: 5873056 (1999-02-01), Liddy et al.
patent: 5907821 (1999-05-01), Kaji et al.
patent: 5951623 (1999-09-01), Reynar et al.
patent: 5963894 (1999-10-01), Richardson et al.
patent: 5966686 (1999-10-01), Heidorn et al.
patent: 5977890 (1999-11-01), Rigoutsos et al.
patent: 5983169 (1999-11-01), Kozma
patent: 6067543 (2000-05-01), Burrows
patent: 6070140 (2000-05-01), Tran
patent: 6092065 (2000-07-01), Floratos et al.
patent: 6100825 (2000-08-01), Sedluk et al.
patent: 6121901 (2000-09-01), Welch et al.
patent: 6131082 (2000-10-01), Hargrave et al.
patent: 6161083 (2000-12-01), Franz et al.
patent: 6173441 (2001-01-01), Klein
patent: 6212500 (2001-04-01), Kohler
patent: 6230153 (2001-05-01), Howard et al.
patent: 6230168 (2001-05-01), Unger et al.
patent: 6240409 (2001-05-01), Aiken
patent: 6243669 (2001-06-01), Horiguchi et al.
patent: 6285978 (2001-09-01), Bernth et al.
patent: 6330530 (2001-12-01), Horiguchi et al.
patent: 0805403 (1997-11-01), None
PCT Search Report, Feb. 22, 2001, 5 pages.
PCT Search Report, International Application No. PCT/US00/28777, Oct. 17, 2000.
PCT Search Report, International Application No. PCT/US00/28830, Oct. 18, 2000.
PCT Search Report, International application no. PCT/US00/41256, filed Oct. 17, 2000.
S. Kurohashi, T. Nakamura, Y. Matsumoto, M. Nagao. Improvements of Japanese Morphological Analyzer JUMAN. In: “Proceedings of the International Workshop on Sharable Natural Language Resources”, p. 22-28, Nara, Japan, 1994.
Kenneth W. Church, “A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text”, in Proceedings of the Second Applied Natural Language Processing Conference, Austin, TX, 1988.
Edited by Karen Jensen, George E. Heidorn, Stephen D. Richardson, “Natural Language Processing: The PLNLP Approach”, Kluwer Academic Publishers, 1993, 22 pages.
Stuart M. Shieber, An Introduction to Unification-based Approaches to Grammar, CSLI, 1986, 23 pages.
M. Tomita, T. Mitamura, H. Musha, M. Kee, “The Generalized LR Parser/Compiler Version 8.1: User's Guide”, CMU-CMT-88-MEMO, Apr. 20, 1988, 44 pages.
M. Tomita, K. Knight, “Pseudo-Unification and Full-Unification”, CMU, 1987, 10 pages.
M. Ishii, K. Ohta, H. Saito, “An Efficient Parser Generator for Natural Language”, COLING 1994, 3 pages.
O. Furuse, H. Iida, “An Example-Based Method for Transfer-Driven Machine Translation”, Proceedings of the Conference on Theoretical and Methodological Issues in Machine Translation (TMI-92), 1992, p. 139-150.
P. Resnik, “Using Information Content to Evaluate Semantic Similarity in a Taxonomy”, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), 1995.
T.C. Bell, J.G. Cleary, I. H. Witten, “Text Compression”, Prentice Hall, 1990, 19 pages.
H. Maruyama, H. Watanabe, “Tree Cover Search Algorithm for Example-Based Translation”, in Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-92), 1992, p. 173-184.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method to compress linguistic structures does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method to compress linguistic structures, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method to compress linguistic structures will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3033507

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.