Device for processing strings

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06526401

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a device for processing strings, which device predicts a probability of occurrence of a character following a given string, and, in particular, to a device for processing strings, which device predicts a probability of occurrence of each character of a string from a left context of the character in the string.
2. Description of the Related Art
PPM (Prediction by Partial Matching) is well used as a statistical language model in text compression. PPM* is a variant of PPM (see ‘Unbounded Length Context for PPM’, the Computer Journal, Vol. 40, No. 2/3, 1997, pages 67-75, written by J. G. Cleary and M. J. Teahan of the Department of Computer Science, University of Waikato, Hamilton, New Zealand, and ‘Japanese Word Segmentation by a PPM* Model’, NL report, 128-2 (1998. 11.5), pages 9-16, written by Hiroki Oda and Kenji Kita of the Faculty of Engineering, Tokushima University). The PPM* is characterized in that no upper limit is set on the number of order n (context length) of the model.
In PPM*, a string indexing structure through which it is possible to store past contexts compactly, and to refer to and to perform additions/deletions on them flexibly at high speed is needed. As such a string indexing structure, a trie (see the above-mentioned document ‘Unbounded Length Context for PPM’) or the like is used in the related art.
However, when the trie or the like is used as the string indexing structure, increase in the scale of context requires a large storage capacity.
Further, PPM* in the related art uses a relatively simple context-selection method, and performance of predicting an appearance probability of each character of an input string is not sufficient.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a string-processing device in which the size of a string indexing structure to be stored can be reduced even when the scale of context increases.
Another object of the present invention is to provide a string-processing device having high performance in predicting an appearance probability of each character of an input string.
In order to achieve the above-mentioned objects, a device for processing strings according to the present invention comprises:
a corpus-DB portion in which a corpus is stored;
an index portion in which a series of position numbers built for the corpus is stored;
a searching portion which searches for positions of occurrences of a given string in the corpus using the series of position numbers; and
a predicting portion which, using the result of search performed by the searching portion, predicts a probability of occurrence of a character following the given string.
In this arrangement, when a probability of occurrence of a character following the given string is predicted in an algorithm such as PPM*, a series of position numbers (such as a suffix array) built for a corpus is used instead of a trie or the like. Thereby, in comparison to the related art in which a trie or the like is used, it is possible to search positions of occurrences of the given string at high speed through binary search. As a result, it is possible to improve the performance of predicting a probability of occurrence of a subsequent character. Furthermore, it is possible to reduce the amount of storage required for a string indexing structure.
Other objects and further features of the present invention will become more apparent from the following detailed description when read in conjunction with the accompanying drawings.


REFERENCES:
patent: 5875108 (1999-02-01), Hoffberg et al.
A. Changcheng Huang; Devetsikiotis, M.; Lambadaris, I.; Kaye, A.R. discloes fast simulation for self-similar traffic in ATM networks Communications, 1995. ICC '95 Seattle, ‘Gateway to Globalization’, 1995 IEEE International Conference on , vol.: 1 1995.*
J. G. Cleary, et al. “Unbounded Length Contexts for PPM”, The Computer Journal, vol. 40, No. 2/3, 1997, p. 67-75.
U. Manber, et al., “Suffix arrays: A new method for on-line string searches”, SIAM Journal of Computing, vol. 22, No. 5, 1993, p. 1-16.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Device for processing strings does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Device for processing strings, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Device for processing strings will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3178470

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.