Reexamination Certificate
1999-09-01
2003-12-30
Feild, Joseph H. (Department: 2176)
Data processing: presentation processing of document, operator i
Presentation processing of document
Layout
C715S252000, C704S010000, C382S229000
active
06671856
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
Preferred embodiments provide a method, system, and program for determining boundaries in a string using a dictionary and, in particular, determining word boundaries.
2. Description of the Related Art
Most computer text editors, such as word processing programs, display words on a page such that the characters within each word remain together. Thus, if the end of a line is reached, any word that would extend beyond the end of the line is displayed or positioned as the first word on the next line. The same principle for positioning words on a line applies to printing text. A legal break position comes between a non-whitespace character and a whitespace character, but not the other way around; as a result, a “word” is a series of non-whitespace characters followed by a string of whitespace characters. Languages that do not use spaces may use punctuation marks rather than whitespace to indicate a break point. In certain instances, a language will not break at a whitespace. In French, for example, a space is placed between the last word in a sentence and a following question mark; in spite of this space, the break is placed after the question mark to keep the word and the question mark together.
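The whitespace rule described above can be illustrated with a minimal sketch. It assumes one reading of that rule: because a “word” carries its trailing whitespace, a break is taken to fall where a whitespace character is followed by a non-whitespace character. The function name `legal_break_positions` is hypothetical and is not taken from the patent.

```python
def legal_break_positions(text):
    """Return each index i at which a line may break before text[i].

    Assumption (not stated verbatim in the source): a "word" is a run of
    non-whitespace characters plus its trailing whitespace, so a break is
    legal only where whitespace is followed by non-whitespace.
    """
    return [
        i
        for i in range(1, len(text))
        if text[i - 1].isspace() and not text[i].isspace()
    ]


if __name__ == "__main__":
    print(legal_break_positions("the  cat sat"))
    # The simple rule also reports a break before the "?" in French text,
    # which is the kind of exception the passage above describes.
    print(legal_break_positions("Quoi ?"))
```

A rule this simple finds no break positions at all in a string of unseparated words, which is the case the remainder of this section is concerned with.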
For instance, Thai does not always separate words with spaces. However, when wrapping lines of text on a display screen or printed page, it is undesirable to split a word across two lines. One solution to ensure that line breaks in a string of unseparated words occur between words is to have the user of the text editor insert an invisible space between the words. Thus, when a Thai writer notices that a compound word has been broken in the middle when wrapping to the next line, the writer manually inserts an invisible space between the words so that the lines break in the proper places. This method can be tedious, as it relies on human observation and manual intervention to specify the places in the text where it is legal to break lines.
Another technique for determining legal breaks in text is dictionary-based boundary detection. Current dictionary-based boundary detection techniques include in the dictionary common words that writers combine without any break characters, such as whitespace. Current dictionary systems do not examine the document thoroughly for words that occur within the dictionary. When an instance of an unseparated word is found in the dictionary, a dictionary program or spell checker may propose a break to correct the problem. However, such methods are limited because the only unseparated words that will be detected are those encoded in the dictionary. Typically, current dictionary-based boundary detection provides only a limited set of unseparated words to detect.
For the above reasons, there is a need in the art for an improved method, system, and program for determining boundaries within a string of words that does not have any word boundary indicators.
SUMMARY OF THE PREFERRED EMBODIMENTS
To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, and program for determining boundaries in a string of characters using a dictionary. A determination is made of all possible initial substrings of the string in the dictionary. One initial substring is selected such that all the characters following the initial substring can be divided into at least one substring that appears in the dictionary. The boundaries follow the initial substring and each of the at least one substring that includes the characters following the initial substring.
In further embodiments, the longest possible initial substring is selected.
In still further embodiments, selecting the initial substring comprises selecting a longest possible initial substring that was not previously selected until one initial substring is selected such that the characters following the selected initial substring can be divided into at least one substring in the dictionary.
In certain embodiments, the substrings comprise words and the boundaries comprise word boundaries.
Preferred embodiments provide an algorithm for determining word boundaries in a string of unseparated multiple words. Preferred embodiments use an algorithm that will consider different possible word combinations until all the characters of the string fall within word boundaries, if such an arrangement is possible.
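The selection procedure summarized above can be sketched as a longest-match-first search with backtracking. This is an illustrative sketch under stated assumptions, not the patented implementation: the dictionary is modeled as a plain Python set, the name `find_boundaries` is hypothetical, and the `dead_ends` cache is an added optimization that the summary does not mention.

```python
def find_boundaries(text, dictionary):
    """Return boundary offsets that split text into dictionary entries.

    Each offset marks the end of one substring. Returns None when the
    string cannot be divided entirely into dictionary entries.
    """
    # Assumption: remember offsets whose remainder is known to be
    # unsegmentable, so the same suffix is never searched twice.
    dead_ends = set()

    def segment(start):
        if start == len(text):
            return []  # nothing left to divide
        if start in dead_ends:
            return None
        # All initial substrings of text[start:] found in the dictionary,
        # ordered longest first.
        ends = [
            end
            for end in range(len(text), start, -1)
            if text[start:end] in dictionary
        ]
        for end in ends:
            rest = segment(end)
            if rest is not None:
                return [end] + rest
            # Otherwise fall through to the next-longest candidate.
        dead_ends.add(start)
        return None

    return segment(0)


if __name__ == "__main__":
    words = {"the", "them", "theme", "mend", "end"}
    print(find_boundaries("themend", words))
```

In this example the longest initial substring, "theme", is tried first and rejected because the remaining "nd" is not in the dictionary; the next-longest candidate, "them", succeeds because "end" is. The recursion is kept for readability; a very long string would call for an iterative version.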
Bieneman Charles A
Feild Joseph H.
Konrad Raynes & Victor & Mann LLP
Victor David W.