Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-07-17
2003-04-08
Metjahic, Safet (Department: 3171)
C707S793000
active
06546401
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a method of retrieving data and a data retrieving apparatus.
2. Description of the Prior Art
A data retrieving apparatus for retrieving data from full text data is known. A data retrieving apparatus for retrieving data with index data is disclosed in Japanese patent application provisional publication No. 7-56943. In this data retrieving apparatus, special position marks are inserted just before and just after a predetermined character string, and an index is generated.
SUMMARY OF THE INVENTION
The aim of the present invention is to provide a superior method of retrieving data and a superior data retrieving apparatus.
A first aspect of the present invention provides a method of retrieving first and second candidate data from full text data that includes no word separation data, comprising the steps of: (a) dividing the full text data into words and thereby generating word separation data; (b) generating and storing index data, including the steps of: (c) extracting all character strings of N characters from the full text data, N being a natural number; and (d) attaching the word separation data and the character position data of each of the character strings to that character string to generate the index data; (e) inputting query data with segmentation indicative of the leading and trailing ends of the query data; (f) detecting agreement in word retrieval, said step (f) including the steps of: (g) collating the query data with each of the character strings in the index data to detect character agreement; (h) collating the segmentation of the query data with the word separation data of each of the character strings to detect segmentation agreement; and (i) outputting the character position data of a character string showing both the character agreement and the segmentation agreement; and (j) detecting agreement in character string retrieval, said step (j) including the steps of: (k) collating the query data with each of the N-character strings in the index data; and (l) outputting the character position data of a character string showing only the character agreement, wherein either step (f) or step (j) is performed in accordance with a selection command, and the same index data is used in both steps (f) and (j).
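The retrieval flow of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: N is fixed at 2, the word division of step (a) is assumed to have been done already (the list `words` stands in for the output of an unspecified segmenter), the query is assumed to be marked as a whole word, and the names `build_index`, `string_search` and `word_search` are hypothetical. One index is built once and serves both retrieval modes, mirroring the shared index data of steps (f) and (j).

```python
from collections import defaultdict

N = 2  # assumed N-gram length; the text leaves N open


def build_index(words, n=N):
    """Build an N-gram index over the concatenated words.

    `words` is assumed to come from some word segmenter (step (a));
    the concatenated text itself carries no separators.
    """
    text = "".join(words)
    # Word separation data: offsets at which a word starts or ends.
    bounds = {0}
    offset = 0
    for w in words:
        offset += len(w)
        bounds.add(offset)
    index = defaultdict(dict)
    for p in range(len(text) - n + 1):
        # ends[k] is True when a word boundary lies just before character p+k
        # (k == n means just after the last character of the N-gram).
        ends = tuple((p + k) in bounds for k in range(n + 1))
        index[text[p:p + n]][p] = ends
    return index


def _matches(index, query, n):
    """Yield (start, first_ends, last_ends) wherever all overlapping query
    N-grams occur contiguously (character agreement)."""
    if len(query) < n:
        raise ValueError("this sketch assumes len(query) >= N")
    grams = [query[k:k + n] for k in range(len(query) - n + 1)]
    for start, first_ends in index.get(grams[0], {}).items():
        if all(start + k in index.get(g, {}) for k, g in enumerate(grams)):
            last_ends = index[grams[-1]][start + len(grams) - 1]
            yield start, first_ends, last_ends


def string_search(index, query, n=N):
    """Character-string retrieval (steps (j)-(l)): character agreement only."""
    return sorted(s for s, _, _ in _matches(index, query, n))


def word_search(index, query, n=N):
    """Word retrieval (steps (f)-(i)): both ends of the match must also fall
    on word boundaries (segmentation agreement)."""
    return sorted(s for s, fe, le in _matches(index, query, n) if fe[0] and le[n])
```

For the text "thecatscatcat" divided as the, cat, scat, cat, character-string retrieval of "cat" reports offsets 3, 7 and 10, whereas word retrieval reports only 3 and 10, because the occurrence at offset 7 lies inside "scat".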
Preferably, step (a) includes a step of generating the word separation data so that it has leading and trailing end data for each of the words; in step (h), the segmentation of the query data is compared with the leading and trailing end data of each character string; and in step (i), the position data of the first candidate data is output when the segmentation of the query data agrees with the leading and trailing end data of that character string. In this case, step (a) further includes the steps of: checking whether a first character, that is, the character having the first order in one of the character strings, has a leading end and a trailing end; attaching the leading end data to that character string with respect to the first character when the first character has the leading end; attaching the trailing end data to that character string with respect to the first character when the first character has the trailing end; checking whether a second character following the first character has a trailing end; and attaching the trailing end data to that character string with respect to the second character when the second character has the trailing end.
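For N = 2, the leading and trailing end data attached in step (a) can be sketched as three flags per two-character string. The per-character word ids in `word_of_char`, the `BigramSeparation` record and both functions are hypothetical, and treating an end that is not marked in the query as "not checked" is an assumption; the text only says the segmentation must agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigramSeparation:
    """Word separation data attached to one 2-character string (N = 2 assumed)."""
    first_leading: bool    # a word starts at the first character
    first_trailing: bool   # a word ends at the first character
    second_trailing: bool  # a word ends at the second character


def attach_separation(word_of_char, p):
    """Run the step (a) checks for the bigram starting at offset p.

    word_of_char[i] is a hypothetical id of the word containing character i;
    a change of id between neighbouring characters marks a word boundary.
    """
    first, second = p, p + 1
    last = len(word_of_char) - 1
    first_leading = first == 0 or word_of_char[first - 1] != word_of_char[first]
    first_trailing = word_of_char[first] != word_of_char[second]
    second_trailing = second == last or word_of_char[second] != word_of_char[second + 1]
    return BigramSeparation(first_leading, first_trailing, second_trailing)


def segmentation_agrees(query_leading, query_trailing, sep):
    """Steps (h)/(i) for a 2-character query: each end marked in the query
    must coincide with the bigram's end data (assumption: ends not marked
    in the query are not checked)."""
    if query_leading and not sep.first_leading:
        return False
    if query_trailing and not sep.second_trailing:
        return False
    return True
```

With "tokyoto" divided as tokyo + to (word ids [0, 0, 0, 0, 0, 1, 1]), the bigram "to" at offset 0 carries (True, False, False) and the one at offset 5 carries (True, False, True), so a whole-word query "to" agrees only at offset 5. The first character's trailing flag is recorded as the text describes but is not needed for a two-character query.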
Preferably, both the steps (f) and (j) are effected in accordance with the selection command.
Preferably, the method further comprises the steps of: dividing the query data into query character strings, each query character string including N query characters, step (g) being executed for each of the query character strings to obtain a collating result for each of them; and estimating the continuity of the character strings showing the character agreement with the query character strings in accordance with the position data of those character strings, step (h) being executed with respect to the word separation data just before the first character and the word separation data just after the last character of the character strings showing both the character agreement and the continuity, wherein in step (i) the position data of the first candidate data is output when there is continuity and the word separation data of the first and the last characters of the character strings agrees with the segmentation of the first and the last characters of the query data.
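The continuity estimate can be sketched by shifting each query N-gram's posting offsets back by its position in the query and intersecting the results; an offset that survives every intersection is a start position at which all query N-grams occur contiguously. The helper names, N = 2 and the representation of the word separation data as a set of boundary offsets (`bounds`) are assumptions made for illustration.

```python
def ngram_postings(text, n=2):
    """Map each N-gram of `text` to the list of offsets where it occurs."""
    postings = {}
    for p in range(len(text) - n + 1):
        postings.setdefault(text[p:p + n], []).append(p)
    return postings


def continuous_starts(postings, query, n=2):
    """Divide the query into overlapping N-grams and keep only the start
    offsets s at which the k-th query N-gram occurs at s + k (continuity)."""
    if len(query) < n:
        raise ValueError("this sketch assumes len(query) >= n")
    grams = [query[k:k + n] for k in range(len(query) - n + 1)]
    starts = set(postings.get(grams[0], ()))
    for k, gram in enumerate(grams[1:], start=1):
        # Shift this N-gram's offsets back by k so they line up with s.
        starts &= {p - k for p in postings.get(gram, ())}
        if not starts:
            break
    return sorted(starts)


def first_candidates(postings, bounds, query, n=2):
    """Keep continuous matches with a word boundary just before the first
    character and just after the last (query assumed to be a whole word)."""
    return [s for s in continuous_starts(postings, query, n)
            if s in bounds and s + len(query) in bounds]
```

For "thecatscatcat" with boundary offsets {0, 3, 6, 10, 13}, the query "cat" is continuous at 3, 7 and 10 and survives the boundary check at 3 and 10; "cats" is continuous at 3 but is rejected because no word boundary follows it at offset 7.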
In this case, the segmentation agreement is detected in any one of first to fifth modes in response to a mode command.
In the first mode, the segmentation agreement is established when the segmentation of the first and the last characters of the query data agrees with the word separation data just before the first character and the word separation data just after the last character of the character string showing the character agreement.
In the second mode, the segmentation agreement is established both under the condition of the first mode and when the segmentation of only the first character of the query data agrees with the word separation data just before the first character of the character string showing the character agreement.
In the third mode, the segmentation agreement is established both under the condition of the first mode and when the segmentation of only the last character of the query data agrees with the word separation data just after the last character of the character string showing the character agreement.
In the fourth mode, the segmentation agreement is established when the segmentation of only the first character of the query data agrees with the word separation data just before the first character of the character string showing the character agreement.
In the fifth mode, the segmentation agreement is established when the segmentation of only the last character of the query data agrees with the word separation data just after the last character of the character string showing the character agreement.
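The five modes reduce to a predicate over the two boundary flags of a continuous match. The sketch below rests on an assumed reading: "only the first (last) character agrees" is taken to mean that the leading (trailing) end agrees while the other end does not, so the second and third modes become the unions of the first mode with the fourth and fifth modes respectively, and the fifth mode is taken to test the boundary just after the last character. The `Mode` member names are hypothetical.

```python
from enum import IntEnum


class Mode(IntEnum):
    WHOLE_WORD = 1      # both ends on word boundaries
    WORD_OR_PREFIX = 2  # mode 1, or leading end only
    WORD_OR_SUFFIX = 3  # mode 1, or trailing end only
    PREFIX_ONLY = 4     # leading end agrees, trailing end does not
    SUFFIX_ONLY = 5     # trailing end agrees, leading end does not


def segmentation_agrees(mode, before_first, after_last):
    """before_first / after_last: whether the text has a word boundary just
    before the first matched character / just after the last one."""
    mode = Mode(mode)  # accept a plain int mode command as well
    both = before_first and after_last
    lead_only = before_first and not after_last
    trail_only = after_last and not before_first
    rules = {
        Mode.WHOLE_WORD: both,
        Mode.WORD_OR_PREFIX: both or lead_only,
        Mode.WORD_OR_SUFFIX: both or trail_only,
        Mode.PREFIX_ONLY: lead_only,
        Mode.SUFFIX_ONLY: trail_only,
    }
    return rules[mode]
```

Under this reading, a match of "sca" at the start of "scat" (boundary before, none after) satisfies the second and fourth modes, while a match of "cat" at the end of "scat" (boundary after, none before) satisfies the third and fifth.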
Preferably, the method further comprises the steps of: detecting a condition of each word in the full text data; and judging, in accordance with the condition, whether each word is a non-target word for retrieval. In step (d), when one of the words is judged to be a non-target word, the word separation data is not attached to the character string including that non-target word, and segmentation agreement is not detected for a character string to which no word separation data is attached.
Preferably, the method further comprises the steps of: detecting a condition of each word in the full text data; and judging, in accordance with the condition, whether each word is a non-target word for retrieval, wherein in step (d), when one of the words is judged to be a non-target word, the leading and trailing end data of the word separation data is not attached to the corresponding character string, and segmentation agreement is not detected for a character string to which the word separation data is not attached.
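Excluding non-target words can be sketched by storing None in place of word separation data for every N-gram that touches such a word, so word retrieval skips those postings while character-string retrieval still finds them. The text does not say what condition marks a word as non-target; the `is_non_target` predicate (a stop-word test in the usage below) is a hypothetical stand-in, and this sketch handles only queries exactly N = 2 characters long.

```python
def build_postings(words, is_non_target, n=2):
    """Return {gram: {offset: ends or None}}.

    `is_non_target` is a hypothetical predicate standing in for the
    unspecified condition test; N-grams touching a non-target word get
    None instead of word separation data.
    """
    text = "".join(words)
    bounds, flagged, offset = {0}, [], 0
    for w in words:
        flagged.extend([is_non_target(w)] * len(w))
        offset += len(w)
        bounds.add(offset)
    postings = {}
    for p in range(len(text) - n + 1):
        if any(flagged[p:p + n]):
            ends = None  # touches a non-target word: no word separation data
        else:
            ends = tuple((p + k) in bounds for k in range(n + 1))
        postings.setdefault(text[p:p + n], {})[p] = ends
    return postings


def string_hits(postings, query):
    """Character-string retrieval: every occurrence, flags ignored."""
    return sorted(postings.get(query, {}))


def word_hits(postings, query):
    """Word retrieval: postings without word separation data never agree."""
    return sorted(p for p, ends in postings.get(query, {}).items()
                  if ends is not None and ends[0] and ends[-1])
```

With the words to, be, to, be and "to" treated as a stop word, character-string retrieval of "to" still returns offsets 0 and 4, word retrieval of "to" returns nothing, and word retrieval of "be" returns offsets 2 and 6.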
Preferably, the method further comprises the steps of: detecting a prefix and a suffix of each word in the full text data, wherein the leading end data is not generated as the word separation data when the word preceding one of the words is a prefix, and the trailing end data is not generated as the word separation data when the word following one of the words is a suffix. In this case, the method further comprises the steps of: d
Iizuka Yasuki
Kikuchi Chuichi
Tanabe Tomoko
Chen Te Yu
Metjahic Safet
Woo Louis