Recording medium and character string collating apparatus...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000, C704S007000

active

06260051

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a recording medium and a character string collating apparatus used to retrieve a character string written in a text in the field of information retrieval processing, and more particularly to a recording medium in which character data of a full text are recorded so that they can be read out, and to a character string collating apparatus in which a retrieval character string is collated with a registration character string to detect, in the registration character string, a particular character string agreeing with the retrieval character string by using the character data of the full text recorded in the recording medium.
2. Description of the Related Art
PREVIOUSLY PROPOSED ART
FIG. 1A shows an example of a registration character string extracted from a text, FIG. 1B shows a table of two-character chains extracted from the registration character string, FIG. 1C shows a table of two-character chain types in which at least one occurrence frequency set corresponds to each two-character chain type, and FIG. 1D shows an example of a retrieval character string input by a user to retrieve a particular character string agreeing with the retrieval character string from the registration character string of the text.
As shown in FIG. 1A, when a user intends to retrieve a particular character string agreeing with a retrieval character string from a text according to a conventional character string collating method, a registration character string “AB---CDæEF---GH” extracted from the text is decomposed into a plurality of two-character chains “AB”, ..., “CD”, “Dæ”, “æE”, “EF”, ..., and “GH”. Here, each two-character chain is composed of a fore character and a rear character arranged in the order in which the characters appear in the registration character string, and the letter “æ” denotes a special character inserted into a string of characters to divide it into a first divided string of characters expressing a first meaning and a second divided string of characters expressing a second meaning. The special character occurs frequently in a text. Also, the special character is not limited to a character. For example, a space frequently occurring in a text written in Hangul can be defined as one type of special character, and a space frequently occurring in a text written in English to divide words can also be defined as one type of special character.
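The decomposition described above can be illustrated with a minimal Python sketch. The function name `two_character_chains` is hypothetical and not taken from the patent; the sketch assumes that every pair of adjacent characters, including pairs containing the special character, forms one chain.

```python
def two_character_chains(text: str) -> list[str]:
    """Return every overlapping two-character chain in text order.

    Each chain is a fore character followed by the rear character
    that comes immediately after it in the string.
    """
    return [text[i:i + 2] for i in range(len(text) - 1)]


# The middle part of the registration character string from FIG. 1A.
print(two_character_chains("CDæEF"))  # ['CD', 'Dæ', 'æE', 'EF']
```

A string of n characters yields n - 1 chains, so a one-character string yields none.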
An occurrence frequency of each character included in the two-character chains is counted. The occurrence frequency of one character C1 placed in a prescribed position of the registration character string is defined as the number of characters of the same type as the character C1 existing in the character area between the starting position of the registration character string and the prescribed position of the registration character string. As shown in FIG. 1B, the occurrence frequency of the fore character “C” of the first two-character chain “CD” is indicated by N1, the occurrence frequency of the rear character “D” of the first two-character chain “CD” is indicated by N2, and the occurrence frequencies N1 and N2 for the first two-character chain are indicated by an occurrence frequency set (N1, N2). Also, the occurrence frequencies of the two characters “D” and “æ” of the second two-character chain “Dæ” are indicated by N2 and N3, the occurrence frequencies of the two characters “æ” and “E” of the third two-character chain “æE” are indicated by N3 and N4, and the occurrence frequencies of the two characters “E” and “F” of the fourth two-character chain “EF” are indicated by N4 and N5. The occurrence frequency of the rear character of a fore two-character chain agrees with that of the fore character of a rear two-character chain following the fore two-character chain in the registration character string.
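As a rough illustration of the counting just described, the following Python sketch computes an occurrence frequency for every position and pairs the frequencies into occurrence frequency sets. The function names are hypothetical, and the sketch assumes that the count includes the character at the prescribed position itself; the text does not state whether the area is inclusive.

```python
from collections import Counter


def occurrence_frequencies(text: str) -> list[int]:
    """For each position, count the characters of the same type
    from the start of the string up to and including that position
    (inclusive counting is an assumption of this sketch)."""
    seen = Counter()
    freqs = []
    for ch in text:
        seen[ch] += 1
        freqs.append(seen[ch])
    return freqs


def chains_with_frequency_sets(text: str) -> list[tuple[str, tuple[int, int]]]:
    """Pair each two-character chain with its occurrence frequency set
    (frequency of the fore character, frequency of the rear character)."""
    freqs = occurrence_frequencies(text)
    return [
        (text[i:i + 2], (freqs[i], freqs[i + 1]))
        for i in range(len(text) - 1)
    ]


for chain, freq_set in chains_with_frequency_sets("ABAB"):
    print(chain, freq_set)
# AB (1, 1)
# BA (1, 2)
# AB (2, 2)
```

In the output, the rear frequency of each chain equals the fore frequency of the chain that follows it, which is the agreement property stated in the last sentence above.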
However, in practical use, because a number of two-character chains respectively having the same type of fore character and the same type of rear character exist in the registration character string, when a plurality of two-character chains respectively having the same type of fore character and the same type of rear character is called a two-character chain type, a plurality of occurrence frequency sets correspond to each two-character chain type. For example, as shown in FIG. 1C, when the occurrence frequencies of the fore character “C” of the two-character chain “CD” occurring many times in the registration character string are N1, Na, ..., and Nx and the occurrence frequencies of the rear character “D” of the two-character chain “CD” are N2, Nb, ..., and Ny, a plurality of occurrence frequency sets (N1, N2), (Na, Nb), ..., and (Nx, Ny) correspond to the two-character chain type “CD” in a table of two-character chain types.
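One way to picture the table of two-character chain types is as a mapping from each chain type to the list of its occurrence frequency sets. The Python sketch below is only an illustration: `build_chain_type_table` is a hypothetical name, a dictionary is an assumed representation rather than the recorded format of FIG. 1C, and frequencies are counted inclusively of the current position, which is also an assumption.

```python
from collections import Counter, defaultdict


def build_chain_type_table(text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each two-character chain type to all of its occurrence
    frequency sets, listed in the order the chains occur in the text."""
    # Occurrence frequency of the character at each position.
    seen = Counter()
    freqs = []
    for ch in text:
        seen[ch] += 1
        freqs.append(seen[ch])

    table = defaultdict(list)
    for i in range(len(text) - 1):
        table[text[i:i + 2]].append((freqs[i], freqs[i + 1]))
    return dict(table)


table = build_chain_type_table("CDxCD")
print(table["CD"])  # [(1, 1), (2, 2)]
print(table["xC"])  # [(1, 2)]
```

Here the chain type “CD” occurs twice, so two occurrence frequency sets correspond to it, in the same way that (N1, N2), (Na, Nb) and so on correspond to “CD” in the text.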
Also, when a retrieval character string “CDæEF” shown in FIG. 1D is input by a user to retrieve a particular character string agreeing with the retrieval character string from the registration character string of the text, the retrieval character string is decomposed into a plurality of retrieval two-character chains “CD”, “Dæ”, “æE” and “EF”.
In the conventional character string collating method, a plurality of particular two-character chain types of the registration character string agreeing with the retrieval two-character chains of the retrieval character string are detected in the order in which the retrieval two-character chains are arranged in the retrieval character string, and each particular two-character chain type of the registration character string is searched for one occurrence frequency set of the particular two-character chain type on condition that the occurrence frequency of the fore character of the particular two-character chain type Tc1 agrees with that of the rear character of another particular two-character chain type Tc2 detected just before the particular two-character chain type Tc1. In cases where a series of occurrence frequency sets of the particular two-character chain types agreeing with a series of retrieval two-character chains of the retrieval character string is detected on condition that the occurrence frequency of the fore character of each particular two-character chain type Tc1 agrees with that of the rear character of another particular two-character chain type Tc2 detected just before the particular two-character chain type Tc1, a particular character string corresponding to the series of occurrence frequency sets of the particular two-character chain types of the registration character string is retrieved from the registration character string of the text.
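The collating condition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the names `build_chain_type_table` and `collate` are hypothetical, the table is held as a dictionary, and occurrence frequencies are counted inclusively of the current position.

```python
from collections import Counter, defaultdict


def build_chain_type_table(text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each two-character chain type to its occurrence frequency sets."""
    seen = Counter()
    freqs = []
    for ch in text:
        seen[ch] += 1
        freqs.append(seen[ch])
    table = defaultdict(list)
    for i in range(len(text) - 1):
        table[text[i:i + 2]].append((freqs[i], freqs[i + 1]))
    return dict(table)


def collate(table: dict[str, list[tuple[int, int]]],
            query: str) -> list[list[tuple[int, int]]]:
    """Return every series of occurrence frequency sets that matches the
    retrieval two-character chains of the query in order.

    A set is accepted only if the frequency of its fore character agrees
    with the frequency of the rear character of the set chosen just before.
    """
    chains = [query[i:i + 2] for i in range(len(query) - 1)]
    if not chains:
        return []
    # Every occurrence of the first chain type starts a candidate series.
    series = [[freq_set] for freq_set in table.get(chains[0], [])]
    for chain in chains[1:]:
        candidates = table.get(chain, [])
        series = [
            path + [freq_set]
            for path in series
            for freq_set in candidates
            if freq_set[0] == path[-1][1]
        ]
        if not series:
            break  # no occurrence satisfies the condition; stop early
    return series


table = build_chain_type_table("CDæEFxCDyEF")
print(collate(table, "CDæEF"))  # [[(1, 1), (1, 1), (1, 1), (1, 1)]]
print(collate(table, "CDy"))    # [[(2, 2), (2, 1)]]
print(collate(table, "CDz"))    # []
```

In the first query, the second occurrence of “CD” is discarded because no occurrence frequency set of “Dæ” has a fore frequency of 2, so only the series starting at the first “CD” survives.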
For example, it is judged whether or not each occurrence frequency of the fore character “D” of the second two-character chain type “Dæ” agreeing with the second retrieval two-character chain “Dæ” agrees with the occurrence frequency N2 of the rear character “D” of the first two-character chain type “CD” agreeing with the first retrieval two-character chain “CD”. When the occurrence frequency N2 of the fore character “D” of the second two-character chain type “Dæ” is detected, it is judged whether or not each occurrence frequency of the fore character “æ” of the third two-character chain type “æE” agreeing with the third retrieval two-character chain “æE” agrees with the occurrence frequency N3 of the rear character “æ” of the second two-character chain type “Dæ”. When the occurrence frequency N3 of the fore character “æ” of the third two-character chain type “æE” is detected, it is judged whether or not each occurrence frequency of the fore character “E” of the fourth two-character chain type “EF” agreeing with the fourth retrieval two-character chain “EF” agrees with the occurrence frequency N4 of the rear character “E” of the third two-character chain type “æE”. When the occurrence frequency N4 of the fore character “E” of the fourth two-character chain type “EF” is detected, a particular character string “CDæEF” corresponding to the two-character chain
