Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-12-15
2002-07-16
Metjahic, Safet (Department: 2171)
C707S793000, C709S241000
active
06421680
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to distributed database searches and in particular to reliably searching a heterogeneous distributed database in which a diverse set of character mappings is employed. Still more particularly, the present invention relates to a data structure and a case and character set insensitive search method and apparatus for reliably searching a distributed database which spans multiple system character encoding schemes for underlying data.
2. Description of the Related Art
Databases are employed by enterprises as key repositories of information. To a large extent, the value of the data stored within a database is determined by the reliability of accessing the stored information upon demand. In large databases with many persons entering the data, the integrity of the data may become compromised by differences in data entry techniques. Entries may be made, for instance, in various combinations of casings.
In distributed databases, particularly those which range across a variety of operating systems utilizing different character encoding schemes, data entry and character encoding variances pose a problem for matching search keys. Examples of such schemes are the American National Standard Code for Information Interchange (ASCII) or Unicode on Windows NT or PC and Unix servers/workstations, versus the Extended Binary-Coded Decimal Interchange Code (EBCDIC) on IBM mainframes. Traditional matching methods for searching are based on exact matches, so possible matches are missed, particularly for operating systems or character sets which do not support case mapping or where a discrepancy in character encoding exists.
For instance, an operator searching for records within a database while on the telephone with a customer may enter “david kumhyr” in the name field(s) of the database search engine's user interface. The search application may search the database on a remote Unix-based system and find no match to the original name data if the data was originally entered as “David Kumhyr”. A similar result may occur even when the entered text is identical, because the stored bytes will not match if the character encoding schemes differ. For example, the hex encoded value for the first character (“D”) in the search key “David Kumhyr” is C4 in EBCDIC and 44 in ASCII (0044 in Unicode).
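The encoding mismatch described above can be reproduced with a minimal Python sketch. It assumes code page 037 (Python's `cp037` codec) as a representative EBCDIC variant; the source does not name a specific EBCDIC code page, and the helper `hex_bytes` is a hypothetical name introduced only for illustration.

```python
def hex_bytes(text: str, codec: str) -> str:
    """Return the encoded bytes of `text` as an uppercase hex string."""
    return text.encode(codec).hex().upper()


name = "David Kumhyr"

# First character "D" under each encoding scheme.
d_ascii = hex_bytes("D", "ascii")        # ASCII
d_unicode = hex_bytes("D", "utf-16-be")  # Unicode code unit, big-endian
d_ebcdic = hex_bytes("D", "cp037")       # EBCDIC (assumed code page 037)

# The same text yields different byte strings, so a byte-exact
# comparison across hosts fails even though the names are identical.
same_bytes = name.encode("ascii") == name.encode("cp037")

print(d_ascii, d_unicode, d_ebcdic, same_bytes)
```

A byte-level comparison only succeeds once both sides are decoded with the codec that matches the host that stored them.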
Fuzzy search algorithms applied to the problem described above tend to generate large quantities of non-matching data which must be further qualified to determine a match. A case and character set insensitive search method is therefore required. Often there is a need to create a search string which is a case insensitive equivalent to the base text. However, casing is a locale and language dependent operation, which is further dependent on the character encoding scheme employed, such as EBCDIC.
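The language dependence of casing can be illustrated with a short Python sketch. The German example below is not taken from the source; it is an assumed illustration of why simple lowercasing is not a sufficient case insensitive comparison.

```python
# Two spellings of the same German word: one uses the sharp s,
# the other is the conventional all-uppercase form.
lower_form = "straße"
upper_form = "STRASSE"

# Simple lowercasing leaves "ß" unchanged, so the strings still differ.
simple_match = lower_form.lower() == upper_form.lower()

# Unicode case folding maps "ß" to "ss", so the strings compare equal.
folded_match = lower_form.casefold() == upper_form.casefold()

print(simple_match, folded_match)
```

Which mapping is appropriate depends on the language and on whether the host's character set can represent the characters involved at all.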
It would be desirable, therefore, to provide a case and character set insensitive search method and apparatus. It would further be advantageous if the search method could be transparently employed across systems utilizing incongruent character set encodings.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved method, system and computer program product for distributed database searches.
It is another object of the present invention to provide an improved method, system and computer program product for reliably searching a heterogeneous distributed database in which a diverse set of character mappings is employed.
It is yet another object of the present invention to provide a method, system and computer program product for reliable case and character set insensitive searching of distributed databases which span multiple system character encodings for underlying data.
The foregoing objects are achieved as is now described. A search string for searching data distributed among various hosts employing different character encoding schemes or having different case-mapping capabilities is entered in a multi-field text string class. The multi-field text string class includes methods for transliterating characters within the original search string based on defined character equivalence tables. When the search string is received at a data host, a comparison is made of the operating system run on the originating data processing system, identified in a sourceVariant field of the multi-field text string class, and the operating system run on the data host, identified in a targetVariant field of the multi-field text string class. If necessary, an appropriate character equivalence table is selected and a variant of the search string is generated by transliteration. The search string variant is then passed to a search engine to search local data, either with or without the original search string, and matches identified are returned as matches for the original search string. Accurate search results are therefore produced despite the presence of different character encoding schemes, such as EBCDIC versus ASCII/Unicode, or operating systems which do not support case mapping.
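The flow summarized above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patented implementation: the platform labels, the `EQUIVALENCE_TABLES` registry, the case-folding table, and the `search_local` function are all hypothetical, and the choice to apply the same table to stored values before comparing is an assumption made so that the example runs end to end. Only the class concept and the sourceVariant/targetVariant fields come from the text.

```python
import string
from dataclasses import dataclass

# Hypothetical character equivalence table: maps each uppercase ASCII
# letter to its lowercase equivalent.
FOLD_CASE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical registry keyed by (sourceVariant, targetVariant).
EQUIVALENCE_TABLES = {
    ("windows-nt", "unix"): FOLD_CASE,
    ("windows-nt", "mainframe"): FOLD_CASE,
}


@dataclass
class MultiFieldTextString:
    """Search string plus the platform fields named in the summary."""
    original: str
    source_variant: str       # platform of the originating system
    target_variant: str = ""  # platform of the data host, set on receipt

    def variant_for(self, host_platform: str) -> str:
        """Transliterate the search string if the platforms differ."""
        self.target_variant = host_platform
        if self.source_variant == self.target_variant:
            return self.original
        table = EQUIVALENCE_TABLES.get((self.source_variant, host_platform))
        return self.original if table is None else self.original.translate(table)


def search_local(key: MultiFieldTextString, host_platform: str, records: list) -> list:
    """Return records matching the original string or its variant."""
    variant = key.variant_for(host_platform)
    table = EQUIVALENCE_TABLES.get((key.source_variant, host_platform))

    def canonical(value: str) -> str:
        # Assumption: stored values are compared in the same canonical form.
        return value if table is None else value.translate(table)

    return [r for r in records if r == key.original or canonical(r) == variant]


key = MultiFieldTextString("david kumhyr", source_variant="windows-nt")
matches = search_local(key, "unix", ["David Kumhyr", "John Linton"])
print(matches)
```

When the source and target platforms are the same, the sketch falls back to an exact match on the original string, which mirrors the "if necessary" condition in the summary.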
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.
REFERENCES:
patent: 5416903 (1995-05-01), Malcolm
patent: 5812964 (1998-09-01), Finger
patent: 6049838 (2000-04-01), Miller et al.
patent: 6233586 (2001-05-01), Chang et al.
Kumhyr David Bruce
Linton John Ferguson
Bracewell & Patterson L.L.P.
Chen Te Yu
Dawkins Marilyn S.
International Business Machines Corporation
Metjahic Safet