Method to reduce storage requirements when storing...

Coded data generation or conversion – Digital code to digital code converters – Unnecessary data suppression

Reexamination Certificate


Details

C341S050000


active

06731229

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates generally to data processing systems, and more particularly, to methods for reducing storage requirements in a database.
Many databases have been implemented that use a person's name as a key for record retrieval. To facilitate searching, these names are sometimes stored using an encoding scheme such as Soundex or Metaphone, so that fuzzy, sound-alike retrievals can be performed. The Soundex algorithm assigns the same code to surnames that sound similar but are spelled differently. A Soundex code begins with the first letter of the surname, followed by a three-digit code that represents the first three remaining consonants. Zeros are appended when a name does not have enough letters to fill the three digits. In Soundex, consonants that sound alike have the same code. The coding guide is as follows:
1—B, P, F, V;
2—C, S, G, J, K, Q, X, Z;
3—D, T;
4—L;
5—M, N;
6—R.
The letters A, E, I, O, U, Y, H and W are not coded. Names with adjacent letters having the same equivalent number are coded as one letter with a single number. Surname prefixes are generally not used in the Soundex algorithm.
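The Soundex rules described above can be sketched in Python as follows. This is a minimal illustration that follows only the rules stated here; the function name `soundex` and the table name `SOUNDEX_CODES` are chosen for this example. It assumes ASCII surnames, treats H and W like vowels (published Soundex variants differ on this point), and does not strip surname prefixes.

```python
# Coding guide from the text: consonants that sound alike share a digit.
SOUNDEX_CODES = {
    **dict.fromkeys("BPFV", "1"),
    **dict.fromkeys("CSGJKQXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(surname):
    """Return a letter followed by three digits, padded with zeros."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Code of the previous letter ("" for uncoded letters such as vowels).
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        # Adjacent letters with the same code are coded only once.
        if code and code != prev:
            digits.append(code)
        prev = code
    # Pad with zeros, then keep the letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"), soundex("Lee"))
```

With these rules, "Robert" and "Rupert" both encode to R163, and a short name such as "Lee" is padded to L000.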
Metaphone is an algorithm for encoding a word so that similar-sounding words receive the same code. It is similar to Soundex in purpose, but because it applies the basic rules of English pronunciation, it is more accurate. The higher accuracy requires more computational power, as well as more storage capacity. The algorithm reduces an input word to a code of one to eight or more characters using relatively simple phonetic rules for typical spoken English. Metaphone reduces the alphabet to sixteen consonant sounds: B, X, S, K, J, T, F, H, L, M, N, P, R, 0, W, Y, where the zero stands for the “th” sound. Metaphone uses transformation rules such as the following: doubled letters, except “c”, drop the second letter; vowels are kept only when they are the first letter.
Additionally, names can be stored in an uppercase alphanumeric version to facilitate searching by partial character matches. When either of these methods is used to facilitate searching, the original mixed case name is also stored for display purposes in the “as originally entered” format. Storing a name both in an uppercase alphanumeric-only version and in the original mixed case format may double the storage requirements.
This problem is best described by an example. In a health provider's network, there typically exists a master person index (MPI) that is used to resolve a name to a single person, given a wide variety of partially complete and potentially different input fields. For example, assume a person's last name is “Mendez-Perez.” One operator may input the name as written (i.e., “Mendez-Perez”) while another operator may input the name as “Mendez Perez” or as “mendezperez.” So that a search of the database finds this person regardless of how the name was entered, a retrieval key field may be created that contains the uppercase alphabetic characters only; thus “MENDEZPEREZ” would be searched for in the column holding the “squished” representation of the name. In this example, the “squished” representation is formed by converting all letters to uppercase and discarding any character that is not an uppercase alphabetic character, such as a space or a hyphen. Once the appropriate record has been found, the mixed case version of the name should be used for display at the operator's console.
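The “squish” rule in this example can be sketched in a few lines of Python. The function name `squish` is chosen for illustration, and the sketch assumes that only the ASCII letters A to Z count as alphabetic characters.

```python
import string


def squish(name):
    """Keep only ASCII letters and convert them to uppercase."""
    return "".join(ch.upper() for ch in name if ch in string.ascii_letters)


# All three operator inputs from the example yield the same retrieval key.
for entered in ("Mendez-Perez", "Mendez Perez", "mendezperez"):
    print(entered, "->", squish(entered))
```

Each of the three inputs produces the key MENDEZPEREZ, which is why the squished column can serve as a consistent search key.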
One solution to the above problem is simply to fetch each record and apply the “squish” rule to the “name as originally input” column of the database. This would be a very slow process, since the same processing would need to be repeated for every record on every search. Therefore, such a method is not a viable solution. Another approach is to store two columns, one already squished and the other as originally input, thus doubling the storage space needed.
The well-known “zip” and “Huffman” encoding techniques are optimized for, and function on, a long series of subcharacter strings in long textual documents. What is needed is an algorithm that works better for encoding short, common name character sequences where the data must exist in multiple forms for: (1) database searching and (2) display back to the operator.
One alternative is a simple bit mapping that provides upper/lower case flagging information, but that alternative does not provide for reinsertion of the characters removed by a “squish” algorithm; the bit map only records whether each character is to be translated to lowercase or copied as is.
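The limitation of a case-only bit map can be seen in a small Python sketch. The helper names `squish`, `case_flags`, and `apply_case_flags` are hypothetical, and the one-flag-per-letter layout is an assumption made for illustration; the text does not specify the layout.

```python
import string


def squish(name):
    """Keep only ASCII letters and convert them to uppercase."""
    return "".join(ch.upper() for ch in name if ch in string.ascii_letters)


def case_flags(original):
    """One flag per letter: '1' = translate to lowercase, '0' = copy as is."""
    return "".join(
        "1" if ch in string.ascii_lowercase else "0"
        for ch in original
        if ch in string.ascii_letters
    )


def apply_case_flags(squished, flags):
    """Restore letter case only; removed characters cannot be reinserted."""
    return "".join(
        ch.lower() if flag == "1" else ch for ch, flag in zip(squished, flags)
    )


original = "Mendez-Perez"
restored = apply_case_flags(squish(original), case_flags(original))
print(restored)  # the case is restored, but the hyphen is lost
```

For “Mendez-Perez” the restored string is “MendezPerez”: the flags carry no information about where the hyphen was.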
SUMMARY OF THE INVENTION
This invention attempts to minimize the storage required to keep both forms of the name, one for consistent machine searching and record retrieval, and the other for human display. By using this invention, the space requirements per record can be greatly reduced, allowing more records to be stored on the same media; as a by-product of the smaller database, information retrieval can also be sped up.
This invention applies where the data needs to be stored in a compacted or “squished” format to serve as a retrieval key, and the original input data must also be capable of being recreated. It applies where the general characteristics of the data to be stored are well known, so that the frequency of exception characters can be predicted in advance and the most efficient encoding scheme assigned to the data. It applies to short strings rather than long texts.
The data to be encoded and stored in the database record is first analyzed to determine its characteristics. If the representation of a person's name is to be encoded in a bit string, then the data will be characterized by uppercase and lowercase alphabetic characters with a few additional characters such as an apostrophe or hyphen. The data analyzed can be a sample of the records to enter and store or the entire data set. The analysis can be performed by a computer software module, or can be done manually, or by a combination of computer processing of the input stream of data and manual analysis to determine trends and characteristics. An encoding scheme is then devised to encode the information input with a bit stream that represents the information. The information input is then compacted to convert the information input into a uniform format (e.g., all uppercase alphabetic characters or all lowercase alphabetic characters). The encoded and compacted information are then stored in a corresponding database record.
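The analysis step can be approximated by counting how often each non-alphabetic “exception” character occurs in a sample of names. The sketch below is one possible way to do this; the function name `exception_frequencies` and the sample names are invented for illustration and are not taken from the patent.

```python
import string
from collections import Counter


def exception_frequencies(names):
    """Count every character that is not an ASCII letter."""
    counts = Counter()
    for name in names:
        for ch in name:
            if ch not in string.ascii_letters:
                counts[ch] += 1
    return counts


# Hypothetical sample of names as operators might enter them.
sample = ["Mendez-Perez", "O'Brien", "Mendez Perez", "Smith-Jones", "de la Cruz"]

# Most frequent exception characters first; an encoding scheme could
# give the shortest codes to the characters at the front of this list.
ranked = [ch for ch, _ in exception_frequencies(sample).most_common()]
print(ranked)
```

In this sample the space is the most frequent exception character, followed by the hyphen and the apostrophe, which suggests the order in which codes might be assigned.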
When a user wants to retrieve a particular record from the database, the user enters the information and the system compacts it. The compacted information is then used as a key to locate and retrieve the record(s) in the database. The encoded representation of the information is retrieved with the record and is used to decode the compacted information back into the original information input, which is displayed to the user. As a result of this invention, the original information input does not need to be stored in the database record.
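The store and retrieve cycle described above can be sketched end to end in Python. The bit codes used here are a hypothetical scheme invented for this illustration, not the encoding defined by the patent: "0" means the next key letter is lowercase, "10" means it is uppercase, and "11" followed by a two-bit index inserts a character from an assumed exception table. The names `EXCEPTIONS`, `encode`, `decode`, `store`, and `lookup` are likewise assumptions, and a plain dictionary stands in for the database.

```python
import string

# Hypothetical exception table; the 2-bit index allows four entries.
EXCEPTIONS = [" ", "-", "'", "."]


def squish(text):
    """Compact user input into the uppercase, letters-only key."""
    return "".join(ch.upper() for ch in text if ch in string.ascii_letters)


def encode(original):
    """Split the input into a compacted key and a bit string."""
    key, bits = [], []
    for ch in original:
        if ch in string.ascii_lowercase:
            key.append(ch.upper())
            bits.append("0")
        elif ch in string.ascii_uppercase:
            key.append(ch)
            bits.append("10")
        elif ch in EXCEPTIONS:
            bits.append("11" + format(EXCEPTIONS.index(ch), "02b"))
        else:
            raise ValueError(f"character {ch!r} is not covered by this sketch")
    return "".join(key), "".join(bits)


def decode(key, bits):
    """Rebuild the original input from the key and its bit string."""
    out, letters, i = [], iter(key), 0
    while i < len(bits):
        if bits[i] == "0":
            out.append(next(letters).lower())
            i += 1
        elif bits[i:i + 2] == "10":
            out.append(next(letters))
            i += 2
        else:  # "11" + 2-bit index into the exception table
            out.append(EXCEPTIONS[int(bits[i + 2:i + 4], 2)])
            i += 4
    return "".join(out)


database = {}  # compacted key -> list of bit strings (stand-in for a table)


def store(name):
    key, bits = encode(name)
    database.setdefault(key, []).append(bits)


def lookup(query):
    key = squish(query)
    return [decode(key, bits) for bits in database.get(key, [])]


store("Mendez-Perez")
print(lookup("mendez perez"))  # ['Mendez-Perez']
```

Under this assumed scheme, “Mendez-Perez” is stored as the 11-letter key plus a 17-bit string, and the query “mendez perez” recovers the original spelling without the mixed case name ever being stored as a separate column.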


REFERENCES:
patent: 5045852 (1991-09-01), Mitchell et al.
patent: 5305433 (1994-04-01), Ohno
patent: 5546578 (1996-08-01), Takada
patent: 5590317 (1996-12-01), Iguchi
patent: 5600316 (1997-02-01), Moll
patent: 5778374 (1998-07-01), Dang et al.
patent: 5870087 (1999-02-01), Chau
patent: 5983239 (1999-11-01), Cannon
patent: 6664903 (2003-12-01), Kugai
patent: 07271565 (1995-10-01), None
“Method for Incorporating Data Conversion into Text Compression Scheme”, IBM Technical Disclosure Bulletin, vol. 36, No. 09B, Sep. 1993.
“Redundant MKH Files Design Among Multiple Disks for Concurrent Partial Match Retrieval”, Journal of Systems and Software, vol. 35, No. 3, pp. 199-207, Dec. 1996 (Abstract Only).
