Method and apparatus for record addressing in partitioned files

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06557014

ABSTRACT:

BACKGROUND OF THE INVENTION
Record identifiers are used extensively within a database system to represent various kinds of links or references between records. For example, a record describing a banking transaction may contain a reference to the record of the customer who initiated the transaction, the record of the branch that handled the transaction, or other records. The references are typically implemented by storing the referenced record's identifier in the referencing record.
In practice, many different types of identifiers are used to implement references. While some identifiers offer flexibility, others offer efficiency. For example, “logical identifiers” do not provide information about the actual physical location of a record in memory. Thus, logical identifiers require an additional and costly lookup to locate a record. Although inefficient, logical identifiers offer significant flexibility, since relocating records may be provided simply by updating the structure (e.g., B-tree or hash table) that maps identifiers to actual physical locations.
In contrast, identifiers that provide physical address information, such as actual disk addresses, are extremely efficient but lack flexibility. For example, physical address identifiers do not facilitate the movement of records and therefore do not accommodate file growth.
A partitioned or distributed environment such as a database or file adds a level of complexity that presents additional efficiency and flexibility challenges in assigning and looking up record identifiers. For example, providing an efficient lookup in a partitioned file is particularly challenging since an extra level of identifier translation may be required to determine the partition containing a record. Furthermore, there are greater flexibility requirements since there are more ways to reorganize a partitioned file in response to adding, dropping, or merging partitions.
Conventional methods, such as hashing and range methods, perform record addressing and data access in partitioned files. Generally, with hashing, an index into a hash table is created by applying a formula to each data record to produce a numeric identifier, generally referred to as a hash key or index key. A hash key is used to index into a table to find all records associated with the key. However, an additional method must be used to locate the desired record among the records that possess the same hash key. For example, a customer number could be used to identify the desired records among a set of records with a specified hash value.
Although highly flexible in accessing data, hashing is inefficient if there is a change in the number of file partitions. More specifically, to re-index the file it may be necessary to perform the costly and inefficient process of re-computing the hash value of each data record. Furthermore, re-computing each value is likely to move data records between partitions unnecessarily. When the number of partitions changes, it is costly to remap records to partitions while maintaining balanced partitioning where the records are distributed evenly between the partitions. Accordingly, hashing provides an inefficient and costly method for indexing and accessing data if the number of partitions of the file may change.
If a file is partitioned and the number of partitions changes, the range method of indexing and accessing data is similarly inefficient. Generally, with ranges, marks are set that assign contiguous values among partitions of a file. For example, if there are two partitions, all values having a first character within the alphanumeric range of “a-m” may be assigned to partition
1
, and all values having a first character within the alphanumeric range of “n-z” may be assigned to partition
2
. If a third partition is added, and there is a requirement to assign equal ranges among the partitions, each partition's range will change. For example, partition
1
will acquire values of “a-i”, partition
2
will acquire values of“j-r”, and partition
3
will acquire values of “s-y”. Since the range changes affect each partition, it is likely that at least one half of the data records will move between partitions. Therefore, like hashing, the range method for indexing and accessing data is not efficient if files are partitioned and the number of partitions changes.
Thus, there is a need for a method of assigning identifiers to records and of subsequently locating a record given its identifier that provides a high level of flexibility and efficiency where the identifiers are applied to collections of uniformly structured records partitioned across multiple disks.
SUMMARY OF THE INVENTION
The described embodiment of the present invention introduces the creation and use of two record-addressing functions. One function “computeRID” assigns identifiers to records, while the other function “lookupRID” performs the related task of locating a record given its identifier. Although the functions are highly flexible and efficient and may be applied to all types of database structures, the functions are preferably applied to environments having file organizations in which the files are collections of uniformly structured records that are partitioned across multiple disks. Example file organizations include a one-dimensional array of records such as Relative files by Tandem Computers, a division of Compaq, of Cupertino, Calif., a key sequence such as a B-tree, and a hash file table. The functions may also be applied to various types of partitioning methods. For example, the functions may be applied when records are assigned to a partition using either a round-robin method or hashing.
The two record addressing functions of the present invention possess the properties of efficient record lookup, easy map maintenance, and flexible record addressing. The functions provide efficient lookup with a straightforward approach that does not include physical identifiers. The complexity of the functions is on the order of n operations, where n is the number of partitions. Therefore the functions iterate once for each partition. Furthermore, the mapping is independent of the number of records actually in the file.
The addressing functions of the present invention promote easy map maintenance since each function may be applied to any type of file. There is no file-specific state to maintain such as a search structure that maps ranges of identifiers to partitions. Furthermore, a file may evolve over time, where the number of partitions grows and shrinks. The mapping changes automatically as the number of partitions grows and shrinks, thereby relieving a user of the burden of specifying how the mapping changes (e.g., range).
The addressing functions promote flexible record addressing, particularly when increasing or deleting the number of partitions in a file. For example, when a table with n partitions is grown to n+1 partitions, only every nth record from the original n partitions is remapped to the new partition. No other records need to be remapped from one of the original n partitions to another.
The properties described above tend to produce balanced partitions, or files in which records are spread uniformly over partitions. Furthermore, the addition or deletion of a partition is efficient, since only 1
of the records must be moved to a new partition. Also, the user is not burdened with the task of determining which identifier range to map to a new partition to achieve balanced partitions, since the function performs this task automatically.
In accordance with the purpose of the invention, as embodied and broadly described herein, the invention relates to a method for assigning identifiers to records, by a data processing system having a memory, comprising the steps of: receiving a partition number, an offset and a total number of partitions; repeatedly looping a number of times equal to the total number of partitions—1; the loop including: if the partition number is equal to a current loop value, determining a new partition number and performing a first offset compre

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for record addressing in partitioned files does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for record addressing in partitioned files, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for record addressing in partitioned files will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3098936

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.