Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-11-13
2003-03-11
Robinson, Greta (Department: 2177)
C707S793000
active
06532476
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
The present invention relates to the field of computer database software used for the storage and retrieval of information, and more particularly to an adaptive multi-dimensional database capable of storing and retrieving information of any type and format to and from both persistent and non-persistent storage.
BACKGROUND OF THE INVENTION
For nearly as long as computers have been used for the calculation of results, they have been used for the storage and retrieval of information. This task is one for which computers are well suited; the structure of the computing hardware itself (specifically, a processor controlling persistent and non-persistent storage) provides an excellent platform for the storage and retrieval of information.
Current database technologies are typically characterized by one or the other of two predominant data storage methodologies. The first of these methodologies is known generally as “relational” storage. While there are many characteristics of relational databases, perhaps the most significant is the requirement that every piece of information stored must be of a predetermined length. At the time the file is constructed, the length of each data field to be stored per record is determined, and all records added from that point forward must adhere to those restrictions on a field-by-field basis. While this methodology is certainly pragmatic, it provides several opportunities for improvement. First, if a field is defined to be x in length, then exactly x characters must be stored there. If information exceeding x characters must be stored, that information must be divided among multiple fields, disassembled at the time of storage, and reassembled at the time of retrieval. Such manipulation provides no practical benefit, other than to overcome an inherent weakness in the technology. On the other hand, if fewer than x characters are to be stored, storage space is wasted as the information is padded out with a predefined neutral character in order to fit the x character minimum for the field.
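The splitting and padding described above can be illustrated with a minimal sketch. The field width of 10, the space used as the neutral padding character, and the helper names `store_fixed` and `load_fixed` are all assumptions made for illustration; none of them come from the patent text.

```python
FIELD_WIDTH = 10  # hypothetical width x, fixed when the file is created
PAD = " "         # assumed "neutral" padding character


def store_fixed(value: str, width: int = FIELD_WIDTH) -> list[str]:
    """Split a value into width-sized chunks and pad the last chunk."""
    chunks = [value[i:i + width] for i in range(0, len(value), width)] or [""]
    return [chunk.ljust(width, PAD) for chunk in chunks]


def load_fixed(fields: list[str]) -> str:
    """Reassemble the chunks and strip the trailing padding."""
    return "".join(fields).rstrip(PAD)


short_fields = store_fixed("Joe Smith")          # 9 characters: one padded field
long_fields = store_fixed("Bartholomew Thomas")  # 18 characters: two fields
```

Here the short name wastes one padded character, while the long name has to be divided across two fields and rejoined on retrieval. Note that stripping the padding on load would also strip any trailing spaces that were genuinely part of the data, which is one more cost of this approach.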
Another characteristic of a relational database is that it is inherently two-dimensional. A relational database is essentially a table organized into columns and rows, which provides a single data element at the intersection of each column and row. While this is an easily understood storage model, it is highly restrictive. If multiple values are required at each intersection, the database designer has two options: either 1) add new columns, or 2) add new rows for each of the multiple values. Neither option is optimal. If a new column is added, each row must then also contain that new column, regardless of whether or not multiple values exist for that row, since the size of the record is fixed and must be known prior to allocating the record. If, on the other hand, a new row is added for each multiple value, each row must then store duplicate information to maintain the relationships. In either case, storage is unnecessarily allocated, resulting in inefficient storage use.
To illustrate this, consider a relational database file containing parent and child names. For each parent, the file supports the storage of one child, such as the following:
Parent        Child
Joe Smith     Sally Smith
Bob Thomas    Jim Thomas
The file structure presents a problem if a parent has more than one child. Using the relational model, the database designer has one of two options: either 1) add new columns for each child, or 2) repeat the parent information for each child. If the designer opts to add new columns, the number of columns to add must then be determined. However, this also presents a problem. If columns are defined, for example, for up to ten children, the file will not fully accommodate information for parents with more than ten children, and records for those who have fewer than ten children will still require the same amount of storage. If, on the other hand, the parent information is repeated by adding more rows, storage is wasted for each duplicated parent value. Obviously, neither option provides a complete solution.
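The trade-off between the two options can be made concrete with a small sketch. The names `wide_rows` and `repeated_rows` are hypothetical, the ten-column limit follows the example in the text, and the sample data borrows names from this document's own examples.

```python
families = {
    "Joe Smith": ["Sally Smith"],
    "Bob Thomas": ["Jim Thomas", "Jack Thomas"],
}

MAX_CHILDREN = 10  # column count fixed at design time, as in the text's example


def wide_rows(data, max_children=MAX_CHILDREN):
    """Option 1: one row per parent, one column per possible child."""
    rows = []
    for parent, children in data.items():
        if len(children) > max_children:
            raise ValueError(f"{parent} has more children than columns")
        padding = [""] * (max_children - len(children))
        rows.append([parent] + children + padding)
    return rows


def repeated_rows(data):
    """Option 2: one row per child, repeating the parent each time."""
    return [(parent, child)
            for parent, children in data.items()
            for child in children]


wide = wide_rows(families)
empty_cells = sum(cell == "" for row in wide for cell in row)
tall = repeated_rows(families)
duplicate_parents = len(tall) - len(families)
```

With this data, option 1 allocates 17 empty child cells across two rows, and option 2 stores one redundant copy of a parent name. Both costs grow with the size of the file.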
The other predominant data storage methodology is known generally as “Multivalue” storage. Multivalue database systems (formerly known as Pick©-compatible systems; named after Richard Pick, the commonly accepted founder of the Multivalue technology) overcome the weaknesses inherent in the relational storage model. First, information stored in a Multivalue file is dynamic—that is, each record grows and/or shrinks based on the information to be stored. Unlike a relational file, which requires each record to be discretely defined at the time of file creation, a Multivalue file has no such restrictions. Instead, a file can be created, fields of any length can be added to records and textual records of any length or structure can be added to the file at any time.
Also unlike the relational methodology, the Multivalue methodology allows data to be multivalued—that is, multiple values can be stored at each intersection of column and row. Additionally, each value in a multivalued field can contain any number of subvalues, thus allowing the construction of a three-dimensional record of fields (more commonly known as attributes) containing multivalues, each multivalue potentially containing multiple subvalues.
Using the parent/child example from above, this information could be stored using the Multivalue methodology with much less overhead than with the relational methodology. Records stored in a Multivalue file might appear something like this:
Joe Smith^Sally Smith
Bob Thomas^Jim Thomas]Jack Thomas
Fields in a Multivalue record have no specific starting and ending positions, nor specific length, as do their relational counterparts. Instead, the record contains certain characters that are used to separate, or delimit, each field. In the above example, the caret represents an attribute mark, which separates individual fields in the record. In the second example, the bracket character represents a value mark, which separates the individual multivalues in the field. Though not shown in this example, a subvalue mark could also be used to further divide each multivalued field.
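As a rough illustration of how such a delimited record breaks down into attributes, values, and subvalues, consider the sketch below. It uses the caret and bracket exactly as the example displays them. The backslash used for the subvalue mark is an assumption, since the text does not show one, and `parse_record` is a hypothetical helper rather than part of any Multivalue system.

```python
AM = "^"    # attribute mark, as displayed in the example
VM = "]"    # value mark, as displayed in the example
SVM = "\\"  # subvalue mark: an assumption, not shown in the text


def parse_record(record: str) -> list[list[list[str]]]:
    """Return the record as attributes -> values -> subvalues."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in record.split(AM)
    ]


parsed = parse_record("Bob Thomas^Jim Thomas]Jack Thomas")
parent = parsed[0][0][0]                        # first attribute, single value
children = [value[0] for value in parsed[1]]    # second attribute, two values
```

The nested lists mirror the three-dimensional structure: the outer list holds attributes, each attribute holds its multivalues, and each multivalue holds its subvalues.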
Unlike the relational methodology, which stores information in memory and on persistent storage using virtually identical structures, the Multivalue methodology uses hashing and framing techniques when organizing the information on persistent storage. Essentially, each Multivalue file is divided into a series of groups, each group comprising any number of frames, or areas of persistent storage. In order for a record to be written to a particular group, a primary key is hashed (used in a calculation) to determine the appropriate group where the record should be stored. This particular combination of techniques is very effective in providing quick access to any record in the file, with certain limitations, discussed below.
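The idea of hashing a primary key to select a group can be sketched as follows. The text does not name a hash function, so the use of CRC-32 here is purely an assumption, as are the names `group_for` and `HashedFile`. For brevity each group is modeled as an in-memory dictionary, and the frames that make up a group on persistent storage are not modeled at all.

```python
import zlib


def group_for(key: str, group_count: int) -> int:
    """Hash a primary key to a group number (CRC-32 is an assumed choice)."""
    return zlib.crc32(key.encode("utf-8")) % group_count


class HashedFile:
    """Toy file divided into a fixed number of groups."""

    def __init__(self, group_count: int) -> None:
        self.groups = [dict() for _ in range(group_count)]

    def write(self, key: str, record: str) -> None:
        self.groups[group_for(key, len(self.groups))][key] = record

    def read(self, key: str):
        # Only the one group selected by the hash is searched.
        return self.groups[group_for(key, len(self.groups))].get(key)


f = HashedFile(group_count=7)
f.write("1001", "Joe Smith^Sally Smith")
f.write("1002", "Bob Thomas^Jim Thomas]Jack Thomas")
```

Because the same key always hashes to the same group, a read never has to examine the other groups, which is where the quick access comes from.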
While the Multivalue storage and retrieval methodology has advantages over the relational method, it is also problematic. First and foremost, because certain characters are used to delimit the attributes, values, and subvalues in a record, these characters cannot be contained in the data itself without compromising the structure of the record. Second, because there are no predefined field widths (as there would be with the relational model), there is no way to calculate the position of a given field in the record. Therefore, to extract a field from a record, the record must be scanned from the top, counting delimiters until the desired field is reached. As a result, access to fields near the bottom of the record is slower than access to fields near the top, and the degradation becomes more significant as the record grows.
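Both problems can be seen in a short sketch of delimiter-counting extraction. The function `extract_field` is hypothetical and scans one character at a time, returning the number of characters examined so the cost is visible. Real implementations may scan more efficiently, but still have to pass over every preceding delimiter.

```python
def extract_field(record: str, index: int, mark: str = "^") -> tuple[str, int]:
    """Return field `index` (0-based) and the number of characters examined."""
    field_no = 0
    chars = []
    for examined, ch in enumerate(record, start=1):
        if ch == mark:
            if field_no == index:
                return "".join(chars), examined
            field_no += 1
        elif field_no == index:
            chars.append(ch)
    if field_no == index:
        return "".join(chars), len(record)
    raise IndexError("record has no such field")


record = "^".join(f"f{i}" for i in range(100))
first, cost_first = extract_field(record, 0)
last, cost_last = extract_field(record, 99)

# Data that contains the delimiter corrupts the structure: the intended
# first field "x^y" is read back as two separate fields.
shifted, _ = extract_field("x^y^second", 1)
```

Reaching the first field touches only a few characters, while reaching the last field requires scanning the entire record. The final call shows the delimiter problem: the value intended as the second field ("second") is displaced by the fragment "y".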
Additionally, while framing and hashing work effectively to provide quick access to records in the file, all known implementations of the Multivalue methodology force a frame to be a certain length, such as 512, 1K, 2K, or 4K. This introduces an inefficiency that is common
Costa Jessica
Dodds, Jr. Harold E.
Law Offices of Jessica Costa, PC
Precision Solutions, Inc.
Robinson Greta