Reduced memory row hash match scan join for a partitioned...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C709S215000, C711S153000

Reexamination Certificate

active

06772163

ABSTRACT:

BACKGROUND
Data organization is important in relational database systems that deal with complex queries against large volumes of data. Relational database systems allow data to be stored in tables that are organized as both a set of columns and a set of rows. Standard commands are used to define the columns and rows of tables and data is subsequently entered in accordance with the defined structure. The defined table structure is logically maintained, but may not correspond to the physical organization of the data. For example, the data corresponding to a particular table may be split up among a number of physical hardware storage facilities.
Users in relational database systems require a minimum time possible for execution of complex queries against large amounts of data. Different physical types of storage, for example random access memory and hard drives, incur different length delays. In addition, writing to memory or a hard drive is often slower than reading an equivalent amount of data from memory or a hard drive. The organization of data corresponding to tables defined in a relational database system may determine the number of writes and reads that need to be performed in order to execute a common query. If the data is properly organized, in responding to queries performance can be improved by taking advantage of that organization and searching only part of the data. If the data is not organized in any way, it will often need to be searched in its entirety to satisfy a query or copied and restructured into a useful organization.
Given a particular change in the organization of data, particular types of searches or other operations performed on the data may be adversely impacted in terms of efficiency if they are performed without any adjustment. Many factors must be addressed to adjust a search that is to be performed with respect to a new organization of data. Such factors include, but are not limited to, the manner in which the data is stored, the file system that identifies the location of the data and various other information about the data, and the desired outcome of the search. Failure to consider and address any one of those factors can result in an inefficient search.
SUMMARY
In general, in one aspect, the invention features a method for selecting rows from first and second tables each having rows containing values in columns. In at least the first table, the rows are divided into partitions at least one of which is populated by one or more rows. The method includes (a) defining a subset of the populated partitions of the first table that excludes at least one populated partition of the first table, (b) creating a file context, which stores at least location data for a row and a first value associated with the row, for each populated partition in the subset of the populated partitions of the first table, (c) determining the lowest first value stored by the file contexts for the first table, (d) identifying rows with a particular first value by at least reading the file contexts of the first table, and (e) repeating a through d until the subsets of the populated partitions of the first table have included all the populated partitions of the first table.
Implementations of the invention may include one or more of the following. Defining a subset may include calculating a total number of file contexts for both tables. The rows of the second table may be divided into partitions. The method may include (a′) defining a subset of populated partitions of the second table, (b′) creating a file context, which stores at least location data for a row and a first value associated with the row, for each populated partition in the subset of the populated partitions of the second table, (c′) determining the lowest first value stored by the file contexts for the second table, (d′) identifying rows with a particular first value by at least reading the file contexts of the second table, and (f) repeating a through e and a′ through d′ until the subsets of the populated partitions of the first table have included all the populated partitions of the first table, and where (e) may include repeating b through d and a′ through d′ until the subsets of the populated partitions of the second table have included all the populated partitions of the second table.
Creating a file context may include changing the location data and first value to correspond to a row in a different partition. Rows may be stored in order of their corresponding first value within the partitions. The first value corresponding to a row may be the result of a hash function applied to the values in one or more columns.
Defining a subset of the populated partitions of the first table may include (i) representing a total read time for the first and second tables in terms of a variable representing the number of partitions in a subset of the partitions of the first table, (ii) determining the rate of change in total read time in terms of the number of partitions in a subset, (iii) truncating the number of partitions for which the rate of change in total read time is zero, and (iv) increasing the number of partitions to one if truncation results in a value of zero. Defining a subset of the populated partitions of the first table may further include (v) if truncation results in a value greater than zero, determining whether the read cost for a subset including an additional partition is less than the read cost for the current number of partitions and increasing the number by one if the read cost with the additional partition is lower.
A number, f
1
, of populated partitions in a subset of the partitions of the first table may be determined in accordance with the equation f
1
=(fT*R/(1+R)) where fT is a total number of file contexts for both tables and R={square root over ((r
2
/r
1
)*(p
1
/p
2
))}, where r
1
and r
2
represent a cost to read once through tables 1 and 2, respectively, and p
1
and p
2
represent the number of populated partitions of tables 1 and 2, respectively. Alternatively, R may be calculated using R={square root over ((db
2
/db
1
)*(p
1
/p
2
))}, where tables 1 and 2 require db
1
and db
2
data blocks of storage, respectively.
In general, in another aspect, the invention features a database system for iteratively selecting rows from a first table. The database system includes a second table. The first table includes rows and columns and is divided by rows into partitions. At least one of the partitions in the table is populated by one or more rows. The system includes one or more nodes, a plurality of CPUs, each of the one or more nodes providing access to one or more CPUs, and a plurality of processes, each of the one or more CPUs providing access to one or more virtual processes. Each process is configured to manage data, including the partitioned database table, stored in one of a plurality of data-storage facilities. A partitioned table access component is configured to select rows from at least the first table by (a) defining a subset of the populated partitions of the first table that excludes at least one populated partition of the first table, (b) creating a file context, which stores at least location data for a row and a first value associated with the row, for each populated partition in the subset of the populated partitions of the first table, (c) determining the lowest first value stored by the file contexts for the first table, (d) identifying rows with a particular first value by at least reading the file contexts of the first table, and (e) repeating (a) through (d) until the subsets of the populated partitions of the first table have included all the populated partitions of the first table.
In general, in another aspect, the invention features a computer program, stored in a tangible medium, for selecting rows from a first table. The first table has rows and columns and is divided by row into partitions. At least one of the partitions is populated by rows. The program includes executable instructions that ca

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Reduced memory row hash match scan join for a partitioned... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Reduced memory row hash match scan join for a partitioned..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Reduced memory row hash match scan join for a partitioned... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3321382

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.