Parallel index maintenance

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000

Reexamination Certificate

active

06438562

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to database management systems and more particularly to updating indexes in response to parallel execution of data manipulation operations.
BACKGROUND OF THE INVENTION
To fully utilize the computing power of a multi-processing system, a larger task (a “parent task”) may be divided into smaller tasks (“work granules”) which are then distributed to processes (“slaves”) running on one or more processing nodes. The slaves execute their assigned work granules in parallel with the other slaves, causing the parent task to complete faster than if it were executed by a single process. Each node may contain multiple processors and multiple concurrent processes. The process that divides parent tasks into work granules and distributes the work granules to processes on the various processing nodes is referred to herein as the coordinator process or “master”.
MULTI-PROCESSING SYSTEMS
Multi-processing computer systems typically fall into three categories: shared everything systems, shared-disk systems, and shared-nothing systems. The constraints placed on the coordinator process during the work granule distribution process vary based on the type of multi-processing system involved. In shared-everything systems, processes on all processors have direct access to all dynamic, i.e., volatile, memory devices (hereinafter generally referred to as “memory”) and to all static, i.e., persistent, memory devices (hereinafter generally referred to as “disks”) in the system.
In shared-disk systems, processors and memories are grouped into nodes. Each node in a shared-disk system may itself constitute a shared-everything system that includes multiple processors and multiple memories. Processes on all processors can access all disks in the system, but only the processes on processors that belong to a particular node can directly access the memory within the particular node.
In shared-nothing systems, all processors, memories and disks are grouped into nodes. In shared-nothing systems as in shared-disk systems, each node may itself constitute a shared-everything system or a shared-disk system. Only the processes running on a particular node can directly access the memories and disks within the particular node. Of the three general types of multi-processing systems, shared-nothing systems typically require the least amount of wiring between the various system components.
FIG. 1
illustrates an example shared-nothing multiprocessor system with four nodes
110
including three banks of disks
150
. Disk banks
151
,
152
and
153
are local to nodes
111
,
112
and
113
, respectively. A coordinator process, master
120
running on node
113
, has spawned four slaves, slave W
131
, slave X
132
, slave Y
133
and slave Z
134
running on nodes
111
,
112
,
113
and
114
, respectively.
SHARED DISK/SHARED NOTHING DATABASES
Databases that run on multi-processing systems typically fall into two categories: shared-disk databases and shared-nothing databases. A shared-disk database expects all disks in the computer system to be visible to all processing nodes. Consequently, a coordinator process in a shared-disk database may assign any work granule to a process on any node, regardless of the location of the disk that contains the data that will be accessed during the work granule.
Shared-disk databases may be run on both shared-nothing and shared-disk computer systems. To run a shared-disk database on a shared-nothing computer system, as shown in
FIG. 1
, software support may be added to the operating system or additional hardware may be provided to allow processes to have direct access to remote disks. For example, software support may allow slave W
131
on node
111
to have direct access to disk bank
153
on node
113
.
In general, however, a node's access to its local disks may be more efficient than its access to remote disks. For example, node
111
's access to disk bank
151
is more efficient than node
111
's access to disk bank
153
. A node is said to have an “affinity” for the data stored on the node's local disks.
RELATIONAL STORAGE
Relational databases store information in indexed tables that are organized into rows and columns.
FIG. 2
illustrates a sample table for an example relational database. Each row as
210
in the table
200
represents an individual record. Each column
220
in the table represents a different kind of information or “attribute” that is of interest for all the records. For example, Table Emp
200
stores employee records which contain columns
220
that correspond to the following attributes: employee name (Empname), employee number (Empno), social security number (SSN), department number (Deptno), Salary, job title (Title), date of employment (Startday), etc., in columns
221
,
222
,
223
,
224
,
225
,
226
and
227
, respectively. Each row
210
in the Table Emp
200
stores the same attributes for each individual employee, one attribute per column
220
, but the values of an attribute stored in a column
220
may change. In this example, an employee record is uniquely identified by a social security number, SSN, column
223
, or an employee number, Empno, column
222
. In tables where none of the attributes are unique, a unique attribute called a ROWID may be generated for the record by the database. A user retrieves information from the tables by entering a request that is converted to queries by a database application program, which then submits the queries to a database server. In response to the queries, the database server accesses the tables specified by the query to determine which information within the tables satisfies the queries. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately to the user.
In a typical database system, data is stored in a table in an unordered form. As records are entered into a table, they are inserted into the next available location in a non-volatile, i.e., persistent, storage device, such as a fixed disk drive or optical device. Such a location can often be, relative to the location of the previous record, at a non-contiguous storage sector of the persistent storage device. Over time, as records are added or dropped, the physical arrangement of the data in the persistent storage device usually does not correspond to the order of values in any of the attributes of the table. The data for consecutive rows may appear to be randomly spread over a number of blocks in the persistent storage device. Consequently, it is not always possible to directly access or retrieve the record or range of records that satisfy a given set of search criteria. For example, in order to locate all rows in a table that have a given value in a column A, every row of the table must be fetched and column A of each row examined. Even when a row with the target value in column A is found, the remainder rows in the table must be fetched and column A examined, unless the values in column A are established to be unique.
Another problem associated with data retrieval is that, typically, data for a particular row is stored in one or more units of contiguous blocks in a persistent storage device. A block is the smallest quantity of data that can be read from a persistent store into dynamic memory. If a database system requires any information stored in a particular block, the database system must read the entire block into memory. To retrieve values for a target column of a table, the database system must read the entire block or all the blocks that have any data from that column of the table, rather than reading only that portion of the block or blocks that contain values from the target column of the table. Since values for the target column may be present in all or almost all the blocks of a table, the entire table or significant portion thereof must be read into memory in order to retrieve the column values. In such a case, the database server, in response to a query, performs a “full table scan

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Parallel index maintenance does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Parallel index maintenance, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Parallel index maintenance will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2916119

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.