Electrical computers and digital processing systems: memory – Storage accessing and control – Memory configuring
Reexamination Certificate
2000-04-21
2002-06-25
Hudspeth, David (Department: 2651)
Electrical computers and digital processing systems: memory
Storage accessing and control
Memory configuring
C711S004000, C711S114000, C707S793000
Reexamination Certificate
active
06412054
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a method of declustering data among a number of data storage units or disks.
2. Discussion of the Known Art
As the density and capacity of information storage media increase, the need of modern scientific and business applications for mass data storage also grows. For example, data sensed by earth orbiting satellites is typically collected and stored in remote-sensing databases. In the year 2000, United States satellites alone are expected to transmit about one terabyte of data to earth daily for storage.
While media for accommodating terabytes of data storage are now available, efficient retrieval of the stored data often remains a problem. Specifically, current disk storage technology has inherent mechanical limitations that tend to limit the speed and the efficiency with which data can be retrieved from a number of storage disks, each of which has a minimum seek time. One approach to improve input/output storage speed (bandwidth) is to distribute data blocks among multiple disks, so that the blocks can be retrieved in parallel, i.e., during overlapping disk seek periods.
If data blocks of files are partitioned and stored among a number of disk storage units, then an efficient declustering scheme is important to ensure that the data blocks can be retrieved quickly from all units in which blocks of a requested file are stored. Satisfactory declustering schemes ensure that data blocks requested by a given query will in fact have been initially distributed to different disks.
Consider multidimensional data that is organized as a uniform data grid, for example, geospatial data that can be mapped in a two-dimensional raster format. In this format, a dataset is divided into columns and rows of “tiles”, each of which is assigned to and stored as a separate data block on one of a number of disks. Spatial range queries against the dataset are expressed as rectangular clips on the map. Thus, all tiles within the boundary of a rectangle corresponding to a given query must be retrieved. In a multi-disk system, the response time for the query is therefore influenced mostly by the access time of that disk which stores the most tiles within the rectangle.
Various declustering schemes have been proposed to reduce response time for multidimensional queries, including some proposed specifically for uniform data grids. Other known schemes are more suitable for non-uniform data grids.
Problem Definition
Consider a dataset organized as a grid of N
x
×N
y
tiles (i.e., N
x
columns and N
y
rows). Given M disks, a declustering scheme, s: N
1
×N
2
→M, assigns tile (x,y) to the disk numbered s(x,y). A “range query” is a query that retrieves a rectangular set of tiles contained within the grid. Let tile
i
(s,Q), i=0, 1, . . . ,M−1, represent the number of tiles in Q that get assigned to disk i under declustering scheme s (parameter s is omitted when clear from the context). Define the (nominal) response time of query Q under scheme s to be RT(s,Q)=max
0≦i<M
tile
i
(s,Q) . A unit of response time may be considered the average disk access time (including seek, rotational, and transfer time) to retrieve a data block. Thus, the notion of response time indicates an expected I/O delay for answering a given query. The problem, therefore, is how to devise a declustering scheme that minimizes the query response time.
An ideal declustering scheme would achieve, for each query Q, an optimal response time ORT(Q)=┌|Q|/M┐, where |Q| is the number of tiles in Q. It has been shown that such a scheme, known as the Strictly Optimal (SO) scheme, does not exist in general, except for a few stringent cases. The SO scheme thus serves as a lower bound against which performance of other schemes may be measured.
An early scheme proposed for data declustering is the Disk Modulo (DM) scheme, which assigns tile (x,y) to disk number (x+y) mod M. A known Field Exclusive-Or (FX) scheme assigns tile (x,y) to disk number dec(bi(x) ⊕ bi(y))mod M. Here, bi(x) is the binary representation of x, and dec(z) is the decimal number corresponding to the binary representation z.
A so-called Hilert Curve Allocation Method (HCAM) imposes a linear ordering on the tiles in the grid, and assigns disk numbers 0,1, . . . M−1 to the tiles in a round robin fashion according to the linear ordering. HCAM has been shown to yield better performance than previously known schemes, including DM and FX.
Recently, it was shown that a family of Cyclic Declustering (CD) schemes, a simple generalization of the Disk Modulo scheme, outperforms HCAM. Given M disks, the CD schemes assign tile (x,y) to disk numbered (x+H·y) mod M, where H, called the skip value, is a predetermined parameter. To find a good skip value (or good skip values for multidimensional data), two heuristics-based schemes were proposed, both of which outperformed HCAM. One scheme (GFIB) is based on Fibonacci numbers, and the other (EXH) is based on an exhaustive search of the best skip values. When M=F
k
is the k-th Fibonacci number, the GFIB scheme uses the skip value H=F
k−1
. The EXH scheme evaluates performance for each possible value of H, and chooses the one with the best performance. The performance is evaluated in terms of “average response time”, which is taken against all queries within an M×M grid.
Extensions of GFIB and EXH for multidimensional data have been given. For d-dimensional data, the CD schemes map tile (x
0
,x
1
, . . . , x
d−1
) to the disk numbered (x
0
+H
1
x
1
+ . . . H
d−1
x
d−
)mod M. When M=F
k
is the k-th Fibonacci number, GFIB uses skip values: H
1
=F
i−1
, H
2
=F
i−2
, . . . H
d−1
=F
i−d+1
. The EXH scheme performs a greedy exhaustive search. First, the best value for H
1
is determined by considering all queries in a M×M grid. Then, fixing H
1
, EXH determines the best value for H
2
(among the candidates 1,2, . . . ,M) by considering all 3D queries in a grid of size M×M×M. The process proceeds toward higher dimensions until all skip values are found.
It has been shown that EXH has average performance close to the Strictly Optimal scheme, and that it outperforms GFIB. EXH requires substantial computational overhead with respect to GFIB in determining the skip values, however. This renders EXH impractical for multidimensional data, and feasible only for small values of M or for small dimension data.
Golden Ratio Sequences (GRS) are highly regular sequences in which elements of certain sets are almost uniformly distributed. The sequences themselves were developed in A. Itai, et al., A Golden Ratio Control Policy For a Multiple-Access Channel, IEEE Trans. on Automatic Control, AC-29:712-718 (1984), and are based on an open addressing hashing method described in D. E. Knuth, The Art of Computer Programming, v.3, Addison-Wesley (1973). The golden ratio sequences have been applied in packet routing, see H. Hofri, et al., Packet Delay Under the Golden Ratio Weighted TDM Policy in a Multiple-Access Channel, IEEE Trans. on Information Theory, 11-33:341-349 (1987), and were recently used in A. Bar-Noy, et al, Minimizing Service and Operations Cost of Periodic Scheduling, 9th Annual ACM-SIAM Symp. on Discrete Algorithms (1998) to develop almost optimal procedures for hard problems such as minimizing mean response time in Broadcast Disks, Teletext Systems, and Maintenance Scheduling problems. As far as is known, however, GRS has not been applied to solving the problem of obtaining an efficient data storage declustering scheme.
SUMMARY OF THE INVENTION
According to the invention, a method of declustering data among a number of storage units to minimize response time for range queries, includes identifying data to be stored in the form of a dataset, wherein data is divided among uniform data blocks (x, y), and each data block is to be stored in one of a nu
Bhatia Randeep S.
Chen Chung-Min M.
Sinha Rakesh K.
Hudspeth David
Law Office of Leo Zucker
Lucent Technologies - Inc.
Tzeng Fred F.
LandOfFree
Storage disk declustering method does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Storage disk declustering method, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Storage disk declustering method will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2961757