Selectivity estimation in spatial databases

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000

Reexamination Certificate

active

06353832

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to geographic information systems.
BACKGROUND OF THE INVENTION
Various geographic information systems are known in the art. These systems generally store and manage spatial data such as points, lines, poly-lines, polygons, and surfaces and hence are often referred to as spatial databases. Several commercial database systems that manage spatial data are now available, including: ESRI's (Environmental Systems Research Institute), ARC/INFO (trademarked), InterGraph's MGE, MapInfo, and Informix. Query size estimation in spatial databases has been identified as an important problem. An example of a spatial query may be to determine how many rectangles in a spatial database are contained within a rectangular spatial query of a certain size. For example, a query may be to determine how many lakes are within a state. In that case, the lakes are the data rectangles in the spatial database and the rectangular query is the particular state. Similarly one may wish to know how many houses are in a county or how many restaurants are in an area. It may be beneficial to estimate the results of such a query to determine the most efficient way to execute queries generally or to give users estimates of the running times of their queries before the queries are actually executed.
Some query result estimation techniques have been applied to relational databases. A relational database contains non spatial data such as for example numbers, points (points are a special case and may in some cases be classified as spatial data), strings, and dates. These techniques are disclosed in “Balancing Histogram Optimality and Practicality for Query Result Size Estimation”, Yannis E. loannidis and Viswanath Poosala, appeared in Proceedings of ACM SIGMOD (Special Interest Group in Management of Data) conference 1995, and use histograms, samples, or are based on parametric techniques. However, relational selectivity estimation solutions focus on approximating single numerical attributes not on two dimensional spatial data.
Generally a bucket is defined as any subset of input spatial data. A spatial input generally can be defined as an input of spatial entities such as rectangles and triangles. Points can be both spatial data and relational data.
SUMMARY OF THE INVENTION
The present invention provides various methods and apparatus for providing accurate estimates for point and range queries over two-dimensional spatial data. The present invention provides several grouping techniques for the approximating of spatial data.
In one embodiment of the present invention a method is disclosed for grouping a plurality of spatial inputs into a plurality of buckets also called grouping polygons. These buckets may be stored in a memory by storing their left bottom corner coordinates and their right top corner coordinates (for a rectangular bucket). This provides both the shape of a rectangular bucket and its location. In one form of the present invention the plurality of spatial inputs is grouped based on an equi-area partitioning technique. The equi-area partitioning technique can use the longest dimension of a bucket or bounding polygon as the criteria for splitting into further buckets or bounding polygons. An equi-count technique can also be used wherein the buckets are split using the highest projected spatial input count along a dimension as a splitting criteria. The bounding polygons may be a minimum bounding rectangle.
In one form of the present invention a method is provided which uses a grid of regions superimposed over a plurality of spatial inputs. The processor may achieve superimposition by storing the left bottom corner coordinates and the right top corner coordinates of the each region of the grid of regions in memory and storing the left bottom corner and right top corner coordinates of each spatial input in memory. Superimposition occurs because the coordinates of a spatial input and a region of the grid of regions may be the same. The method preferably determines a measure of the density of the spatial inputs within each region of the grid of regions and uses this measurement of density to determine how to group the spatial inputs into buckets.
When a query is received the present invention applies the query to the buckets created and gives an estimate of the number of spatial inputs contained within the query by preferably assuming that spatial inputs are uniformly distributed within each bucket.


REFERENCES:
patent: 5701467 (1997-12-01), Freeston
patent: 5724573 (1998-03-01), Agrawal et al.
patent: 5761652 (1998-06-01), Wu et al.
patent: 5781906 (1998-07-01), Aggarwal et al.
patent: 5963956 (1999-10-01), Smartt
patent: 6003029 (1999-12-01), Agrawal et al.
patent: 6003036 (1999-12-01), Martin
patent: 6052689 (2000-04-01), Muthukrishnan et al.
patent: 6065007 (2000-05-01), Muthukrishnan et al.
patent: 6092072 (2000-07-01), Guha et al.
patent: 6154746 (2000-11-01), Berchtold et al.
patent: 6175829 (2001-01-01), Li et al.
Acharya et al. “Selectivity Estimation in Spatial Databases”, Proceedings of the 1999 ACM SIGMOD Internation Conference on Management of Data , May 31-Jun. 3, 1999, pp. 13-24.*
Belussi et al. “Self-Spatial Join Selectivity Estimation Using Fractal Concepts”, ACM Transactions on Information Systems, vol. 16, No. 2, Apr. 1998, pp. 161-201.*
Poosla et al. “Improved Histograms for Selectivity Estimation of Range Predicates”, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Jun. 3-6, 1996, pp. 294-305.*
Samet, Hanan, The Design and Analysis of Spatial Data Structures, Reading:Addison-Wesley, 1989, pp. 92-115.*
Piatetsky-Shapiro et al., “Accurate Estimation of the Number of Tuples Satisfying a Condition”, Proceedings of the 1984 ACM SIGMOD Conference, 1984, pp. 256-276.*
Kooi, Robert Phili. The Optimization of Queries in Relational Databases, PhD Dissertation, Case Western University, 1980, pp. 83-108.*
Piatetsky-Shapiro et al. “Accurate Estimation of the Number of Tuples Satisfying a Condition”, Proceedings of the 1994 ACM SIGMOD Conference, 1984, pp. 256-276.*
Kooi, Robert Phili, “The Optimization of Queries in Relational Databases”, PhD Dissertation, Case Western University, 1980, Chapter 6 Histograms, pp. 83-108.*
Prior Art—Alberto Bellussi, Christos Faloutsos, “Estimating the Selectivity if Spatial Queries Using the ‘Correlation’ Fractal Dimension”, Feb. 24,1995, pp. 1-26.
Prior Art—Norbert Beckmann et al, “The R-tree: An Efficient and Robust Access Method for Points and Rectangles”, 1990, pp. 322-331.
Prior Art—Antonin Guttman, “R-Trees: A Dynamic Index Structure Spatial Searching”, 1984, pp. 125-135.
Yannis E. Ioannidis, Viswanath Poosala, “Balancing Histogram Optimality and Practicality for Query Result Size Estimation”, pp. 5-10. (Not sure if this is prior art and there is no date).

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Selectivity estimation in spatial databases does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Selectivity estimation in spatial databases, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Selectivity estimation in spatial databases will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2852431

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.