Histogram-based approximation of set-valued query-answers

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06507840

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of databases. More particularly, the present invention relates to a method for generating histogram-based approximations in response to complex queries to a database.
2. Description of the Related Art
Extremely complex queries are posed to Database Management Systems (DBMSs) through software applications, such as decision support applications and experiment management applications. Given the exploratory nature of such queries, many of the queries require a significant amount of time to execute and then produce results that may be of no particular interest, even though the results are accurate. Answering such queries approximately has been proposed as a technique for reducing query response times when a precise answer is not necessary or when early feedback is helpful. Time can be saved when an approximate answer can be rapidly generated so that a decision can be made based on the approximate answer whether the complete execution of a query should be completed.
An approximate answer to a query is easily conceptualized when the answer to the query is, for example, an image. Instead of returning the actual image as the answer to the query, a compressed version of the image can be returned as an approximate answer. Alternatively, a series of compressed images can be retrieved, with each successive image being compressed less (more accurate) than the previously retrieved image, and with the last image retrieved being the actual image. This particular approach is provided by many existing systems, including web browsers, such as Netscape Navigator.
There has been extensive work in connection with approximate query answering ranging from theoretical foundations, to actual systems, to using query generalization for obtaining non-empty query answers. For background regarding established theoretical foundations, see, for example, P. Buneman et al., A semantics for complex objects and approximate queries, Proc. 7th ACM SIGMOD-SIGACT Symposium on Principles of Database Systems, pp. 305-314, April 1988, which is incorporated by reference herein. For background regarding actual systems, see, for example, Ozsoyoglu et al., Processing real-time non-aggregate queries with time-constraints in CASE-DB, Proc. 8th International Conference on Data Engineering, pp. 140-147, Tempe, Ariz., February 1992; and S. Vrbsky et al., APPROXIMATE: A query processor that produces monotonically improving approximate answers, IEEE Trans. on Knowledge and Data Engineering, 5(6), December 1993, each of which is incorporated by reference herein. For background regarding query generalization for obtaining non-empty query answers, see, for example, A. Motro, Query generalization: A method for interpreting null answers, L. Kerschberg, editor, Expert Database Systems, Proceedings from the First Inter-national Workshop, Benjamin/Cummings, Inc., Menlo Park, Calif., pp. 597-616, 1986, which is incorporated by reference herein. All of this work has been based on a subset/superset definition for approximations that has been obtained mostly through a partial query process.
Online aggregation is another technique used for approximate query answering, but is applicable only for queries that return aggregates, that is, individual values, not sets. The focus of online aggregation techniques has been to efficiently compute aggregates in an online fashion using an interactive interface providing continuous feedback relating to the expected distance of the current aggregate approximation from its actual value. Online aggregation has also been based on defining approximations as subset and/or supersets of an actual answer. For background regarding aggregate queries, see, for example, J. M. Hellerstein et al., Online aggregation, Proc. ACM SIGMOD Conference on the Management of Data, pp. 171-182, Tucson, Ariz., June 1997, which is incorporated by reference herein. For background regarding approximations as subsets and/or supersets of an actual answer, see, for example, P. Buneman et al., supra.
There has also been a considerable amount of work using statistical techniques for approximating data in databases, particularly in the context of selectivity estimation in query optimizers. Three widely studied classes of statistical techniques are sampling techniques, parametric techniques (approximating the data using a mathematical distribution), and histogram (or non-parametric) techniques.
For additional background regarding sampling techniques, see, for example, P. J. Haas et al., Sampling-based estimation of the number of distinct values of an attribute, Proc. of the 21st Int. Conf on Very Large Databases, pp. 311-322, 1995; R. J. Lipton et al., Practical selectivity estimation through adaptive sampling, Proc. of ACM SIGMOD Conf., pp. 1-11, May 1990; and S. Seshadri et al., Sampling issues in parallel database systems, Extending Database Technology (EDBT), pp. 328-343, March 1992, each of which is incorporated by reference herein. For background regarding parametric techniques, see, for example, C. M. Chen et al., Adaptive selectivity estimation using query feedback, Proc. of ACM SIGMOD Conf., pp. 161-172, May 1994, which is incorporated by reference herein. For background regarding histogram techniques, see, for example, Y. Ioannidis, Universality of serial histograms, Proc. of the 9th Int. Conf on Very Large Databases, pp. 256-267, December 1993; Y. Ioannidis et al., Optimal histograms for limiting worst-case error propagation in the size of join results, ACM TODS, 1993; Y. Ioannidis et al., Balancing histogram optimality and practicality for query result size estimation, Proc. of ACM SIGMOD Conf., pp. 233-244, May 1995; RP. Kooi,. The optimization of queries in relational databases, Ph.D. thesis, Case Western Reserve University, September 1980; M. V. Mannino et al., Statistical profile estimation in database systems, ACM Computing Surveys, 20(3):192-221, Sept 1988; M. Muralikrishna et al., Equi-depth histograms for estimating selectivity factors for multi-dimensional queries, Proc. of ACM SIGMOD Conf., pp. 28-36, 1988; G. Piatetsky-Shapiro et al., Accurate estimation of the number of tuples satisfying a condition, Proc. of ACM SIGMOD Conf , pp. 256-276, 1984; and V. Poosala et al., Improved histograms for selectivity estimation of range predicates, Proc. of ACM SIGMOD Conf, pp. 294-305, June 1996, each of which is incorporated by reference herein.
Of these particular techniques, histograms are probably the most widely used statistical technique in commercial database systems. For example, histograms are used in the DBMSs of Oracle, Sybase, Microsoft and IBM, because histograms occupy small amounts of space, do not incur significant overhead at estimation time and histograms are particularly suited for accurately approximating the skewed distributions arising in real-life.
Previously, several classes of histograms for building on one or more attributes have been identified. Additionally, techniques have been proposed for incrementally maintaining many of the classes of histograms up-to-date as the database is updated. For background regarding classes of histograms, see, for example, V. Poosala et al., Improved histograms for selectivity estimation of range predicates, Proc. of ACM SIGMOD Conf., pp. 294-305, June 1996; and V. Poosala et al., Selectivity estimation without the attribute value independence assumption, Proc. of the 23rd Int. Conf on Very Large Databases, August 1997, each of which is incorporated by reference herein. For background regarding techniques for maintaining many of the novel classes of histograms incrementally, see, for example, Phillip B. Gibbons et al., Fast incremental maintenance of approximate histograrms, Proc. of the 23rd Int. Conf on Very Large Databases, August 1997, which is incorporated by reference herein. Except for sampling none of the other techniques, however, have been previously studied in the context of approximate query answering.
Nevertheless, such conventional approaches are not particularly u

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Histogram-based approximation of set-valued query-answers does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Histogram-based approximation of set-valued query-answers, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Histogram-based approximation of set-valued query-answers will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3028117

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.