Method and apparatus for populating sparse matrix entries...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C709S201000, C709S203000, C711S117000, C713S152000, C714S100000

Reexamination Certificate

active

06212515

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to databases and in particular to data aggregation on demand.
In the field of data warehousing, members of an organization will frequently need to summarize, or aggregate, vast quantities of data at various different levels of summarization within various stored dimensions.
A level of data in a dimension of a database is a grouping of the entries in that dimension. For example, if a dimension consists of different stores, the stores could be grouped at a city level, a state level, or any other conceivable level.
Different people will frequently wish to obtain the same summarized data at different times. Rather than querying the data in the data warehouse and aggregating the data every time aggregated data is required, it has become common practice to pre-aggregate data at various commonly queried levels of aggregation and store the aggregated data in the database for easy retrieval in what is referred to as a partition. The levels which are pre-aggregated and stored in this fashion are often defined by the system administrator. When further aggregations are performed at the request of a user, these might also be stored in a partition to be retrieved at a later date.
A problem with such storage of partitioned data is that when a request is made for data at a certain level aggregation which has not already been generated, the aggregation must be carried out using detail level data, despite the fact that the data might actually be stored at a hierarchically marginally lower level, and easily be aggregated further to the level in question. Levels are hierarchically above or below one another if all the detail records associated with any particular member of the lower level are also associated with a single member of the higher level. For example, a city level grouping of stores is hierarchically below a state level grouping of stores, as all stores in a particular city will also be in a particular state (assuming no cities lie across a state border). However, a “store size” level will normally not be hierarchically above or below a state level, as there is no reason for there to be a correlation between store size and state.
To overcome the problem of always having to resort to the detail level data to perform a new aggregation, it has become common practice to define virtual partitions for intersections which do not actually store data at a particular level, but instead present the outward appearance of holding the data at a certain level of aggregation, and are provided with the functionality to aggregate data from one or more lower level physical partitions which are stored. The virtual partitions are pre-programmed as to which physical partition data to use for aggregation.
However, for systems with a large number of dimensions and a significant number of levels in each dimension, the number of virtual partitions which must be defined can be prohibitive—both from an administrative perspective and a search/query perspective. If a virtual partition is to be maintained for each possible set of levels across all the dimensions, the number of virtual partitions which require storing is the product of the number of levels in each dimension. For example, a system with 6 dimensions each with 5 levels and 2 dimensions each with 7 levels results in a matrix of 765,625 cells.
What is required is a system to allow the server handling queries to choose the best partition from which to aggregate data “on the fly” without maintaining a database of appropriate partitions to aggregate from.
SUMMARY OF THE INVENTION
The present invention provides a method for creating aggregations to higher levels using aggregations over the same dimensions already created at lower levels. A system is provided which allows a measure of processing “cost” to be ascertained for each physical partition of the data stored to satisfy the request, and the aggregation is then performed using the partition which is ascertained to have the lowest cost. The detail level data may be classified as a partition. Otherwise, if no partition can provide the data at the required level, or if the lowest cost is greater than the cost of performing the aggregation from the detail level data, the aggregation is performed using the detail level data.
According to one aspect of the invention, an object is associated with each stored aggregation of the stored data. The object can be polled based on a set of levels which are required in a set of dimensions. The object then responds to this query with a bid representing the amount of processing which will be required to aggregate the data it is associated with to the required level. The requesting object will ask the aggregate object returning the lowest bid to perform the aggregation and return the appropriate aggregate data. Equivalent operation could be achieved by storing the meta-data defining the data stored in the physical partitions in an appropriate data structure, and polling this data structure to ascertain which partition to use to perform the aggregation.
According to another aspect of the invention, the bid returned is calculated by multiplying together the “distance” for each required dimension, where distance is the ratio of the number of entries in the requested level to the number of entries in the stored level.
According to another aspect of the invention, the bid returned is calculated by calculating the difference between the estimated or actual number of entries in the stored data and the estimated number of entries which will be present in the aggregated data.


REFERENCES:
patent: 5551027 (1996-08-01), Choy et al.
patent: 5675785 (1997-10-01), Hall et al.
patent: 5710915 (1998-01-01), McElhiney
patent: 5713020 (1998-01-01), Reiter et al.
patent: 5761652 (1998-06-01), Wu et al.
patent: 5765147 (1998-06-01), Mattos et al.
patent: 5799300 (1998-08-01), Agrawal et al.
patent: 5822518 (1998-10-01), Ooki et al.
patent: 5822751 (1998-10-01), Gray et al.
patent: 5835712 (1998-11-01), DuFresne
patent: 5940818 (1999-08-01), Malloy et al.
patent: 5974418 (1999-10-01), Blinn et al.
patent: 5978788 (1999-11-01), Castelli et al.
patent: 5978796 (1999-11-01), Malloy et al.
patent: 5991754 (1999-11-01), Raitto et al.
patent: 6003024 (1999-12-01), Bair et al.
patent: 6003036 (1999-12-01), Martin
Albrecht, J. et al., “Management of Multidimensional Aggregates for Efficient Online Analytical Processing,” Proceedings of the 1999 International Symposium on Database Engineering and Applications, IDEAS '99, Aug. 2-4 1999, pp. 156-164.*
Krippendorf, Michael et al., “The Translation of Star Schema into Entity-Relationship Diagrams,” Proceedings of the Eighth International Workshop on Database and Expert Systems Applications, Sep. 1-2, 1997, pp. 390-395.*
Platinum Technology, “InfoBeacon & Relational OLAP What's Inside of InfoBeacon, and Why You Should Care,” Jun. 1997.
Gupta, “Aggregate-Query Processing in Data Warehousing Environments,” Zurich, Switzerland 1995, Proceedings of the 21st VLDB Conference.
Database Programming & Design, Divide and Aggregate: Designing Large Warehouses, Jul.,1998Mar. 8, 2000.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for populating sparse matrix entries... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for populating sparse matrix entries..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for populating sparse matrix entries... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2470444

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.