Cache sensitive search (CSS) tree indexing system and method

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000


active

06711562

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to indexing techniques for searching computer system main memories, and more specifically, to indexing structures relating to searching of databases and arrays.
BACKGROUND OF THE INVENTION
As random access memory becomes less expensive, it becomes increasingly more affordable to build computer systems with large memory systems, and in particular, large main memory systems. Within the next ten years, it may be possible to build computer systems having a terabyte of main memory serving as a buffer for a 100-terabyte database. As a result, all but the largest database tables could reside in main memory. Nevertheless, data processing performance in main memory cannot be increased indefinitely simply by increasing main memory size, even with the use of a cache memory subsystem. Therefore, it is becoming increasingly more important to improve cache memory strategy during main memory data processing operations.
Index structures are a significant factor in main memory database system performance, and can be used to reduce overall computation time without consuming very much additional memory space. In sufficiently large memory systems, most indexes can be memory resident. Indexing in main memory databases, and the performance measurements associated therewith, have been addressed in the literature, for example, in J. Lehman and Michael J. Carey, A Study of Index Structures for Main Memory Database Management Systems, Proceedings of the 12th VLDB Conference, pages 294-303, 1986, and in Kyu-Young Whang and Ravi Krishnamurthy, Query Optimization in a Memory-Resident Domain Relational Calculus Database System, ACM Transactions on Database Systems, 15(1):67-95, 1990, the contents of which are incorporated herein by reference. In recent years, central processor unit (CPU) speeds have increased at a much faster rate than have memory access speeds. As a result, the relative cost in time of cache misses has increased substantially. For this reason, the relative performance advantages of certain prior art indexing methods may no longer be applicable.
Another recent development relevant to the optimal selection of index structures has been the increased interest in On-Line Analytical Processing (OLAP). The respective data processing requirements of OLAP systems and of On-Line Transaction Processing (OLTP) systems are addressed in Clark D. French, “One Size Fits All” Database Architectures Do Not Work for DSS, Proceedings of the ACM SIGMOD Conference, pages 449-450, 1995, and in Clark D. French, Teaching an OLTP Database Kernel Advanced Data Warehousing Techniques, Proceedings of the IEEE International Conference on Data Engineering, 1997, the contents of which are incorporated herein by reference. The performance of a typical OLAP system can be enhanced by improving query performance, even if at the expense of update performance. Certain commercial systems designed for such purposes include Sybase IQ, which is described in Sybase Corporation, Sybase IQ 11.2.1, 1997, the contents of which are incorporated herein by reference. OLAP system performance can be improved in this way because typical OLAP application workloads are query-intensive, but require infrequent batch updates. For example, in applications involving census data, large quantities of data are collected and updated periodically, but then remain static for relatively long periods of time. In contrast, a typical university's data warehouse containing student records may be updated daily. Certain systems in which data remain static for relatively long periods of time presently may be on the order of several gigabytes in size, and therefore can be stored within present day main memory systems. Because updates in such systems are typically relatively infrequent and are batched, the performance associated with incremental updates of indexes in these systems may not be critical. In fact, it may even be efficient to reconstruct indexes entirely from scratch after relatively infrequent batch updates, if such an approach leads to improved query performance.
Generally speaking, the two most important criteria in selecting particular index structures are the amount of available memory space, and query performance. Because memory space normally is a critical factor in constructing main memory databases, there typically is limited space available for precomputed structures such as indexes. In addition, given a particular set of memory space constraints, the objective is to minimize the time required to perform index lookups. In main memory databases, an important factor influencing the speed of database operations is the degree of locality for the data references for the particular algorithm being run. Superior data locality leads to fewer cache misses, and therefore improved performance.
As is well known in the art, cache memories normally are fast static random access memories (SRAM) that improve computer system performance by storing data likely to be accessed by the computer system. Memory references which can be satisfied by the cache are known as hits, and proceed at processor speed. Those memory references which are not found in the cache are known as misses, and result in a cache miss time penalty in the form of a fetch of the corresponding cache block from main memory. Caches are normally characterized by their capacity, block size and associativity. Capacity refers to the cache's overall memory size; block size is the size of the basic memory unit which is transferred between cache and main memory; and associativity refers to the number of locations in the cache which are potential destinations for any single main memory address.
Typical prior art cache optimization techniques for data processing applications include clustering, compression and coloring as set forth in Trishul M. Chilimbi, James R. Larus, Mark D. Hill, Improving Pointer-Based Codes Through Cache-Conscious Data Placement, Technical Report 98, University of Wisconsin-Madison, Computer Science Department, 1998, the contents of which are incorporated herein by reference. Clustering techniques attempt to pack into a cache block data structure elements which are likely to be accessed successively. Compression attempts to remove irrelevant data from the cache and thus increase cache block utilization by enabling more usable data structure elements to be placed in the cache blocks. Compression includes key compression, structure encodings such as pointer elimination, and fluff extraction. Coloring techniques map contemporaneously-accessed data structure elements onto non-conflicting regions of the cache. Caches inherently have such conflicting regions because they have finite levels of associativity, meaning that only a limited number of concurrently accessed data elements can be mapped to the same cache line without generating a conflict.
Certain prior art has proposed improvements in cache performance using these techniques. For example, in Michael E. Wolf, et al., A Data Locality Optimizing Algorithm, SIGPLAN Notices, 26(6):30-44, 1991, cache reference locality is exploited in an effort to improve the performance of matrix multiplication. In Anthony LaMarca, et al., The Influence of Caches on the Performance of Sorting, Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, the effects of caches on sorting algorithms are considered, and performance is improved by restructuring these algorithms to exploit cache characteristics. In addition, a cache-conscious heap structure was constructed which clustered and aligned heap elements with cache blocks. In Trishul M. Chilimbi, James R. Larus, and Mark D. Hill, Improving Pointer-Based Codes Through Cache-Conscious Data Placement, Technical Report 98, University of Wisconsin-Madison, Computer Science Department, 1998, it was demonstrated that cache optimization techniques can be used to improve the spatial and temporal locality of pointer-based data structures. In Chris Nyberg, et al., Alphasort: A RISC Machine Sort, Proceedings of the ACM SIGMOD Conference, pages
