Collecting statistics in a database system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

Reexamination Certificate

active

06801903

ABSTRACT:

BACKGROUND
A database is a collection of stored data that is logically related and that is accessible by one or more users. A popular type of database is the relational database management system (RDBMS), which includes relational tables made up of rows and columns (also referred to as tuples and attributes). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information.
To extract data from, or to update, a relational table in an RDBMS, queries according to a standard database-query language (e.g., Structured Query Language or SQL) are used. Examples of SQL statements include INSERT, SELECT, UPDATE, and DELETE.
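As a brief illustration of the four statement types named above, the following sketch runs them against an in-memory SQLite database through Python's standard sqlite3 module. The employee table, its columns, and the sample values are hypothetical and are not taken from the patent; SQLite merely stands in for an RDBMS here.

```python
import sqlite3

# In-memory SQLite database; the "employee" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# INSERT adds rows (tuples) to the table.
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Ada", "ENG"), (2, "Grace", "ENG"), (3, "Edgar", "OPS")],
)

# UPDATE changes attribute values in existing rows.
conn.execute("UPDATE employee SET dept = ? WHERE id = ?", ("R&D", 2))

# DELETE removes the rows that match a predicate.
conn.execute("DELETE FROM employee WHERE id = ?", (3,))

# SELECT extracts rows from the table.
rows = conn.execute("SELECT id, name, dept FROM employee ORDER BY id").fetchall()
print(rows)

conn.close()
```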
As applications become increasingly sophisticated, and data storage needs become greater, higher-performance database systems are used. One such database system is the TERADATA® database management system from NCR Corporation. TERADATA® database systems are parallel processing systems capable of handling relatively large amounts of data. In some arrangements, a database system includes multiple nodes that manage access to multiple portions of data to enhance concurrent processing of data access and updates. In TERADATA® database management systems, concurrent data processing is further enhanced by the use of virtual processors, referred to as access module processors (AMPs), to further divide database tasks. Each AMP is responsible for a logical disk space. In response to a query, one or more of the AMPs are invoked to perform database access, updates, and other manipulations.
One of the goals of a database management system is to optimize the performance of queries for access and manipulation of data stored in the database. Given a target environment, an optimal query plan is selected, the optimal query plan being the one with the lowest cost (e.g., response time) as determined by an optimizer in the database system. The response time is the amount of time it takes to complete the execution of a query on a given system.
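The selection rule described above (pick the candidate plan with the lowest estimated cost) can be sketched as follows. This is a minimal illustration only: the Plan class, the choose_plan function, the plan names, and the cost figures are hypothetical, and a real optimizer derives its cost estimates from statistics rather than from hard-coded numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate query plan with its estimated cost (hypothetical structure)."""
    name: str
    estimated_response_time_s: float


def choose_plan(plans):
    """Return the candidate plan with the lowest estimated response time."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=lambda p: p.estimated_response_time_s)


# Made-up candidates and costs, for illustration only.
candidates = [
    Plan("full_table_scan", 12.5),
    Plan("index_lookup", 0.8),
    Plan("hash_join_first", 3.1),
]
best = choose_plan(candidates)
print(best.name)
```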
The optimizer calculates cost based on statistics of one or more columns (or attributes) of each table. Statistics enable the optimizer to compute various useful metrics. Typically, statistics are stored in the form of a histogram.
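To make the role of a histogram concrete, the sketch below builds an equal-width histogram over a numeric column and uses it to estimate how many rows fall below a given value. The equal-width layout, the uniform-spread assumption inside each bucket, and the function names are assumptions chosen for illustration; the text above does not say which kind of histogram is used.

```python
def build_histogram(values, num_buckets):
    """Build an equal-width histogram; returns (low, bucket_width, counts)."""
    low, high = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (high - low) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that the maximum value lands in the last bucket.
        idx = min(int((v - low) / width), num_buckets - 1)
        counts[idx] += 1
    return low, width, counts


def estimate_rows_below(hist, x):
    """Estimate the number of rows with value < x from the histogram alone."""
    low, width, counts = hist
    total = 0.0
    for i, count in enumerate(counts):
        bucket_low = low + i * width
        bucket_high = bucket_low + width
        if x >= bucket_high:
            total += count  # the whole bucket lies below x
        elif x > bucket_low:
            # Partial bucket: assume values are spread uniformly inside it.
            total += count * (x - bucket_low) / width
    return total


column_values = list(range(100))  # hypothetical column data
hist = build_histogram(column_values, num_buckets=10)
estimate = estimate_rows_below(hist, 49.5)
print(hist[2], estimate)
```

An optimizer could turn such an estimate into a selectivity by dividing it by the table's row count, which is one way statistics feed into a cost calculation.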
In database systems that store large tables, the cost of collecting statistics for such large tables can be quite high. As a result, some database users may choose not to collect statistics for columns of tables over a certain size. The lack of statistics for some tables may adversely affect operation of certain components in the database system, such as the optimizer and other tools.
SUMMARY
In general, a mechanism for faster collection of statistics in a database system is provided. For example, a method for use in a database system comprises receiving a request to collect statistics of at least one attribute of a table, and collecting statistics for the attribute based on reading a sample of rows of the table, the sample being less than all the rows of the table.
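The idea of collecting statistics from a sample rather than from every row can be sketched as below. This is an illustration under stated assumptions, not the patented method: the uniform random sampling, the scaling of sampled counts by the inverse sampling ratio, the function name collect_sampled_statistics, and the example table are all hypothetical, and the table is held in memory only to keep the sketch self-contained.

```python
import random
from collections import Counter


def collect_sampled_statistics(rows, column, sample_fraction, seed=0):
    """Estimate per-value row counts for one column from a sample of the rows."""
    if not 0 < sample_fraction < 1:
        raise ValueError("sample_fraction must be strictly between 0 and 1")
    sample_size = max(1, int(len(rows) * sample_fraction))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(rows, sample_size)  # reads fewer than all the rows

    sampled_counts = Counter(row[column] for row in sample)
    # Assumption: scale sampled counts up to the full table size.
    scale = len(rows) / sample_size
    return {
        "rows_read": sample_size,
        "table_rows": len(rows),
        "estimated_value_counts": {v: c * scale for v, c in sampled_counts.items()},
    }


# Hypothetical table: 1000 rows, one quarter of them in department "OPS".
table = [{"id": i, "dept": "ENG" if i % 4 else "OPS"} for i in range(1000)]
stats = collect_sampled_statistics(table, "dept", sample_fraction=0.1)
print(stats["rows_read"], stats["estimated_value_counts"])
```

Only a tenth of the rows are read in this example, so the estimated counts are approximate; the trade-off is a lower collection cost in exchange for some estimation error.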
Other or alternative features will become apparent from the following description, the drawings, and the claims.


