Grouping and duplicate removal method in a database

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C711S216000

Reexamination Certificate

active

06381601

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a grouping method for discriminating, as a group, plural records of database in which values of particular columns are identical and a duplicate removal method for deleting the overlapping records in a database.
In a dataware house system, a user or an application server issues a query to a database. Particularly, in these years, query has often been issued to a database from application servers such as multiple dimensional analysis server, data mining server and ERP (Enterprise Resource Planning) (business integration package). A database is generally composed of plural records and each record includes the data for a plurality of items. For example, a parts database including information about parts of a certain product is composed of records indicating the data of a certain part and each record includes the data of a plurality of items determined for such part. For instance, such data include size of part, strength, price and supplier, etc. These items are identical not depending on the part, therefore, on the record. Therefore, the database having such structure can generally be considered as the table format data. Each record corresponds to one part and each column corresponds to an item determined for the part. Such table format database will be discussed below. In this case, a database is called as a table or a table data and a certain record as a record data or only as a record in some cases. Aggregation of data corresponding each other in plural records regarding the same item is called as column data or only as a column. In the data warehouse system, a plurality of record data satisfying the conditions indicated by the query from users are retrieved from the database. The process designated by the query regarding a plurality of retrieved record data is further executed and the result of this retrieval is then notified to users. Aggregating process and grouping process are particularly often executed recently among the query of database. The aggregating process is the numerical value computing process to be executed for query result data by obtaining the maximum value, minimum value or average value of particular column data designated by query or obtaining the number of records attained as a result of query in regard to a plurality of record data attained as a result of query. The grouping process for sorting, as one group, plural records in which data of a plurality of pairs of columns (grouping columns) designated by query among plural records attained as a result of query are identical. In the grouping process, aggregating process is conducted for each group. The grouping process is used, for example, for the processes to obtain the average value of sales amount in every factor of age, region and sex. For instance, result of the grouping is composed of a set of the grouping column value and result of aggregation process.
As the grouping method of the related art disclosed in the system based on the merge and sort disclosed in “Parallel Sorting Algorithms,” by Selim G. Akl, Academic Press, Inc. (USA), p. 48 and 49 and the hash system disclosed in “Relational Database Management”, by M. Papazoglou, W. Valder, PRENTICE HALL (USA), p. 262 to 264 can be listed.
FIG. 4
illustrates a flow chart of the grouping by the merge sort system. In this merge sort system, all records are classified so that plural records in which the value of the particular column of plural records forming the database is identical belong to the same group and the records in which value of the column is different from that of the other records belong to the other group. Therefore, in the merge sort system, all records are sorted in regard to the particular column before the grouping. Namely, plural records are first read sequentially from the secondary memory.
Here, the secondary memory is assumed to be structured to perform data reading therefrom or data writing thereto in unit of a constant size called a block. One block generally includes plural records. Therefore, one block including plural records is read into the main memory (
40
). A plural records in the block read out are sorted for the data of any column using a working area on the main memory (
41
) and thereafter such record is written back to the secondary memory as one block (
42
). This process is repeated for a plurality of blocks until all records are read (
43
). After plural records sorted for each block (namely, record list) are formed in the secondary memory as explained above, merge sort is performed between these blocks using the working area in the secondary memory and a list of the sorted records belonging to all blocks is obtained in the secondary memory (
44
). After the process explained above, while the data belonging to the particular column of all records of the sorted record list is sequentially read from the secondary memory (
45
,
47
), the grouping process is performed for the relevant column for these records (
46
). Namely, plural records having the identical value of the column belong to the same grouping and all records having the column values different from that of the other records are classified so that these records belong to the other groups.
In this merge sort system, since the merge sort of all records are performed in the step
44
using the working area in the secondary memory, a large amount of I/O operations are generated for the secondary memory and therefore a longer time is required for the merge sort process. In addition, the merge sort process (step
44
) and grouping process (step
46
) cannot be executed in parallel different from the process to be conducted in the pipeline and therefore access must be done again to the secondary memory for the purpose of grouping process (step
46
) after the access to the secondary memory for the merge sort process (step
44
). Therefore, the processing time which is identical to the total time required for the merge sort process and grouping process is required in this system and the total processing time becomes longer. As a method of solving this problem, a hash system has been introduced.
FIG. 5
illustrates a flow chart of the grouping process using the hash system of the prior art. In the hash system, when one block stored in the secondary memory is read on the main memory (
50
), the grouping process is performed using the hash functions by making use of the working area on the main memory for each record among plural records within the read out block (
51
). The aggregation result of each group held in the aggregating area in the main memory is updated by executing the parallel aggregating processes (
52
).
Namely, each time when the record is newly read into the working area on the main memory, the hash value is obtained for the data belonging to the particular columns among the data in such record and this record is then registered to the hash table prepared in the working area on the main memory. The hash table designates the pointer area and memory area of each record explained below and aggregation of the collecting area and pointer area provided in the successive position of the memory area of each record. Namely, in the hash table, a plurality of pointer areas holding a plurality of pointers corresponding to different hash values is prepared and when any record is newly read to the main memory, the hash value is calculated using the hash function for the data belonging to the particular record used for the grouping process among the data of such record. If the record having the such hash value is not yet detected, the pointer indicating the storing position in the main memory area of such record is stored in the pointer area corresponding to this hash value in this pointer area. As explained above, such record is registered to the group having such hash value. Moreover, such record is assumed as only a member of the group having such hash value and the aggregating process is conducted for such group using the data in such record under the above assumption and thereafter the aggr

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Grouping and duplicate removal method in a database does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Grouping and duplicate removal method in a database, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Grouping and duplicate removal method in a database will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2874464

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.