Electrical computers and digital processing systems: memory – Storage accessing and control
Reexamination Certificate
1998-02-17
2001-02-13
Yoo, Do Hyun (Department: 2759)
Electrical computers and digital processing systems: memory
Storage accessing and control
C711S154000, C710S120000, 36, C707S793000
Reexamination Certificate
active
06189069
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to computer programming, and more particularly, to the optimized logging of data elements to a data storage device.
BACKGROUND OF THE INVENTION
Users are demanding increased performance of their applications running on their computers. Computer hardware, including central processing units (CPUs), are becoming increasingly faster. However, their performance is limited by the speed at which data is available to be processed. In a typical computer, Level 1 (L1) and Level 2 (L2) cache memories are physically close to a processor to provide data at very high rates. The cache memory is typically divided into 32 byte cache lines, a cache line being the common unit of data retrieved from memory. When the required data is not available in L1 cache, a cache line fault occurs and the data must be loaded from lower speed L2 cache memory, or relatively slow RAM. The application is often effectively stalled during the loading of this data, and until such time as the data is available to the CPU. Therefore, by decreasing the number of cache faults, an application will run faster. Furthermore, data elements within an application are not randomly accessed. Rather, data elements, especially within the same structure, union, or class are typically accessed within a short period of other data elements within the same structure, union, or class.
The first step in optimizing an application is to model the usage patterns of data elements by the application. To accomplish this, the application being optimized is executed and used in a typical manner, with data being recorded that tracks the order in which the data elements are accessed. In doing so, a stream of data at a rate of 30-40 megabytes per second on typical hardware is generated. Traditional disk writing methods cannot keep up with this volume of data. Hence, if all of this data is to be collected to disk, either the disk logging process must also be optimized, or the execution of the application must be slowed down or modified which degrades the accuracy of the data usage model. Therefore, the preferred approach is to optimize the data logging such that the application being modeled is not hindered by the data logging method used.
Traditional data logging methods operate in a linear fashion by generating a first record of data, writing the first record of data to disk, generating a second record of data, writing the second record of data to disk, and so on. While this approach is simplistic, this method does not optimize the writing of a large amount of data to disk, such as the voluminous data stream generated when modeling the application. In fact, the processing overhead is so high that the linear data writing approach does not allow data to be written at the fastest rate allowed by the hardware. Rather, the data logging rate is limited by software processing of individual write operations. Such is the same problem with reading in records of data, one record at a time. Needed is a solution for writing and reading of a large amount of linear, order dependent data at high rates of speed which approach the physical limitations of the hardware device to which the data is being logged.
SUMMARY OF THE INVENTION
According to the invention, cache line groupings of data elements within a single structure, union, or class of an application are determined for minimizing the frequency of cache line faults. As used herein, the meaning of the terms “structures” or “structure” include structures, unions, classes, or the like. Furthermore, this disclosure describes the invention in terms of optimizing the performance of a computer “application”. However, the invention disclosed herein is equally applicable to any computer program, including end-user applications, operating systems, objects, drivers, operating environments, library routines or objects, and the like.
Data is first collected describing the application's usage of data elements within each structure. Next, this data is manipulated to determine statistical correlations among accesses to data elements within the same structure. Optimized cache line groupings of data elements are then determined in order to maximize the probability that for any data element accessed within a structure, the next or previously accessed element will be within the same cache line. Using this optimized grouping, the source code of the application is edited to re-order the declaration statements such that the rebuilt application will have optimized groups of data elements assigned to cache lines. In this manner, the application will generate fewer cache line faults, and run more efficiently.
More specifically, an application is executed and used in a manner characteristic of a typical use of the application with accesses to the data elements being recorded. To generate this data, the source code of the application is first compiled with instrumentation being added so that a data record is produced each time a data element is accessed. Then, when the application is subsequently executed, the application generates a disk file containing a sequential stream of data records containing an entry for each time a data element is accessed and the type of access (a read or write operation). Thus, two data elements accessed one after the other will have corresponding sequential entries within the stream of data. Alternatively, a background process could be used to track the accessing of data elements.
To keep up with the 30-40 megabytes of data produced per second on typical current hardware, an optimal data logging process is used to efficiently write the stream of data to disk at a rate which approximates the maximum rate that the disk hardware can support, which cannot be achieved using traditional data writing methods. To overcome the processing limitations on the transfer rate of traditional data logging methods, the performance of a write operation is divided between data source and data logger software processes which operate in different threads, these threads being in the same process or in different processes. The data source first retrieves a buffer from a pool of empty buffers; each buffer containing references to blocks of contiguous memory addresses.
Further efficiency is gained when the size of each block of memory addresses corresponds to the file allocation size (or a multiple thereof) for the hardware device (e.g., a disk) being employed. Normally, when a file is expanded on a hardware device, the additional space is allocated and the contents of this additional space on the disk must be erased for security reasons (i.e., to protect the prior user data stored in this newly allocated space). However, by combining (1) a file allocation request and (2) a data storage request for writing to the entire contents of the additional space requested into a single operation, the data erasing step is unnecessary and can be eliminated to increase data logging efficiency. In an embodiment of the present invention, each block of memory addresses corresponds to a page of memory which also corresponds to a file allocation size for the hardware device. A typical block of memory addresses used in an embodiment of the present invention is four (4) Kbytes in size, and will vary depending on the hardware platform on which the present invention is being practiced.
The data source then fills this memory block with data records, and when full, the buffer is placed at the end of a queue of full buffers to be written to the hardware device.
When there are buffers in the queue of full buffers, the data logger, operating asynchronously with respect to the data source, consolidates (i.e., packages) the full buffers into larger data blocks. In some instances, these larger data blocks will reference non-contiguous blocks of memory addresses (i.e. the memory addresses corresponding to the buffers comprise a non-contiguous address space). In one embodiment, an array is filled with pointers to the blocks of memory addresses contained in the buffers which were removed from the full buffer queue
Boa Douglas Stewart
Nolte Barry Michael
Parkes Michael Andrew Brian
Leydig , Voit & Mayer, Ltd.
Microsoft Corporation
Moazzami Nasser
Yoo Do Hyun
LandOfFree
Optimized logging of data elements to a data storage device does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Optimized logging of data elements to a data storage device, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Optimized logging of data elements to a data storage device will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2576624