Electrical computers and digital processing systems: memory – Storage accessing and control – Access timing
Reexamination Certificate
1999-10-05
2004-11-30
Hudspeth, David (Department: 2651)
Electrical computers and digital processing systems: memory
Storage accessing and control
Access timing
Reexamination Certificate
active
06826668
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to improving the data transfer efficiency of a magnetic disk or other form of data storage in which sequential access is faster than random access.
2. Description of the Related Art
The speed with which a magnetic disk drive transfers data depends on the time consumed in positioning the head of the drive over the appropriate location on the disk and then waiting for the appropriate section of the disk to rotate under the head. This is referred to as “access time”. Furthermore, the speed with which a magnetic disk drive transfers data depends on “transfer time”, i.e., the time consumed in actually transferring the data, once the head is properly positioned over the appropriate disk section.
Data is typically requested in large blocks so that transfer time dominates access time, thereby more effectively using the bandwidth of the disk drive. Unfortunately, as recognized by the present invention the constituents of the desired stream of data frequently are not stored in physically adjacent locations because it may not be known apriori that the constituents later will be related by a user and, hence, accessed as a related combination of data. This prolongs access time. To minimize access time and thereby further improve the performance of a disk drive, the present invention understands that it is preferable to group data on the disk such that related constituents are physically close together (the data has “spatial locality”) and, hence, the disk drive reads relatively large blocks of data while minimizing access time.
U.S. Pat. Nos. 5,694,571, 5,394,537, and 4,680,703 cluster pages of data on a disk physically close together on the fly, every time a page is fetched. In other words, in the above-referenced patents data pages are clustered based on when they are first accessed. As recognized herein, such clustering schemes, while simple, result in the excessive movement of pages including during peak disk drive workloads. Moreover, they impose an overly simplistic reference pattern on the storage scheme that might not accurately reflect subsequent actual usage patterns. Further, a page of data might be appropriately included in more than a single group of pages, a recognition of the present invention that is not accounted for in the above-referenced patents. As recognized by the present invention, an improved data page clustering scheme can be provided that more accurately reflects actual usage patterns and, hence, that more robustly and effectively improves disk drive performance than existing clustering systems.
SUMMARY OF THE INVENTION
A storage device controller is disclosed that includes a data storage device having a program of instructions that is readable by the controller for clustering pages based on user access patterns over time that establish referential relationships between pages. The program of instructions embodies method steps that include representing data pages as respective vertices in a graph G. The edges define respective weights that represent the likelihood that the respective pages will be accessed contemporaneously. Additionally, the method steps embodied by the program of instructions further include defining an analysis graph C having a vertex u, and, based at least on the weights of the edges, adding vertices and edges to the analysis graph C. At least portions of the analysis graph C are output as being representative of a cluster of related pages.
In a preferred embodiment, the logic undertaken by the controller further includes storing related pages physically close together on a data storage medium, using at least the portions of the analysis graph C. Also, the preferred logic includes topologically sorting the analysis graph C, prior to outputting the cluster. Moreover, the logic includes storing at least a first page physically close to either at least a first cluster of pages related to the first page, and/or at least a second cluster of pages related to the first page. If desired, the logic can include storing the first page with the first cluster and with the second cluster in a reorganized area of the storage medium while leaving the first page stored in an original location on the storage medium.
As set forth in greater detail below with respect to the preferred implementation, the vertex u of the analysis graph C is a vertex associated with a heaviest weight edge in the graph G. In this embodiment, the logic includes creating a vertex w in the graph G, with the vertex w representing the analysis graph C, and then duplicating edges (IN(u),u) as (IN(u),w) and duplicating edges (u,OUT(u)) as (w,OUT(u)). A heaviest edge originating or ending in the vertex w and having an associated second vertex not present in the analysis graph C is identified, and the associated second vertex is added to the analysis graph C. For each vertex x in the analysis graph C, edges between the vertex x and the associated second vertex are removed from the graph G and added to the analysis graph C, prior to sorting the vertices (pages).
In another aspect, a computer-implemented method is disclosed for clustering related data pages in close physical proximity on a data storage medium. The method includes identifying relationships between the pages, with the relationships being established by more than time of page access. Also, the method includes clustering the pages based on the relationships.
In still another aspect, a computer program device includes a computer program storage device readable by a digital processing apparatus and a program means on the program storage device. The program means includes instructions that are executable by the digital processing apparatus for clustering related pages on a data storage medium. The logic by which this is undertaken includes identifying pages and relationships between the pages, with the relationships being established based on user accesses of the pages. The relationships are characterized by respective weights established at least in part based on user accesses of the pages. Based on at least the weights, pages are associated in clusters of related pages.
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:
REFERENCES:
patent: 4680703 (1987-07-01), Kriz
patent: 4956803 (1990-09-01), Tayler et al.
patent: 5297258 (1994-03-01), Hale et al.
patent: 5394537 (1995-02-01), Courts et al.
patent: 5649153 (1997-07-01), McNutt et al.
patent: 5682500 (1997-10-01), Vishlitzky et al.
patent: 5694571 (1997-12-01), Fuller
patent: 6065104 (2000-05-01), Tng
patent: 6088758 (2000-07-01), Kaufman et al.
patent: 6112285 (2000-08-01), Ganapathy et al.
patent: 6119214 (2000-09-01), Dirks
patent: 6131147 (2000-10-01), Takagi
patent: 6138226 (2000-10-01), Yoshioka et al.
Hsu Windsor Wee Sun
Ong Shauchi
Young Honesty Cheng
Hudspeth David
Rogitz John L.
Tzeng Fred F.
LandOfFree
System and method for reorganizing data on a disk drive to... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for reorganizing data on a disk drive to..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for reorganizing data on a disk drive to... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3280222