File system control method, parallel file system and program...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


C707S793000, C707S793000, C709S200000, C709S201000

active

06385624

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a file control method for a file system (parallel file system) adapted for a parallel program running on a parallel computer system in which multiple computers are interconnected by a high-speed network, a parallel file system, and a program storage medium for implementing that parallel file system.
2. Description of the Related Art
As a technique of increasing the speed with which a file on I/O node computers (called server nodes) is accessed by multiple computers on which a user program runs (called client nodes) over a network, the client cache system is well known in which a cache is installed on the client side to minimize the amount of data transferred between the client and the server.
However, the client cache system has a problem: in an environment where so-called “write share,” in which multiple nodes update a shared file concurrently, is in general use, the overhead of acquiring access rights to the client cache and of invalidating cache data increases.
To solve this problem, the client cacheless system is frequently used, in which the client communicates with the server node each time a user program makes a read/write request. However, this system has the drawback that, when the read/write data used by the user program is short, the communication overhead increases sharply and, moreover, the effectiveness of I/O node striping is lost.
As a technique of allowing the data length of read/write request used by the user program to be large in the client cacheless environment, a stride access interface has been proposed. The stride access interface services access to more than one portion of a file in a single read/write request by declaring a discrete sequential access pattern for the file. For example, when the user program desires to make access to discrete data in a certain file, such as 200 bytes of data from the 1000th byte, 200 bytes of data from the 2000th byte, and 200 bytes of data from the 3000th byte, the stride access interface services that access in a single read/write request by declaring a pattern in which data to be accessed are placed.
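The access pattern in the example above can be sketched in a few lines of Python. This is only an illustration of the idea of declaring a pattern once and serving it in one request; the names `StridePattern` and `strided_read` are hypothetical, the in-memory `bytes` object stands in for the file, and "the 1000th byte" is assumed to mean byte offset 1000.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StridePattern:
    """Hypothetical description of a discrete sequential access pattern."""
    start: int   # offset of the first block
    length: int  # number of bytes in each block
    stride: int  # distance between the starts of consecutive blocks
    count: int   # number of blocks

    def extents(self) -> List[Tuple[int, int]]:
        # Expand the declared pattern into (offset, length) pairs.
        return [(self.start + i * self.stride, self.length)
                for i in range(self.count)]


def strided_read(data: bytes, pattern: StridePattern) -> bytes:
    # Serve every block of the pattern in a single call instead of
    # one call per block.
    return b"".join(data[off:off + n] for off, n in pattern.extents())


# 200 bytes from offsets 1000, 2000 and 3000, declared as one pattern.
pattern = StridePattern(start=1000, length=200, stride=1000, count=3)
file_image = bytes(i % 256 for i in range(4000))
result = strided_read(file_image, pattern)
```

Here one call returns 600 bytes that would otherwise take three separate read requests.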
As compared with the case where read/write requests are issued individually, the stride access interface provides optimization of a file system and higher utilization of the network.
A disk storage unit (hereinafter referred to as a disk), which is an important component of a parallel file system, exhibits its highest performance when accessed sequentially. When accessed randomly, on the other hand, the disk suffers considerable performance degradation due to seek time and rotational latency. The collective I/O system is known as a technique that takes advantage of these disk characteristics: it evaluates stride access interface requests that are issued by two or more client nodes and declared to be related to each other, and converts the resulting disk accesses into sequential ones.
In the collective I/O system, the server node schedules disk accesses and data transfers and handles related access requests issued by all related client nodes collectively to carry out input/output (I/O) operations, thereby minimizing the number of disk accesses and the time required for data transfers.
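One part of such scheduling, turning related requests from several clients into a few sequential disk accesses, can be sketched as an interval merge. This is a simplified illustration, not the scheduling algorithm of any particular collective I/O system; the function name `coalesce` and the `(offset, length)` request format are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Extent = Tuple[int, int]  # (offset, length)


def coalesce(requests: Iterable[Extent]) -> List[Extent]:
    """Sort the extents requested by all clients and merge those that
    touch or overlap, so the disk can be read in ascending order."""
    merged: List[Extent] = []
    for off, n in sorted(requests):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            # The extent touches or overlaps the previous one: extend it.
            last_off, last_n = merged[-1]
            end = max(last_off + last_n, off + n)
            merged[-1] = (last_off, end - last_off)
        else:
            merged.append((off, n))
    return merged


# Two clients with interleaved 100-byte requests.
client_a = [(0, 100), (200, 100)]
client_b = [(100, 100), (300, 100)]
disk_accesses = coalesce(client_a + client_b)
```

In this example four interleaved requests collapse into a single 400-byte sequential access starting at offset 0.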
Conventional parallel programs for write share of a file contain logic to assure data consistency without fail.
FIG. 1 shows a general operation of a parallel program adapted for write share of a file. Process 2A (process 1) and process 2B (process 2) are sub-programs that make up a parallel program and run on different compute nodes 1A and 1B. Process 2A sends a message to process 2B on the other compute node 1B to thereby make notification that a file 8 has been ready for processing.
That is, a parallel program for write share should contain a step (notify node B) that notifies the part of the parallel program running on the other compute node that a file has been updated; it does not rely solely on timing-dependent sequential consistency.
SUMMARY OF THE INVENTION
The stride access interface and the collective I/O techniques are useful in significantly improving the input/output operations of a parallel program. However, they have the drawback of requiring considerable amendments to an existing program, because they differ greatly from an existing file access interface, for example that of the UNIX system, which supposes a parallel program, and because the user program needs a large number of I/O buffers.
The client cache system, although having advantages of the capability of servicing small size read/write requests in an efficient manner and moreover permitting an existing program to be used without being amended, has a drawback of requiring a significant overhead for keeping cache consistency.
It is therefore an object of the present invention to improve the performance of a parallel file system without the need of considerable amendment to an existing program.
According to an aspect of the present invention, in a network file system in which multiple compute nodes share files over a network, each compute node comprises a file update notification facility for, when a file update is made by a compute node, notifying other compute nodes of the file update, and each compute node stores read data or write data for the file in a buffer in the compute node. A program that runs on a node calls the file update notification facility when consistency for file update data is needed, and the file update notification facility invalidates the data corresponding to the file update data that each compute node stores in its buffer.
The buffering in each node is performed only for data that has been actually written. Unlike cache control, therefore, there is no need of exclusive control for preventing multiple nodes from simultaneously updating the same cache line, allowing multiple nodes to operate in parallel. Since only modified portions are held, a file will not be destroyed even if an I/O node merges two or more write requests made by the compute nodes. Thus, the concurrent updating of a file by two or more nodes can be made fast without destroying the file.
In particular, an existing parallel program which carries out write share of a file simply declares the inventive control to be put into effect at the time of opening a file and adds a statement to call (propagate) the file update notification facility to statements of the program that notify the other nodes that a file update has been performed. That is, minimal modifications to the existing parallel program can improve its performance.
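The behaviour described in this aspect can be illustrated with a small single-process model. It is a sketch under several assumptions, not the claimed implementation: the class names `SharedFile` and `ComputeNode`, the method name `propagate`, and the per-byte dictionaries are all hypothetical and chosen for readability rather than efficiency, and the network is replaced by direct method calls.

```python
from typing import Dict, List


class SharedFile:
    """Stands in for the file held on the I/O node."""
    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


class ComputeNode:
    def __init__(self, file: SharedFile, nodes: List["ComputeNode"]) -> None:
        self.file = file
        self.dirty: Dict[int, int] = {}   # only bytes actually written here
        self.cached: Dict[int, int] = {}  # bytes read earlier from the file
        self.nodes = nodes
        nodes.append(self)

    def write(self, offset: int, payload: bytes) -> None:
        # Buffer locally; no lock and no message to any other node.
        for i, value in enumerate(payload):
            self.dirty[offset + i] = value

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        for pos in range(offset, offset + length):
            if pos in self.dirty:
                out.append(self.dirty[pos])
            else:
                if pos not in self.cached:
                    self.cached[pos] = self.file.data[pos]
                out.append(self.cached[pos])
        return bytes(out)

    def propagate(self) -> None:
        # Called by the program at the point where it already notifies
        # the other nodes: send the modified bytes, then invalidate the
        # corresponding buffered data on every other node.
        updated = set(self.dirty)
        for pos, value in self.dirty.items():
            self.file.data[pos] = value
        self.dirty.clear()
        for node in self.nodes:
            if node is not self:
                for pos in updated:
                    node.cached.pop(pos, None)


shared = SharedFile(16)
nodes: List[ComputeNode] = []
node_a = ComputeNode(shared, nodes)
node_b = ComputeNode(shared, nodes)

before = node_b.read(0, 4)        # node B buffers the old contents
node_a.write(0, b"AB")            # both nodes update the same region
node_b.write(2, b"CD")            # concurrently, without exclusive control
stale = node_b.read(0, 4)         # B does not yet see A's update
node_a.propagate()
fresh = node_b.read(0, 4)         # B's buffered bytes 0-1 were invalidated
node_b.propagate()
```

Because each node sends only the bytes it actually wrote, merging the two updates on the file leaves both intact, which is the property the text attributes to buffering only modified portions.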
According to another aspect of the present invention, in a network file system in which multiple client nodes share a file striped on multiple server nodes over a network, each client node temporarily stores data for which a user program issues a write request in buffers, and passes the data on to the multiple server nodes collectively when the buffers become full or the buffer contents reach a predetermined amount. For a read request issued by the user program, each client node reads ahead data from the server nodes into the buffers collectively and, for subsequent read requests, copies the read-ahead data from the buffers into a user buffer in the user program.
Even with a user program that issues many write requests for short data, the data are temporarily stored in the buffers and, when the buffers become full, sent to the multiple server nodes in parallel, so the effectiveness of I/O node striping can be fully realized irrespective of data length. Even in an environment in which read requests for short data are frequent, data equal in amount to the buffer size are read ahead into the buffers collectively from all server nodes, so throughput proportional to the number of server nodes can be attained irrespective of data length.
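The write side of this aspect can be sketched as follows. The sketch makes assumptions that the text does not state: stripes are assigned to server nodes round-robin, all buffers are flushed together as soon as any one of them reaches its capacity, and the `send` callback stands in for the network transfer. The class name `StripedWriteBuffer` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[int, bytes]  # (file offset, data)


class StripedWriteBuffer:
    def __init__(self, num_servers: int, stripe_size: int, capacity: int,
                 send: Callable[[int, List[Chunk]], None]) -> None:
        self.num_servers = num_servers
        self.stripe_size = stripe_size
        self.capacity = capacity  # bytes held per server before flushing
        self.send = send          # assumed transport: send(server, chunks)
        self.buffers: List[List[Chunk]] = [[] for _ in range(num_servers)]
        self.fill = [0] * num_servers

    def write(self, offset: int, payload: bytes) -> None:
        pos = 0
        while pos < len(payload):
            cur = offset + pos
            # Assumed round-robin mapping of stripes to server nodes.
            server = (cur // self.stripe_size) % self.num_servers
            room = self.stripe_size - cur % self.stripe_size
            chunk = payload[pos:pos + room]
            self.buffers[server].append((cur, chunk))
            self.fill[server] += len(chunk)
            pos += len(chunk)
            if self.fill[server] >= self.capacity:
                self.flush()

    def flush(self) -> None:
        # Pass the buffered data on to all server nodes collectively.
        for server, pending in enumerate(self.buffers):
            if pending:
                self.send(server, pending)
        self.buffers = [[] for _ in range(self.num_servers)]
        self.fill = [0] * self.num_servers


sent: List[Tuple[int, List[Chunk]]] = []
writer = StripedWriteBuffer(num_servers=2, stripe_size=4, capacity=8,
                            send=lambda s, chunks: sent.append((s, list(chunks))))
writer.write(0, b"abcd")
writer.write(4, b"efgh")
sent_before_full = list(sent)   # nothing has been transferred yet
writer.write(8, b"ijkl")        # fills the buffer for server 0
```

With these parameters the first two small writes cause no transfer, and the third triggers one collective transfer to both server nodes. A read-ahead buffer for the read side would follow the same stripe mapping in reverse and is omitted here.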
Each client node has a buffer for each of the server nodes on which a file is striped. If, when one of the buffers is filled with data for which
