Characterization of data access using file system

Electrical computers and digital processing systems: support – Digital data processing system initialization or configuration

Reexamination Certificate

Details

C713S100000, C713S002000, C710S010000, C710S104000, C709S220000

active

06442682

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to a server with an adaptable and configurable file system.
The ever-increasing capability of computers to store and manage information has made them increasingly indispensable in modern businesses. The popularity of these machines has led in turn to the widespread sharing and communication of data such as electronic mail and documents over one or more computer networks, including local area networks and wide area networks such as the Internet. To support the sharing of data, client-server architectures which support “enterprise” computing typically provide one or more servers which communicate with a number of personal computers, workstations, and other devices such as mass storage subsystems, network printers and interfaces to the public telephony system over the computer networks. Through the network-attached personal computers and workstations, users process data and programs that may be stored in the network mass storage subsystems. In such an arrangement, the personal computers/workstations, operating as clients, download the data and programs from the network mass storage subsystems for processing and upload the resulting data to the network mass storage subsystems for storage.
In the server, a file system such as the Unix file system provides services for managing the space of storage media. It provides a logical framework to the users of a computer system for accessing data stored in the storage media. The logical framework usually includes a hierarchy of directory structures to locate a collection of files that contain user-named programs or data. The use of directories and files relieves users of the need to find the actual physical locations of the stored information in a storage medium. The logical framework may be stored as “metadata” or control information for the file, such as file size and type and pointers to the actual data.
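As an illustration of the logical framework described above, the following minimal sketch models a directory hierarchy whose nodes carry metadata (type, size, and pointers to data blocks). The names `Metadata` and `lookup`, the block numbers, and the in-memory layout are all hypothetical; this is not the on-disk format of any particular file system.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Control information for one file or directory (illustrative only)."""
    file_type: str                                # "file" or "dir"
    size: int = 0                                 # size in bytes
    blocks: list = field(default_factory=list)    # pointers to data blocks
    entries: dict = field(default_factory=dict)   # name -> Metadata (directories)


def lookup(root, path):
    """Walk the directory hierarchy and return the metadata for `path`."""
    node = root
    for name in path.strip("/").split("/"):
        if not name:
            continue  # an empty component means the path was "/" itself
        node = node.entries[name]
    return node


# A tiny hierarchy: /home/report.txt stored in blocks 120 and 121 (made-up numbers).
report = Metadata("file", size=8192, blocks=[120, 121])
home = Metadata("dir", entries={"report.txt": report})
root = Metadata("dir", entries={"home": home})

meta = lookup(root, "/home/report.txt")
```

The caller names the file by path only; the block numbers that locate the data stay inside the metadata.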
The file system dynamically constructs various data structures in the server's memory, as well as others that are stored with the file system itself on the storage device, such as in the memory of attached personal computers and workstations. Typically, the required data structures are loaded from the disk storage device into a memory buffer when the file system is first accessed (mount time). These structures may be dynamically modified in the memory buffer. When the last access to a file system is made (unmount time), all related data structures remaining in the memory buffer are flushed to the various data storage devices.
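The mount-time load and unmount-time flush described above can be sketched as follows. This is a simplified model under stated assumptions: a Python dict stands in for the storage device, and the class name `MountedFileSystem`, its methods, and the `free_blocks` structure are hypothetical.

```python
class MountedFileSystem:
    """Toy model of loading structures at mount and flushing them at unmount."""

    def __init__(self, disk):
        self.disk = disk      # dict standing in for the storage device
        self.buffer = None    # in-memory copies, populated at mount time
        self.dirty = set()    # keys modified since mount

    def mount(self):
        # Load the required data structures from "disk" into the memory buffer.
        self.buffer = dict(self.disk)
        self.dirty.clear()

    def update(self, key, value):
        # Modify a structure in the memory buffer only.
        self.buffer[key] = value
        self.dirty.add(key)

    def unmount(self):
        # Flush every modified structure back to the storage device.
        for key in self.dirty:
            self.disk[key] = self.buffer[key]
        self.buffer = None
        self.dirty.clear()


disk = {"free_blocks": 100}
fs = MountedFileSystem(disk)
fs.mount()
fs.update("free_blocks", 99)
before_unmount = disk["free_blocks"]   # the device still holds the old value
fs.unmount()
after_unmount = disk["free_blocks"]    # the change has been flushed
```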
The access speed of data in the server depends not only on the access methodology, but also on the data flow in the server. Thus, the way data is physically written to or read from disk, the layout of the file system, the size of the caches deployed, the way the pointers to the data blocks are stored, the flush rate of the caches, and the file system paging algorithm all affect the efficiency of the server in servicing requests directed at it. If the performance of the server becomes unacceptable, it may be improved by changing one or more of the above server parameters. However, conventional systems which attempt to automatically optimize the server parameters do not have a global view of the application and thus may make local optimizations without any knowledge about the environment or the application.
One factor affecting system performance is the size of the cache. With a limited cache memory, a multitude of requests over a variety of data segments can easily exhaust the capability of the disk cache system to retain the desirable data in the cache memory. Often, data that may be reused in the near future is flushed prematurely to make room in the cache memory for handling new requests, leading to an increase in the number of disk accesses to fill the cache. The increase in disk activity, also known as thrashing, institutes a self-defeating cycle in which feeding the cache with data previously flushed has a disproportionate impact on disk drive utilization. A related factor affecting the hit rate is the cache memory block size allocation. An allocation of a relatively large block of memory reduces the quantity of individually allocatable memory blocks. In systems having multiple concurrent tasks and processes that require access to a large number of data files, a reduction in the number of individually allocatable blocks increases the rate of cache block depletion, once more leading to thrashing which decreases the overall disk system throughput. Although additional memory can be added to the disk cache to alleviate the above-mentioned problems, an upper limit exists as to the size of the disk cache that is cost effective.
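The thrashing effect described above can be reproduced with a small simulation. The sketch below assumes a least-recently-used (LRU) replacement policy, which the text does not specify; the function name `lru_hit_rate` and the workload are illustrative.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_blocks):
    """Return the fraction of accesses served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            if len(cache) >= cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = True
    return hits / len(accesses)


# A cyclic scan over 5 blocks, repeated 20 times (100 accesses in total).
workload = list(range(5)) * 20

fits = lru_hit_rate(workload, cache_blocks=5)       # working set fits in the cache
thrashes = lru_hit_rate(workload, cache_blocks=4)   # cache is one block too small
```

With five cache blocks only the first pass misses, so the hit rate is 0.95. With four blocks, each block is evicted just before it is needed again, so every access goes to disk and the hit rate drops to 0.0.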
Another factor affecting the performance of the disk subsystem is the read-ahead policy for prefetching data associated with requests. Prefetching enhances performance when sequential data requests are encountered. However, in the event that the data access occurs in a random manner, the prefetching policy may be ineffective as data brought into the cache is not likely to be used again soon. Additionally, the prefetching policy may cause a bottleneck on the disk data path, as each attempt to prefetch unneeded data consumes valuable data transfer bandwidth in the server. Thus, an automatic prefetch of data in a system with a large percentage of random I/O operations may degrade the overall system performance.
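The cost of prefetching under random access can be illustrated with a toy model. The sketch assumes a fixed read-ahead window and an unbounded cache so that only the transfer cost is visible; `simulate_readahead` and both workloads are hypothetical.

```python
import random


def simulate_readahead(requests, window):
    """Return (blocks transferred from disk, cache hits) for a fixed read-ahead window."""
    cache = set()
    transfers = 0
    hits = 0
    for block in requests:
        if block in cache:
            hits += 1
            continue
        # Miss: fetch the requested block plus the next `window` blocks.
        fetched = range(block, block + window + 1)
        cache.update(fetched)
        transfers += len(fetched)
    return transfers, hits


# Sequential workload: blocks 0..99 in order.
sequential = list(range(100))

# Random workload: 100 widely spaced blocks in shuffled order, so no
# prefetched block is ever requested.
scattered = [i * 1000 for i in range(100)]
random.Random(0).shuffle(scattered)

seq_result = simulate_readahead(sequential, window=3)
rnd_result = simulate_readahead(scattered, window=3)
```

For the sequential workload the model transfers 100 blocks and serves 75 requests from the cache. For the scattered workload it transfers 400 blocks to satisfy the same 100 requests and gets no hits, so three quarters of the transfer bandwidth is spent on data that is never used.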
During operation, the server must be capable of concurrently retrieving different data files for different clients, regardless of whether the files are large or small, hold actual data or metadata, or are continuous or non-continuous data files. However, most applications request data in patterns that are quite predictable. For example, in seismic, weather prediction, or multimedia applications, the data is typically voluminous and is typically not needed again immediately afterward. Since the data is typically used only once, caching it often provides little benefit. In another application, serving Web pages, the characteristics are: each Web page is infrequently updated, the data storage size of the Web page is typically small, and the number of accesses or hits for popular Web sites is typically high. During operation, conventional file systems typically bring pages associated with the accessed Web site into memory and serve the Web page(s) associated with the Web site. However, the memory containing the page(s) may be flushed relatively soon to make space for page(s) associated with another Web site. On the next access of the original Web site, the pages need to be reloaded. In these cases, the automatic optimization may be suboptimal or unnecessary, leading to inefficiencies in such systems.
The access speed of data in servers with Network Attached Storage (NAS) systems depends not only on the network access methodology, but also on the data flow within the server. Thus, the way the data is physically written to or read from the disk, the layout of the file systems, and the paging characteristics of the file system affect system performance. Many file systems, such as Unix File System (UFS), Write Anywhere File System (WAFL), and Lazy Write File System (LWFS), may optimize performance using techniques such as pre-allocation of blocks in the case of sequential writes, delayed block allocation in the case of random access, and queuing of disk blocks within streams, among others. However, these systems make certain assumptions about the way the user data is characterized: they classify data as sequential, random or metadata and process data requests in accordance with those assumptions.
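To make the notion of classifying requests as sequential or random concrete, here is a minimal heuristic sketch. It is not the method used by UFS, WAFL, or LWFS; the function `classify_access`, the rule (share of requests that hit the block immediately following the previous one), and the 0.8 threshold are all assumptions chosen for illustration.

```python
def classify_access(blocks, threshold=0.8):
    """Label a trace of block numbers as "sequential" or "random" (illustrative rule)."""
    if len(blocks) < 2:
        return "unknown"  # too few requests to judge
    # Count requests that target the block right after the previous request.
    sequential_steps = sum(
        1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1
    )
    ratio = sequential_steps / (len(blocks) - 1)
    return "sequential" if ratio >= threshold else "random"


streaming = classify_access([10, 11, 12, 13, 14])
scattered = classify_access([7, 300, 42, 9000, 15])
```

A fixed rule like this encodes exactly the kind of built-in assumption the paragraph above describes: a workload that does not match the threshold is handled by whichever policy the label selects, whether or not it suits the application.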
SUMMARY OF THE INVENTION
The present invention provides a file system which can be adapted to the characteristics of the access and storage methodology of the user's data. The user can tune the operation of the file system as well as get intelligent information from the file system on his data characteristics. The user is given options in the kernel (which needs system rebo

Profile ID: LFUS-PAI-O-2897283
