Method and system for automatically merging files into a...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06389433

ABSTRACT:

TECHNICAL FIELD
The invention relates generally to computer systems and data storage, and more particularly to identifying and merging files of a file system having common properties.
BACKGROUND OF THE INVENTION
The contents of a file of a file system may be identical to the contents stored in one or more other files. While some file duplication tends to occur on even an individual user's personal computer, duplication is particularly prevalent on networks set up with a server that centrally stores the contents of multiple personal computers. For example, with a remote boot facility on a computer network, each user boots from that user's private directory on a file server. Each private directory thus ordinarily includes a number of files that are identical to files on other users' directories. As can be readily appreciated, storing the private directories on traditional file systems consumes a great deal of disk and server file buffer cache space.
Techniques that have been used to reduce the amount of used storage space include linked-file or shared memory techniques, essentially storing the data only once. However, when these techniques are used in a file system, the files are not treated as logically separate files. For example, if one user makes a change to a linked-file, or if the contents of the shared memory change, every other user linked to that file sees the change. This is a significant drawback in a dynamic environment where files do change, even if not very frequently. For example, in many enterprises, different users need to maintain different versions of files at different times, including traditionally read-only files such as applications. As a result, linked-file techniques would work well for files that are strictly read-only, but these techniques fail to provide the flexibility needed in a dynamic environment.
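The drawback described above can be illustrated with an ordinary hard link. This is a minimal sketch, not part of the patent: the file names are hypothetical, and it assumes a file system that supports hard links through Python's `os.link`. A write made through one name is visible through the other, so the two names do not behave as logically separate files.

```python
import os
import tempfile


def demo_linked_file_sharing() -> bytes:
    """Write through one hard link and read the result through the other."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical per-user copies of the same configuration file.
        user_a = os.path.join(tmp, "user_a_app.cfg")
        user_b = os.path.join(tmp, "user_b_app.cfg")

        with open(user_a, "wb") as f:
            f.write(b"original")

        # Both names now refer to the same stored data.
        os.link(user_a, user_b)

        # User A changes "their" file in place...
        with open(user_a, "wb") as f:
            f.write(b"changed by user A")

        # ...and user B sees the change too.
        with open(user_b, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo_linked_file_sharing())
```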
Another problem with these techniques is that identifying identical files becomes a complex task as the number of files on a file system volume increases. For example, a disk drive may store thousands of files, and each time a new file is written to a disk or a file is changed, a potential for file duplication exists. At times a user may know when files are duplicates of one another, and thus can manually request that the file data be shared. However, relying on a user to detect such conditions is unpredictable and, for large numbers of files, inefficient and/or impractical. One possible solution is to run a utility at system start-up that scans a file system's files for duplicates, but this solution becomes unacceptably slow even with only a few thousand files. Moreover, such a solution would not work well for users who seldom reboot a machine. Indeed, as more and more disk space is consumed, sharing files becomes a more valuable tool for preserving disk space, and thus a real-time solution could reclaim space when most needed. However, scanning even a relatively modest number of files in a file system volume for one or more duplicates, such as each time that a file is closed, consumes a great deal of time and machine resources, and thus is also impractical.
SUMMARY OF THE INVENTION
Briefly, the present invention provides a method and system for automatically identifying common files of a file system and merging those files into a single instance of the data, having one or more logically separate links thereto representing the original files. A groveler facility maintains a database of information about the files on a volume of a file system, the information for each file including a file size and checksum (signature) based on the file contents. The groveler includes a component that periodically acts in the background to scan the USN log, a log that dynamically records file system activity, whereby new or modified files detected in the USN log are queued as work items, each work item representing a file. The entire volume also may be scanned to add work items to the queue, which takes place initially when the queue is created, or when there is a potential problem with the USN log.
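The queueing step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the USN log is modeled as a plain list of changed paths, the volume as a dictionary from path to contents, and all names (`GrovelerState`, `enqueue_from_log`, `scan_volume`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GrovelerState:
    # Work items awaiting processing; each item is the path of one file.
    queue: deque = field(default_factory=deque)
    # Paths currently queued, so a file is not queued twice.
    queued: set = field(default_factory=set)


def enqueue_from_log(state: GrovelerState, log_entries: list) -> int:
    """Queue each new or modified file reported by the (modeled) change log."""
    added = 0
    for path in log_entries:
        if path not in state.queued:
            state.queue.append(path)
            state.queued.add(path)
            added += 1
    return added


def scan_volume(state: GrovelerState, volume: dict) -> int:
    """Fallback: queue every file on the volume, for example when the queue
    is first created or when the change log may be unreliable."""
    return enqueue_from_log(state, sorted(volume))


if __name__ == "__main__":
    state = GrovelerState()
    volume = {"a.txt": b"x", "b.txt": b"x", "c.txt": b"y"}
    print(scan_volume(state, volume))                      # initial full scan
    print(enqueue_from_log(state, ["b.txt", "d.txt"]))     # later log activity
    print(list(state.queue))
```

Tracking queued paths in a set keeps repeated log entries for the same file from producing duplicate work items, which is one plausible way to keep the background work bounded.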
The groveler includes another component that periodically removes items from the queue, calculates the signature of the corresponding file contents, and uses the signature and file size to query the database for matching files. The groveler component then compares any matching files with the file corresponding to the work item for an exact duplicate, and if found, calls a single instance store facility to merge the files and create independent links to those files.
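The matching step described above can be sketched as follows. This is a simplified illustration, not the patent's implementation: SHA-256 stands in for the unspecified checksum, the volume is modeled as a dictionary from path to contents, the database as a dictionary keyed by (size, signature), and `merge` is a hypothetical callback standing in for the single instance store facility.

```python
import hashlib
from collections import deque


def signature(data: bytes) -> str:
    """Checksum of the file contents (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def process_queue(queue: deque, volume: dict, database: dict, merge) -> list:
    """Drain the work queue, merging each file with an exact duplicate if any.

    database maps (size, signature) -> list of paths seen with that key.
    Returns the (kept_path, duplicate_path) pairs passed to merge.
    """
    merged = []
    while queue:
        path = queue.popleft()
        data = volume.get(path)
        if data is None:  # file was deleted after it was queued
            continue

        # Size plus signature narrows the search to a few candidates.
        key = (len(data), signature(data))
        candidates = database.setdefault(key, [])

        for other in candidates:
            # A matching key is only a hint; compare the actual contents
            # before merging. This also skips stale database entries.
            if other != path and volume.get(other) == data:
                merge(other, path)
                merged.append((other, path))
                break

        if path not in candidates:
            candidates.append(path)
    return merged


if __name__ == "__main__":
    volume = {"a": b"x" * 10, "b": b"x" * 10, "c": b"y" * 10}
    print(process_queue(deque(["a", "b", "c"]), volume, {}, lambda k, d: None))
```

The full comparison after the signature match guards against two different files that happen to share a size and checksum.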
Other advantages will become apparent from the detailed description when taken in conjunction with the drawings.


