System and method for relating files in a distributed data...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000, C707S793000, C715S252000

active

06615225

ABSTRACT:

BACKGROUND OF THE INVENTION
1. The Field of the Invention
The present invention relates to systems and methods for relating files in a distributed data storage environment. More specifically, the present invention relates to systems and methods for relating groups of files transmitted to a remote storage site using an identifier unique to each group.
2. The Relevant Art
In a data processing system, a backup/restore subsystem, usually referred to as a backup subsystem, is typically used to save a recent copy or version of a file, plus some number of earlier versions of the same file, on some form of backup storage device such as a magnetic disk drive, tape, or optical storage device. The backup subsystem protects against loss of data in a given data processing system. For example, if an on-line version of a file is destroyed or corrupted because of power failure, hardware or software error, user error, or some other type of problem, the latest version of that file stored in the backup subsystem can be restored, thereby minimizing the risk of data loss. Another important use of backup subsystems is that, even when no failure occurs, files or data that are deleted or changed (either accidentally or intentionally) can be restored to their earlier state, thus minimizing the loss of data.
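The retention behavior described above (a recent copy plus a bounded number of earlier versions, any of which can be restored) can be sketched as a small in-memory store. This is a minimal illustration, not the method of this patent: the class name `VersionStore`, the `max_versions` limit, and the dictionary standing in for backup media are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Dict, Optional


class VersionStore:
    """Hypothetical in-memory stand-in for backup media: latest copy plus bounded history."""

    def __init__(self, max_versions: int = 3) -> None:
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self._versions: Dict[str, Deque[bytes]] = {}

    def backup(self, name: str, data: bytes) -> None:
        # A deque with maxlen drops the oldest version automatically once full.
        history = self._versions.setdefault(name, deque(maxlen=self.max_versions))
        history.append(data)

    def restore(self, name: str, versions_back: int = 0) -> Optional[bytes]:
        # versions_back=0 is the latest copy, 1 the version before it, and so on.
        history = self._versions.get(name)
        if history is None or not 0 <= versions_back < len(history):
            return None
        return history[-1 - versions_back]


store = VersionStore(max_versions=2)
for contents in (b"v1", b"v2", b"v3"):
    store.backup("report.txt", contents)
print(store.restore("report.txt"))     # b'v3'
print(store.restore("report.txt", 1))  # b'v2'
print(store.restore("report.txt", 2))  # None (v1 was discarded)
```

Bounding the history per file keeps storage predictable; the trade-off is that versions older than the limit can no longer be restored.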
A closely related concept to the backup subsystem is an archive/retrieve system, usually referred to as an archive subsystem. Archiving refers to making copies of files on lower-cost storage such as tape so that the files can be deleted from more expensive storage technology such as disk. Because data on disk is frequently updated, an archival copy also helps preserve the state of a collection of data at a particular point in time.
Although the improved method of carrying out the backup disclosed in this application is primarily described for a backup system, it will be obvious to the person of ordinary skill in the art of data processing that the systems and methods described herein are also applicable to archive systems or other related data storage and storage management systems.
At the present time, the majority of backup systems run on host systems located in a data processing environment. Typically, a new version (also referred to as a changed version) of a file is backed up on a predetermined schedule, such as at the end of each day or each time the file has been updated and saved.
Backup systems generally consume large amounts of storage media, because multiple versions of large amounts of data are backed up on a regular basis. Transmitting the large amounts of data that prior-art backup systems must store also consumes substantial network bandwidth. Therefore, those engaged in the field of data processing, and especially in the field of backup/restore systems, are continuously striving to find improved methods and systems that reduce the storage demand of backup systems. Previously, a full backup was conducted for each file in a system. More recently, incremental backup has been employed to enable the storage and retrieval of multiple versions of a given file while consuming less storage space.
The full backup method is the most basic method and requires the backup of an entire collection of files, or a file system, regardless of whether individual files in that collection have been updated. Furthermore, in the full backup method, multiple full versions of each file are maintained on a storage device. Since maintaining multiple full copies of many files consumes a substantial amount of storage, compression techniques are sometimes used to reduce the amount of data stored. Compression techniques rely on redundancy within the file, so-called intra-file redundancy, to achieve this reduction. The most common method of file compression is the Lempel-Ziv method (also known as Adaptive Dictionary Encoder or LZ coding), described by T. C. Bell et al. in Text Compression, pp. 206-235. The essence of Lempel-Ziv coding is that redundant phrases are replaced with an alias, thereby saving the storage space associated with multiple occurrences of any given phrase. This is a general method that can be applied to any file and typically achieves compression ratios of between 2 and 3.
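The phrase-to-alias substitution at the heart of Lempel-Ziv coding can be illustrated with a naive LZ77 encoder, one member of the Lempel-Ziv family. In this sketch each token is an assumed `(offset, length, next_byte)` triple that points back into already-processed data; the function names, window size, and token layout are illustrative choices, and a production codec (for example, the DEFLATE format used by zlib) adds entropy coding and much faster match searching.

```python
from typing import List, Tuple

# (offset back into history, match length, next literal byte or -1 at end of input)
Token = Tuple[int, int, int]


def lz77_compress(data: bytes, window: int = 4096, max_len: int = 255) -> List[Token]:
    """Naive LZ77: replace repeated phrases with (offset, length) back-references."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window for the longest match starting at position i.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        # Emit the literal byte following the match (-1 if the match reaches the end).
        nxt_pos = i + best_len
        nxt = data[nxt_pos] if nxt_pos < len(data) else -1
        tokens.append((best_off, best_len, nxt))
        i = nxt_pos + 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for off, length, nxt in tokens:
        start = len(out) - off
        # Copy one byte at a time so overlapping matches (off < length) work.
        for k in range(length):
            out.append(out[start + k])
        if nxt != -1:
            out.append(nxt)
    return bytes(out)


text = b"abcabcabcabcabc"
tokens = lz77_compress(text)
print(len(tokens))                       # 4
print(lz77_decompress(tokens) == text)   # True
```

Repetitive input such as `b"abcabcabcabcabc"` collapses to four tokens, because after the first `abc` the remainder is a single back-reference. Input with little internal redundancy gains nothing from this sketch, and its token list can even be larger than the input.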
Incremental backup is an alternative to full backup. In systems using incremental backup, backups are performed only for those files which have been modified since the previous incremental or full backup.
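A minimal sketch of the incremental selection rule, assuming each file's last-modification time is available and that the time of the previous backup is recorded. The `FileInfo` type and `select_for_backup` function are hypothetical names; when no previous backup exists, the sketch falls back to a full backup.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FileInfo:
    path: str
    mtime: float  # last-modification time, e.g. seconds since the epoch


def select_for_backup(files: Iterable[FileInfo], last_backup: Optional[float]) -> List[str]:
    """Return the paths to back up, preserving the input order."""
    if last_backup is None:
        # No earlier full or incremental backup: every file must be copied.
        return [f.path for f in files]
    # Incremental: only files modified after the previous backup.
    return [f.path for f in files if f.mtime > last_backup]


files = [FileInfo("a.txt", 100.0), FileInfo("b.txt", 250.0), FileInfo("c.txt", 300.0)]
print(select_for_backup(files, None))   # ['a.txt', 'b.txt', 'c.txt']
print(select_for_backup(files, 200.0))  # ['b.txt', 'c.txt']
```

Comparing timestamps is only the simplest criterion; real systems may instead rely on archive attributes or file-system change tracking.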
In any given backup system, the higher the backup frequency, the more accurately the backup copy represents the present state of data within a file. Considering the large volume of data maintained and continuously generated in a typical data processing system, the storage, time, and other resources associated with backing up data are very substantial. Thus, those skilled in the art continuously search for better alternatives and for more storage- and time-efficient systems and methods for backing up data.
Aside from compression, which is heavily used to reduce storage requirements in backup systems, there is a quite different method of reducing backup file size. This method is known as delta versioning or “differencing.”
Differencing relies on comparisons between two versions of the same file, where multiple versions are saved as a “base file,” together with some number of “sub-files” which represent only the changes to the base file. These small files, also referred to as “delta files” or “difference files,” contain only the changed portions, typically bytes or blocks which have changed from the base file. Delta files are generated as a result of comparing the current version of a file with an earlier version of the same file, referred to as the base file. Differencing thus exploits redundancy between file versions, in order to achieve reductions in storage space and network traffic.
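A minimal fixed-block sketch of differencing, assuming the base and current versions are compared in equal-size blocks and the delta records the new file length plus only the blocks that changed. `make_delta`, `apply_delta`, and the 4-byte block size are hypothetical choices made for readability. Fixed blocks are a deliberate simplification: an insertion near the start of a file shifts every later block, which is why practical differencing tools use techniques such as rolling checksums (as rsync does) to find matching data at arbitrary offsets.

```python
from typing import Dict, Tuple

# (length of the current version, {block index: new block contents})
Delta = Tuple[int, Dict[int, bytes]]


def make_delta(base: bytes, current: bytes, block_size: int = 4) -> Delta:
    """Record only the fixed-size blocks of `current` that differ from `base`."""
    changed: Dict[int, bytes] = {}
    n_blocks = (len(current) + block_size - 1) // block_size
    for idx in range(n_blocks):
        lo, hi = idx * block_size, (idx + 1) * block_size
        if current[lo:hi] != base[lo:hi]:
            changed[idx] = current[lo:hi]
    return len(current), changed


def apply_delta(base: bytes, delta: Delta, block_size: int = 4) -> bytes:
    """Rebuild the current version from the base file plus the delta."""
    length, changed = delta
    # Truncate or zero-pad the base; any padded region is always overwritten,
    # because blocks extending past the end of the base always differ.
    out = bytearray(base[:length].ljust(length, b"\0"))
    for idx, block in changed.items():
        lo = idx * block_size
        out[lo:lo + len(block)] = block
    return bytes(out)


base = b"The quick brown fox"
current = b"The quick green fox"
delta = make_delta(base, current)
print(delta[1])                              # {2: b'k gr', 3: b'een '}
print(apply_delta(base, delta) == current)   # True
```

Only 8 of the 19 bytes of the new version are stored here; the rest is recovered from the base file.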
Substantial storage savings in backup systems may result from the adoption of differencing techniques, since frequently the selection of a file for incremental backup occurs after a small change has been made to that file. Therefore, since many versions of a file that differ only slightly from one another may be backed up, differencing offers great potential for substantial reductions in the amount of data that must be transferred to and stored in the backup server.
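To make the potential savings concrete, assume $N$ stored versions of a file of size $S$, where each later version $k$ differs from the base file by a delta of size $d_k$ (symbols introduced here for illustration only, ignoring per-delta metadata). Storing full copies versus a base file plus deltas then costs:

$$
C_{\text{full}} = N\,S, \qquad C_{\text{diff}} = S + \sum_{k=2}^{N} d_k
$$

With hypothetical values $S = 100$ MB, $N = 10$, and every $d_k = 1$ MB, $C_{\text{full}} = 1000$ MB while $C_{\text{diff}} = 100 + 9 \times 1 = 109$ MB. Assuming deltas are computed before transmission, the same reduction applies to the data sent to the backup server.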
Recently, the emergence of low cost local area networking, personal computer, and workstation technology has promoted a new type of data processing architecture known as the “client-server” system or environment. A client-server system 10, as shown in FIG. 1, typically consists of a plurality of client computers (also referred to as clients) 11, such as personal computers or workstations. The client computers 11 are preferably provided with a local storage medium 12 such as a disk storage device. The client computers 11 communicate over a network 13, such as an Ethernet or a Token Ring, which links the clients 11 to one or more network server computers 14.
The server computer 14 is generally a mainframe computer, a workstation, or other high-end computer and is typically provided with one or more local storage media 15, such as a disk storage device, a tape storage device, and/or an optical storage device. The server computer 14 usually contains various programs or data that are shared by or otherwise accessible to the clients 11. Such a client-server system communicating over a network is often referred to as a “distributed” system or network.
The distributed client-server environment presents a number of major issues related to data processing, integrity, and backup of such data. One major concern in the client-server environment is that a substantial amount of critical data may be located on client subsystems which lack the security, reliability or care of administr
