Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-07-24
2002-04-16
Alam, Hosain T. (Department: 2172)
C711S162000
active
06374266
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a method for storing information in a data processing system and, in particular, to a method for compressing and storing information in a data processing system.
BACKGROUND OF THE INVENTION
A recurring problem in computer based data processing systems is the storing of information, such as data files, application programs and operating system programs, particularly as the size and number of program and data files continue to increase. This problem occurs in single user/single processor systems as well as in multi-user/multi-processor systems and in multi-processor networked systems and may occur, for example, in the normal operation of a system when the volume of data and programs to be stored in the system exceeds the storage capacity of the system. The problem occurs more commonly, however, in generating and storing “backup” or archival copies of a system's program and data files. That is, the backup copies are typically stored in either a portion of the system's storage space or in a separate backup storage medium, either of which may, for practical considerations, have a storage capacity smaller than that of the system, so that the volume of information to be stored may exceed the capacity of the backup storage space. Again, this problem occurs commonly in single user systems, and is even more severe in multi-user/multi-processor systems and in networked systems because of the volume of data generated by multiple users and because such systems typically contain multiple copies of application programs and operating system programs, which are frequently very large.
The problem is compounded by the use of “chapterized” backup systems, which make periodic copies of all data files, and often the program files, on a system so that the exact state of the system at any given time can be regenerated from the appropriate backup chapter. In this method, therefore, while a file that has been deleted from the system will not appear in subsequent backup chapters, files having different names but identical contents will appear more than once in the underlying data.
Traditional methods for storing information, and in particular for storing backup or archival copies of data and program files, have offered little relief for this problem. For example, the sector copy method for making backup copies of files on disk drives merely copies the contents of a disk drive, sector by sector, into another storage medium, such as another disk drive or a tape drive. This method therefore does not reduce the volume of data to be stored and, because the copying is on the basis of disk drive sectors, also does not permit the stored information to be accessed and restored on the basis of files and directories.
The prior art has therefore evolved and offered a number of “data compression” schemes for dealing with this problem by reducing the volume of the data or program files to be stored while retaining the information contained in those files. These schemes have generally used either of two basic classes or groups of data compression methods. The first group of methods, which may be referred to as intra-file methods, searches within individual bodies or streams of data to eliminate or reduce redundant data within each individual file. The second group of methods, which may be referred to as inter-file methods, searches across streams or bodies of data to eliminate or reduce redundancy between files in a system as entities, that is, to eliminate files that are duplicates of one another.
Broadly, the prior art can also be classified as including intra-file methods such as PKZIP, ARC, and LHZ, inter-file methods based on file and directory names such as TAPEDISK's TAPEDISK® system, and inter-file methods based on file content such as the STAC, Inc. REPLICA® system.
The intra-file methods, of which there are many variations, recognize that the form in which data is expressed in a file typically uses more information bits than are actually required to distinguish between one element of data and another, and that the data can be reduced in volume by an encoding method that reduces the proportion of unnecessary or redundant data bits. For example, text is frequently expressed in ASCII or EBCDIC code, which uses character codes of a uniform size, typically seven or eight bits, to express the different characters or symbols of the text. Some text compression methods recognize that certain characters or symbols, or combinations or sequences of characters or symbols, occur more frequently than others, and assign shorter codes to represent more frequently occurring characters or combinations of characters and use longer codes only for rarer characters or combinations of characters.
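The idea of assigning shorter codes to more frequent characters can be illustrated with a minimal Python sketch. It uses Huffman's construction, which is only one well-known way of building such a code and is not named in the text above; the function names and the sample string are hypothetical.

```python
import heapq
from collections import Counter


def build_prefix_code(text):
    """Return {symbol: bit string}; frequent symbols get shorter codes.

    Uses Huffman's construction (an assumption made for illustration).
    """
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest groups, prefixing one bit to each side.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the variable-length codes of each character."""
    return "".join(code[ch] for ch in text)


sample = "aaaaaaaabbbc"            # 'a' is frequent, 'c' is rare
code = build_prefix_code(sample)
encoded = encode(sample, code)
fixed_width_bits = 8 * len(sample)  # size with uniform 8-bit character codes
```

In this sample the frequent character receives a one-bit code, so the encoded string needs far fewer bits than the uniform eight-bit representation.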
Intra-file methods generally make use of a so-called “dictionary”. The dictionary contains a mapping between a short sequence of bits and a long sequence of bits. Upon decompression, for each different short sequence of bits, the short sequence is looked up in the dictionary and the corresponding longer sequence of bits is substituted.
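The dictionary lookup performed on decompression might be sketched as follows. The dictionary contents, the two-bit code length and the helper names are hypothetical choices made for illustration; real utilities build and store their dictionaries in their own ways.

```python
def to_bits(text):
    """Express ASCII text as a string of '0'/'1' characters, 8 bits per character."""
    return "".join(format(b, "08b") for b in text.encode("ascii"))


def from_bits(bits):
    """Inverse of to_bits: turn a bit string back into ASCII text."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")


# Hypothetical dictionary: each short bit sequence maps to a long bit sequence.
DICTIONARY = {
    "00": to_bits("the "),
    "01": to_bits("cat "),
    "10": to_bits("sat "),
}


def decompress(short_codes, dictionary):
    """Look up each short sequence and substitute the corresponding long one."""
    return "".join(dictionary[code] for code in short_codes)


compressed = ["00", "01", "10"]          # 6 bits in total
restored_bits = decompress(compressed, DICTIONARY)
restored_text = from_bits(restored_bits)
```

Here six bits of compressed input expand to 96 bits of restored data, at the cost of having to keep the dictionary available to the decompressor.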
Intra-file methods are widely used and are often implemented as computer system utility programs, such as PKZIP, and certain systems, such as certain versions of Microsoft Windows, have included zip-like compression programs as operating system utilities wherein a user may partition a section of a disk drive as an area to read, write and store compressed files. It must be recognized, however, that intra-file methods, such as zip compression, do not address many of the problems of data storage, and are at best only a partial solution to this problem. For example, intra-file methods such as zip compression often provide little compression with files such as graphics files, wherein the proportion of redundant bits is much less than in text type files. In addition, intra-file methods of compression inherently depend upon the internal relationships, such as redundancy, between the data elements of a file to compress or reconstruct files. As such, intra-file methods generally cannot detect or reduce redundancy in the data between two or more files, because the dictionary becomes too large to be practical to use. These methods are therefore generally limited to operating on files individually, so that they cannot detect and eliminate redundancy even between files that are literal duplicates of one another and cannot reduce the number of files to be stored.
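Both limitations can be observed in a small Python sketch. It uses the standard zlib module as a stand-in for zip-style compression and pseudo-random bytes as a stand-in for data with little internal redundancy; neither choice comes from the text above, and the sample data is hypothetical.

```python
import random
import zlib

# Highly repetitive, text-like data.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 200

# Pseudo-random bytes: an assumed stand-in for low-redundancy data
# such as an already dense graphics file (not an actual image).
rng = random.Random(0)
noise_like = bytes(rng.getrandbits(8) for _ in range(len(text_like)))


def ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


text_ratio = ratio(text_like)    # a small fraction of the original size
noise_ratio = ratio(noise_like)  # close to, or slightly above, 1.0

# Two literal duplicates compressed one file at a time:
# each is stored in full, so nothing is saved between them.
copy_a = zlib.compress(text_like)
copy_b = zlib.compress(text_like)
stored_bytes = len(copy_a) + len(copy_b)
```

Because each file is compressed on its own, the second duplicate costs exactly as much storage as the first.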
The inter-file methods, of which there are again many variations, search for files whose contents are essentially duplicates of one another and replace duplicate copies of a file with references to a single copy of the file that is retained and stored, thereby compressing the information to be stored by eliminating multiple stored copies of files. It will be appreciated, however, that these methods again do not address certain significant problems, and in fact present difficulties arising from their inherent characteristics.
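A minimal sketch of the inter-file idea, in which duplicate copies are replaced by references to one retained copy, could look like the following. It handles exact duplicates only and identifies content with a SHA-256 digest, which is an assumption made for illustration and is not taken from any of the systems named above; the file names and function names are hypothetical.

```python
import hashlib


def deduplicate(files):
    """files: {name: bytes}. Returns (store, references).

    store      maps a content digest to the single retained copy.
    references maps every file name to the digest of its content.
    """
    store = {}
    references = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        kept = store.setdefault(digest, content)
        if kept != content:
            # Same digest but different bytes: refuse rather than lose data.
            raise ValueError("digest collision for " + name)
        references[name] = digest
    return store, references


def restore(name, store, references):
    """Follow a file's reference back to the retained copy."""
    return store[references[name]]


files = {
    "report.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",  # renamed duplicate
    "notes.txt": b"todo",
}
store, references = deduplicate(files)
```

Three named files are kept as two stored copies, and the renamed duplicate is still recognized because only its contents are examined.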
For example, there are two primary methods for identifying duplicate copies of a given file. The first is by examination of external designators, such as file name, version number, creation/modification date and size, and the second is by examination and comparison of the actual contents of the files. Identification of duplicate copies of files by examination of external designators, however, may not identify duplicate copies of files or may misidentify files as duplicates when, in fact, they are not. For example, a given user may rename a file to avoid confusion with another file having a similar name or to make the file easier for that user to remember, so that the file would appear externally to be different from other copies of the file, even though it is a duplicate of the other copies of the file. Also, certain external designators, such as file modification date, are inherently unreliable for at least certain types of files. In the reverse, a user may modify or customize a given file, often referred to as “patchin
Clapp, Esq. Gary D.