Electrical computers and digital data processing systems: input/ – Input/output data processing – Peripheral adapting
Reexamination Certificate
1999-03-31
2002-10-15
Gaffin, Jeffrey (Department: 2182)
Electrical computers and digital data processing systems: input/
Input/output data processing
Peripheral adapting
C710S033000, C710S035000, C710S062000, C710S065000, C707S793000
Reexamination Certificate
active
06466999
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to updating files and streams using a patch, and to compression of files and streams.
BACKGROUND AND SUMMARY OF THE INVENTION
Software application programs commonly are subject to frequent revision and modification, such as for bug fixes and to add new features. It is inconvenient to users and expensive to software venders, however, to distribute and install complete new copies of software application programs each time there is a revision. A more effective alternative is to generate a patch file that details various specific deletions and insertions that will update a previous version copy of the software application program to produce a new version copy. Such a patch file generally is much smaller in size than the complete copy of the new version software application program, and can be more easily and less expensively distributed to and installed on the user's computers, such as via a download from the Internet and use of an automated patch application program.
Data compression also is commonly used to facilitate distribution of files where there is no prior version copy available (e.g., distribution of first version software or data files, or an updated version to new users). One broad classification of data compressors are called “dictionary encoders.” (See, e.g., Bell et al., Text Compression 206-243 (1990).) Dictionary encoders encode an input data stream (e.g., a file that is input into the dictionary encoder to be compressed) with respect to some reference data (referred to as the “dictionary”), by matching substrings of the input data stream to strings in the dictionary and encoding the input data stream substrings as references to the matching strings in the dictionary. In one elementary implementation of the dictionary encoder, the dictionary is a compilation in table form of the most frequently occurring strings in the input data stream. The encoded references are simply the index of the matching string in the table. Although the dictionary must typically be included with the resulting compressed data to allow decompression, compression is achieved because the encoded index of a dictionary string is much smaller than its raw or plain text representation.
A well known subclass of dictionary encoder, generally termed LZ encoders (from originating work done by Lempel and Ziv), compresses an input data stream with respect to a preceding portion of the input data stream known as the “history.” The data stream is thus effectively its own “dictionary.” In general, an exemplary LZ encoder searches the history for a string that matches each next portion of the input data stream. If such a match is found, the LZ encoder encodes the matched next portion of the input data stream using a match code that is a reference (e.g., composed of an offset and length) to the matching string in the history. Otherwise, the LZ encoder encodes a next character of the input data stream as a raw data code or “literal” that designates the character as plain text. The just encoded portion of the input data stream is then added to the history, and will be included in the search to match the next portion of the input data stream. Often, the history is stored in a fixed size, sliding window type history store, from which the oldest data exits as new data from the input data stream is added (thus, the data effectively slides through the window). Accordingly, with these prior LZ encoders, an input data stream is encoded with respect to preceding data in that same input data stream. These LZ encoders achieve compression of the input data stream because the match codes can be much smaller than the substrings that they represent.
In Sliger et al., “Method And System For Updating Software With Smaller Patch Files,” U.S. patent application Ser. No. 09/093,591, filed Jun. 8, 1998 (the disclosure of which is incorporated herein by reference) (hereafter referred to as the “Patch Patent Application”), a novel patch system is described that uses a data compressor to generate a patch file for updating a prior version file to a new version file. In one implementation, the history store (which may be the sliding window type) of an LZ encoder is pre-loaded with a prior version of a file. The LZ encoder then commences to compress a new version of the file with respect to the prior version of the file held in the history store. In most cases, the new version of the file will contain much data that is carried over from the prior version. The new version therefore will usually contain many substrings that match substrings of the prior version held in the history store, resulting in highly compressed encoded data. The encoded data is used as a patch file that can be distributed to other computers on which the prior version of the file resides. On such other computers, the new version of the file is reproduced from the patch file using an LZ decoder whose history store also is preloaded with the resident prior version of the file.
A problem common to the above described systems is that the efficacy of both patch generation and compression depends on the degree of similarity between the new and old version files (or between the input data stream and history store). For example, one factor that tends to increase the size of patch files produced by the patch system in the Patch Patent Application is code movement between versions of an executable file. Code movement signifies that some sections or blocks of code within the executable file are moved relative to their locations within the previous version file. Often, there are many instructions (e.g., branch, jump, and function call instructions) in the code of an executable file that reference locations in other code blocks of the executable file as either an offset or absolute location within the file. Such inter-code block references will change if the referenced code block is moved in a new version of the executable file with respect to its location in a prior version. Since program files are a concatenation of many blocks of code, code movement occurs due to the intentional insertion or deletions of data anywhere in the file, and the corresponding changes to location references affect many blocks of code which are otherwise unchanged. When the new version is encoded by the LZ encoder with respect to the previous version, these changed inter-code block references can interrupt matches from code blocks in the new version to otherwise identical code blocks in the prior version of the executable file. As a result, the code block is encoded as several match codes interspersed with literals that encode the changed inter-code block references, instead of as a single match code.
With reference to
FIG. 2
for example, a first code block (A) in an old version of an executable file contains two instructions that reference locations in a second code block (B) by offsets O and P. However, the code block B in a new version of the executable file is shifted relative to its location in the old version, such as may be caused from insertion of new data in between the code blocks A and B. Due to the movement of code block B, the two instructions in code block A of the new version file specify changed offsets O′ and P′ into the code block B. If the old version file is then loaded into the history store of an LZ encoder as described in the Patch Patent Application, the LZ encoder cannot encode the code block A in the new version as a single match code designating the code block A of the old version held in the history store because the offsets differ between the versions effectively interrupting a continuous match. Instead, the LZ encoder at best (assuming no other changes to code block A) encodes the new version's code block A as three match codes (designating corresponding portions of the old version's code block A that are identical) separated by literals that specify the changed offsets O′ and P′ in plain text. By causing the offsets in code block A to change, the movement of code b
McGuire Thomas D.
Shupak Richard M.
Sliger Michael V.
Gaffin Jeffrey
Klarquist & Sparkman, LLP
Microsoft Corporation
Perveen Rehana
LandOfFree
Preprocessing a reference data stream for patch generation... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Preprocessing a reference data stream for patch generation..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Preprocessing a reference data stream for patch generation... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2983172