Automatic inference of models for statistical code compression

Data processing: artificial intelligence – Fuzzy logic hardware – Fuzzy inference processing

Reexamination Certificate

Details

U.S. Classes: C717S140000, C341S051000, C382S224000, C706S012000

Type: Reexamination Certificate

Status: active

Patent number: 06516305

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to software development environments, and more particularly to compressing representations of computer executable code.
COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owners have no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserve all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1999, Microsoft Corporation and the Association for Computing Machinery, Inc., All Rights Reserved.
BACKGROUND OF THE INVENTION
As computer programs have become more complex, the amount of executable code in a typical program has also increased. The increase in the code size results in increased storage requirements of the executable file, and more importantly, increased bandwidth consumption or increased download times when the code is transferred over a network. In addition, many embedded computers have limited amounts of ROM available for program storage. As a result, compression of the executable code is desirable to reduce both storage requirements and network transfer time.
Previous systems have used one or more of several general-purpose data compression systems to reduce the size of executable code. Many general-purpose data compression systems comprise a statistical modeler followed by a coder. As the input is compressed or decompressed, the modeler tracks some context in the input, and associates with each context a probability distribution that the coder (e.g., an arithmetic coder) uses to encode the next token in the input stream. For example, when compressing English text, the letter Q is often followed by the letter U, so a good modeler responds to a Q by switching to a frequency distribution that assigns a high probability to a U and thus encodes it in less space.
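To make the modeler/coder split concrete, the following is a minimal sketch, not taken from the patent, of the kind of order-1 context modeler such general-purpose compressors use; the class name and the add-one smoothing scheme are illustrative assumptions.

from collections import defaultdict

class ContextModeler:
    """Minimal order-1 modeler: one frequency table per preceding token."""

    def __init__(self):
        # One frequency table per single-token context; e.g., after 'Q'
        # the table for English text would strongly favor 'U'.
        self.tables = defaultdict(lambda: defaultdict(int))
        self.context = None

    def probability(self, token):
        """Probability the coder would use to encode `token` right now."""
        table = self.tables[self.context]
        total = sum(table.values()) + 256  # add-one smoothing over a byte alphabet
        return (table[token] + 1) / total

    def update(self, token):
        """Record the token in the current context, then advance the context."""
        self.tables[self.context][token] += 1
        self.context = token

An arithmetic coder would consume the probability for each token before the modeler's update step, and a decoder running the same modeler in lockstep would reproduce the identical distributions.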
Markov models use a number of immediately preceding tokens to help predict and compress the next token. For example, an order-1 model uses the immediately preceding token, an order-2 model uses the 2 immediately preceding tokens, and so on. For an alphabet A, an order-N model can use up to |A|^N probability distributions, one for each combination of the last N tokens. Thus, for an alphabet comprising 256 possible values, an order-1 Markov modeler would use 256 probability distributions, an order-2 modeler would use 65,536 probability distributions, and so on.
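The growth of |A|^N can be checked directly; the snippet below simply evaluates it for a 256-symbol byte alphabet and reproduces the figures above.

# Number of distinct contexts (hence probability distributions) an
# order-N Markov model may need over a 256-symbol byte alphabet.
ALPHABET_SIZE = 256
for order in (1, 2, 3):
    print(f"order-{order}: up to {ALPHABET_SIZE ** order:,} distributions")
# order-1: up to 256 distributions
# order-2: up to 65,536 distributions
# order-3: up to 16,777,216 distributions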
Prediction by Partial Matching (PPM) modelers blend or switch on the fly between several Markov models, preferring more history when the recent context has been seen often, and backing off to use less history when it has less experience with the current context.
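A heavily simplified back-off sketch may clarify the blending idea; the class, the min_count threshold, and the selection rule below are assumptions chosen for illustration, not the actual mechanism of any particular PPM variant.

from collections import defaultdict

class SimplePPM:
    """Prefer the longest context seen often enough; otherwise back off."""

    def __init__(self, max_order=2, min_count=4):
        self.max_order = max_order
        self.min_count = min_count
        # counts[n] maps an n-token context tuple to a frequency table.
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def table_for(self, history):
        """Pick the highest-order table whose context is well established."""
        for n in range(self.max_order, 0, -1):
            if len(history) >= n:
                table = self.counts[n][tuple(history[-n:])]
                if sum(table.values()) >= self.min_count:
                    return table          # enough experience: use more history
        return self.counts[0][()]         # back off to the order-0 model

    def update(self, history, token):
        """Record the token under every context order up to max_order."""
        for n in range(self.max_order + 1):
            if len(history) >= n:
                ctx = tuple(history[-n:]) if n > 0 else ()
                self.counts[n][ctx][token] += 1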
In each case, the modeler's objective is to assign a non-zero probability to every valid message (sequence of tokens), and high probabilities to messages that resemble those in some representative training set. The higher the probability assigned to a message M comprising tokens m1 m2 . . . mN, the shorter its minimum code-length, or entropy.
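The relationship between probability and code length can be made concrete: an entropy coder needs about -log2 p(mi) bits per token, so the message total is the sum of those terms. A small worked example:

import math

def code_length_bits(probabilities):
    """Minimum code length of a message, given per-token model probabilities."""
    return -sum(math.log2(p) for p in probabilities)

print(code_length_bits([0.5, 0.5, 0.5]))  # 3.0 bits
print(code_length_bits([0.9, 0.9, 0.9]))  # ~0.456 bits: likelier message, shorter code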
Code-specific compression mechanisms have been used in addition to the general-purpose compression systems described above. In one example of such code-specific compression, the code produced by compilers is reviewed, either manually or programmatically, for instruction combinations that appear frequently. Then special composite operation codes (opcodes) are designed that replace the frequently appearing instruction combinations. A problem with such an approach is that only set patterns appearing in the code will typically be discovered, while other context that can supply useful information is ignored.
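As a hypothetical illustration of that review step, frequent adjacent instruction pairs can be found by counting opcode bigrams and nominating the most common pairs for composite opcodes; the opcode names below are invented.

from collections import Counter

def frequent_pairs(opcodes, top=3):
    """Count adjacent opcode pairs and return the most frequent."""
    pairs = Counter(zip(opcodes, opcodes[1:]))
    return pairs.most_common(top)

code = ["load", "add", "store", "load", "add", "store", "load", "cmp"]
print(frequent_pairs(code))
# [(('load', 'add'), 2), (('add', 'store'), 2), (('store', 'load'), 2)]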
While general-purpose data compression systems can successfully compress compiler generated code, there is a need in the art for systems and methods that can take advantage of the characteristics of compiler generated code to compress such code. In addition, there is a need for such a system that automatically discovers context that can be used to compress the code further than is possible with either general-purpose data compression systems or with the current code compression systems.
SUMMARY OF THE INVENTION
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
In one system for inferring frequency distribution models for compressing data, the system reads a set of training data. In one aspect of the system, the training data can comprise intermediate representation (IR) code; however, code for virtual or real machines could also be used as training data. Tokens are read from the training data. For each token, certain context is saved. The saved context comprises predictors that can be used to predict the token. The predictors include Markov predictors, computed predictors, and reduced predictors.
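A minimal sketch of such record extraction follows; the specific predictors shown (two Markov predictors, a position-parity computed predictor, and a coarsened reduced predictor) are illustrative assumptions, not the patent's actual predictor set.

def extract_records(tokens):
    """For each token, save a record of predictors that might predict it."""
    records = []
    for i, tok in enumerate(tokens):
        records.append({
            "token": tok,
            "prev1": tokens[i - 1] if i >= 1 else None,   # order-1 Markov predictor
            "prev2": tokens[i - 2] if i >= 2 else None,   # order-2 Markov predictor
            "position_parity": i % 2,                     # a computed predictor (illustrative)
            # A reduced predictor: a coarsened version of prev1 (here, its
            # first character, standing in for any many-to-few reduction).
            "prev1_class": tokens[i - 1][:1] if i >= 1 else None,
        })
    return records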
In a further aspect of the system, the set of token and predictor values read from the training set is presented to a machine-learning component that applies machine-learning algorithms that create a decision tree. The branch nodes of the decision tree comprise conditions that test the predictor values, while the leaf nodes comprise frequency distribution models that vary depending on the conditions in the paths from the root leading to the leaf nodes. The decision tree created using the system can be input to a modeler component of a code compression system.
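The following is a simplified sketch of the described output structure, assuming a binary tree with equality tests on predictor values; the class names and the example split are invented for illustration.

from collections import Counter

class Branch:
    """Branch node: tests one predictor value and routes to a child."""

    def __init__(self, predictor, value, yes, no):
        self.predictor, self.value = predictor, value
        self.yes, self.no = yes, no

    def distribution(self, record):
        child = self.yes if record.get(self.predictor) == self.value else self.no
        return child.distribution(record)

class Leaf:
    """Leaf node: holds the frequency distribution for its context."""

    def __init__(self, counts):
        self.counts = Counter(counts)

    def distribution(self, record):
        return self.counts

# A tree that, say, predicts "store" strongly after "add", as training
# data with that regularity might induce.
tree = Branch("prev1", "add",
              yes=Leaf({"store": 9, "load": 1}),
              no=Leaf({"load": 5, "add": 3, "cmp": 2}))
print(tree.distribution({"prev1": "add"}))  # Counter({'store': 9, 'load': 1})

A modeler walking this tree for each token would hand the leaf's distribution to the coder, exactly as the general-purpose modelers above hand over their context tables.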


REFERENCES:
patent: 5379355 (1995-01-01), Allen
patent: 5457643 (1995-10-01), Vahey et al.
patent: 5525982 (1996-06-01), Cheng et al.
patent: 5583500 (1996-12-01), Allen et al.
patent: 5794197 (1998-08-01), Alleva et al.
patent: 6163811 (2000-12-01), Porter
patent: 6225922 (2001-05-01), Norton
patent: 6404923 (2002-06-01), Chaddha
Riskin et al., "Lookahead in Growing Tree-Structured Vector Quantizers," International Conference on Acoustics, Speech and Signal Processing, Apr. 1991, vol. 4, pp. 2289-2292.
Neilsen et al., "A DAG-Based Algorithm for Distributed Mutual Exclusion," 11th International Conference on Distributed Computing Systems, May 1991, pp. 354-360.
Ernst et al., "Code Compression," Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, Jun. 1997, pp. 358-365.
Fraser, Christopher W., "Automatic Inference of Models for Statistical Code Compression," Apr. 1999, Technical Report MSR-TR-99-20, to appear at ACM PLDI '99. Copyright © 1999 by the Association for Computing Machinery, Inc.
