System and method for performing lossless data compression...

Coded data generation or conversion – Digital code to digital code converters – To or from packed format

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details System and method for performing lossless data compression... System and method for performing lossless data compression...

: 2000-11-14
: 2002-06-04
: Jeanpierre, Peguy (Department: 2819)
: Coded data generation or conversion
: Digital code to digital code converters
: To or from packed format

: Reexamination Certificate
: active
: 06400289
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an improved system and method for performing lossless data compression and decompression. More particularly, the present invention relates to a system and method for performing lossless data compression and decompression which employs a trie-type data structure to efficiently parse the data string being compressed, while also taking into account any pre-defined grammar and pre-defined source statistics relating to the data in the data string, as well as error handling at the decoder and memory constraints for both the encoder and decoder.
2. Description of the Related Art
Lossless data compression algorithms can be broadly classified into two types, namely, dictionary coding and statistical coding. The most widely used dictionary coding algorithms are the Lempel-Ziv algorithms and their variants. Dictionary coding techniques achieve compression by exploiting redundancy in data through some kind of string matching mechanism. In contrast, statistical coding methods such as arithmetic coding exploit the redundancy in data through a statistical model. Very recently, a new type of lossless source code called a grammar-based code was proposed in a publication by J. C. Kieffer and E. -H. Yang entitled “Grammar Based Codes: A New Class of Universal Lossless Source Codes,” IEEE Transactions on Information Theory, the entire contents of which is incorporated herein by reference.
The class of grammar-based codes is broad enough to include Lempel-Ziv types of codes as special cases. To compress a data sequence, a grammar-based code first transforms the data sequence into a context-free grammar, from which the original data sequence can be fully reconstructed by performing parallel substitutions. This context-free grammar, which is a compact representation of original data, is compressed using arithmetic encoding. It has been proved in the Kieffer publication referenced above that if a grammar-based code transforms each data sequence into an irreducible context-free grammar, then the grammar-based code is universal for the class of stationary, ergodic sources. Grammar-based codes offer a design framework in which arithmetic coding and string matching capability can be combined in an elegant manner.
Within the framework of grammar-based codes, an efficient greedy grammar transform was developed as described in a publication by E. -H. Yang and J. C. Kieffer, entitled “Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform—Part one: Without Context Models”, IEEE Transactions on Information Theory, the entire contents of which is incorporated by reference herein. This greedy grammar transform sequentially constructs a sequence of irreducible context-free grammars from which the original data sequence can be recovered incrementally. The greedy grammar transform sequentially parses the original data into non-overlapping, variable-length phrases, and updates the grammar accordingly.
Based on this greedy grammar transform, three universal lossless data compression algorithms are proposed in the Yang publication cited above, namely, a hierarchical algorithm a sequential algorithm, and an improved sequential algorithm These algorithms jointly optimize, in some sense, string matching and arithmetic coding capability. Although the three algorithms are based on the same grammar transform, they differ in their encoding strategies.
The hierarchical algorithm encodes the final irreducible grammar using a zero order arithmetic code with a dynamic alphabet, while the two sequential algorithms use arithmetic coding to encode the sequence of parsed phrases, instead of the final grammar. The improved sequential algorithm is an improved version of the sequential algorithm that better exploits the structure of the grammar at each step. The improved sequential algorithm, henceforth referred to as the YK algorithm, is of particular interest, since experimental results using this algorithm have yielded superior compression performance compared to the other two algorithms. In addition, unlike the hierarchical algorithm this algorithm is sequential and does not require the whole data to be present before starting the encoding operation. This algorithm has been proved to be universal in the sense that it can asymptotically achieve the entropy rate of any stationary, ergodic source. In addition, the algorithm has been shown to have linear time complexity with data size.
Experimental results using the YK algorithm have shown that the YK algorithm effectively compresses files of sizes ranging from small, such as internet datagrams to big files, such as those occurring in archiving applications. The YK algorithm significantly outperforms Lempel-Ziv type algorithms such as Gzip for both small and large file sizes. In addition, the YK algorithm is more effective than the Burrows-Wheeler transform based algorithm BZIP2, particularly for small files.
The basic data structure used by the YK algorithm is a context-free grammar, whose components are a source alphabet, a set of variables, and production rules that map each variable into a finite string composed of source symbols and variables. A grammar based compression algorithm transforms the original data into such a grammar, with the additional requirement that the grammar be irreducible. A grammar is irreducible if it satisfies a certain definition of compactness. There are several ways to construct an irreducible grammar that represents the original data. The YK algorithm is based on one such grammar transform that sequentially constructs a sequence of irreducible grammars in a greedy manner.
At an intermediate stage of the YK algorithm, there exists an irreducible grammar, defined by the source alphabet, the current set of variables, and the corresponding production rules. Also, there exists a frequency distribution on the source alphabet and the variables. The basic implementation of the YK encoding algorithm consists of a sequentially iterative application of three fundamental steps, namely, parsing, updating and encoding. The parsing operation determines the longest prefix of the remaining part of the original data string that is represented by one of the current variables. The updating operation subsequently updates the grammar after adding the new parsed substring, and modifies the frequency distribution of the source alphabet and/or the variables. The encoding operation uses the frequency distribution on the symbols and arithmetic encoding to code the parsed substring. The decoder sequentially uses arithmetic decoding to determine the parsed substring, followed by updating, to produce an identical sequence of grammars as in the encoder, from which the original data is recovered incrementally.
The following is a theoretical description of the YK algorithm. In defining the variables used in the following description, let
A
be the source alphabet with cardinality |
A
| greater than or equal to 2, let
A
+
denote the set of all finite strings drawn from
A
, let x=x
1
x
2
. . . x
n
be a finite string drawn for the alphabet
A
that is to be compressed, and let
S
={s
0
, s
1
, s
2
, . . . } be a countable set, disjoint from
A
. For j≧1, let
S
(j)={s
0
, s
1
, . . . s
j−1
} and S(j)
+
={s
1
, s
2
, . . . s
j−1
}. A context-free grammar
G
is a mapping from
S
(j) to
S
(j)∪
A
)
+
for some j≧1. The mapping
G
is explicitly represented by writing each relationship (
s
i
,
G
(
s
i
)) as
s
i
→
G
(
s
i
), for i<j.
G
(
s
) is called as the production rule for
s
. The symbol
s
o
is special in the sense that the
A
-string represented by s
o
is the original data string. s
o
is referred to as the start symbol, while s
i
for i<1 are called variables. An example of a grammar is shown below:
s
0
→s
3
s
1
s
1
bbs
2
bs
4
s
4
c
s
1
→s
2
a
s
2
→ac
s
3
→ab
s
4
→s
3
s
2
c
This is an example of

Affiliated with

Banerji Ashish

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Hughes Electronics Corporation

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

Jeanpierre Peguy

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Lauture Joseph

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Sales Michael W.

Attorney

[ 0.00 ] – not rated yet Voters 0 Comments 0

Whelan John T.

Attorney

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for performing lossless data compression... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for performing lossless data compression..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for performing lossless data compression... will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-2936609

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure