Coded data generation or conversion – Digital code to digital code converters – To or from bit count codes
Reexamination Certificate
2002-02-19
2003-11-25
Williams, Howard L. (Department: 2819)
Coded data generation or conversion
Digital code to digital code converters
To or from bit count codes
C358S426130
Reexamination Certificate
active
06653954
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention generally relates to the field of data compression, and more particularly relates to a method and system for compressing data that contains a frequent occurrence of data elements that have the same value.
2. Description of Related Art
Data sets that contain various types of data, including data that defines images, executable programs and other data, are often very large. An example of a large data set is the raw data set describing an image, which is sometimes referred to as a raster data set. The size of a raster data set depends on the image size and resolution, as well as the colorspace used to represent the data and number of bits per color plane. The image size and resolution govern the number of pixels in the image, while the colorspace and the number of bits per color plane define the amount of data needed to describe each pixel.
A simple example is a bilevel image (i.e., where the image simply comprises a number of pixels which are each either on or off) which is a letter size image at a common print resolution of 600 dots per inch. Such an image contains over 33 million pixels that will require approximately 4 MBytes for uncompressed storage, given that each byte can describe eight pixels. Color images are commonly described using 8 or more bits per color. Techniques known as CIEL*a*b* and RGB colorspaces are used, respectively, with three color planes each, for device independent archival storage or for on screen image presentation. The CMYK (Cyan, Magenta, Yellow, Black) colorspace, which specifies four colorplanes, is often used for printing. Color images require between 24 and 32 bits per pixel, depending upon the inclusion of independent black color data, if the intensity of each color is represented with 8 bits per pixel. A letter size image with a resolution of 600 dots per inch requires approximately 96 MB of storage for a three-component colorspace and 128 MB for a four-component colorspace.
The size of a dataset impacts not only the storage of the data but also electronic communication of the dataset. The transmission of a dataset defining a highly detailed image that is to be transmitted to a high speed printer is a particularly difficult problem. Printers, which are capable of printing in excess of several hundred pages per minute, require that data for these images reach the printer with comparable speed. Data communication links are typically inadequate for the communication of the raw image data set to such high-speed printers. In order to communicate the image data to the high-speed printer, data compression is typically employed.
Compression algorithms that are used for image compression are able to be broadly classified into two categories, lossless compression and lossy compression. In a lossless algorithm, the decompressed image is an identical copy of the original image. As the name indicates, lossy algorithms introduce some data loss and the decompressed image is slightly different than the original image. The examples of commonly used lossless algorithms are ITU-TSS T6 Group 4 (for bilevel images) and Lempel-Ziv & Welch (LZW) for arbitrary data. The best-known lossy image compression algorithm is part of the Joint Photographic Experts Group (JPEG) standard.
The different types of compression algorithms are appropriate for different image types. Images may be classified into linework and continuous tone images (which may also be referred to as ‘contone’ images for the purposes of this specification). Linework images contain sharp edges and areas of high color contrast. Examples of linework images are rasterized text, pie charts and line drawings. Continuous tone images are distinguished from linework images by constantly varying color and a general lack (or relative unimportance) of sharp edges. Photographs are primary examples of continuous tone images.
Compression of a linework image via a lossy algorithm unacceptably degrades the decompressed image. Examples of unacceptable degradation of a linework image compressed with a lossy algorithm (e.g., the JPEG algorithm) are artifacts and blurring in areas which neighbor sharp edges within the linework image. On the other hand, compressing a continuous tone image via a lossless compression algorithm results in very little data compression (e.g., common reduction in data size is 10% for compression of a continuous tone image via the LZW algorithm). The use of a lossless compression algorithm on a continuous tone image may actually cause the data set to expand (i.e., the compressed data set is larger than the uncompressed data set). Conversely, the lossless data compression algorithms preserve the quality of a linework image and also tend to have good data compression performance. The quality degradation of a decompressed continuous tone image that was compressed via a lossy algorithm is often imperceptible. Image data compression becomes more efficient if the image data is distinguished between continuous tone image data and linework image data and the data is compressed via an algorithm suitable for the type of image. This phenomenon is used in many image data rasterizers that are used in color printing, which process linework and continuous tone differently. While multiple color planes can combine in various ways, the images being carried in the CMYK colorspace (which refers to the four color planes used to encode color data: Cyan, Magenta, Yellow and Black) for printing are almost invariably carried in the planar format, where each color plane of the image is compressed separately.
Linework image data is often encoded using a run length algorithm or a variant of the LZW algorithm. The run length algorithm encodes each scan line in the image separately, by recording the number of pixels that have the same color intensity value. Alternatively, a run end algorithm can be used where the position of the last pixel in a “run” of pixels with the same color is recorded instead of the length of each run. A number of run lengths and run end compression forms are currently used, such as the MRLE runlength format used to communicate linework image data to the Xeikon high speed color printheads. Since the runlength/run end compression format record the changes in color on each scan line, the resulting datasets tend to be quite efficient for linework data.
The runend/runlength format serves as the intermediate format for the MMR family of algorithms, such as ITU-TSS T6 Group4, and is therefore heavily used in processing of bilevel data.
The LZW algorithm compresses an arbitrary stream of data (i.e., its use is not necessarily restricted to image data). The LZW algorithm operates by building a dictionary of code words that each represents a sequence of bytes. The dictionary is implicit, which is to say that the dictionary is never explicitly embedded into the compressed data stream. The dictionary of the LZW algorithm is dynamically constructed by the compressed data stream decoder as the encoded data is processed. The code words of typical LZW encoding techniques are 9-12 bits long. When the dictionary is full (i.e., all of the code words have been used), a special code, i.e., the CLEARCODE, is encoded. Upon receipt of the CLEARCODE by an LZW decoder, the code word dictionary is erased and the algorithm restarts.
In general terms, the encoding process of the LZW algorithm operates by building data strings and maintaining a dictionary of code words to represent data strings that contain previously observed data patterns. These code words are used to replace subsequent occurrences of those data strings. The algorithm maintains a currently active string. The currently active string always has a corresponding code word in the dictionary. When the next character is processed from the un-encoded data input, a new string is considered, which comprises the currently active string with the current input character added to the end. If the dictionary already contains a code word describing the new string, that code word becomes the current string and t
Fleit Kain Gibbons Gutman & Bongini P.L.
Gibbons Jon A.
Reid Scott W.
Williams Howard L.
LandOfFree
System and method for efficient data compression does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for efficient data compression, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for efficient data compression will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3163370