Data processing system and method for generating a...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000

Reexamination Certificate

active

06411958

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a data processing system and method for generating a structured listing of symbols from which encoded data values for those symbols can be determined.
2. Description of the Prior Art
It is often desirable to compress a stream of data at some point during the processing of that data stream. Generally the data stream can be considered as comprising a sequence of symbols, where a symbol is a predefined data element that can be individually encoded during the compression of the data stream. For example, if the data stream represents a text file, then the data representing each individual ASCII character may be considered to be a symbol. Similarly, if the data stream represents an image, then the data representing individual pixels may be considered to be a symbol.
When compressing such a data stream, it is first necessary to generate a structured listing of the symbols from which encoded data values for those symbols can be determined. This structured listing may take a variety of forms, for example a look-up table identifying the encoded data value for each symbol, or a tree structure with the set of symbols forming the leaves. One example of a compression technique which may be applied is the Huffmann compression technique. When performing Huffmann compression, it is necessary to generate a structured listing of symbols in the form of a Huffmann tree, the Huffmann tree enabling encoded data values to be obtained for each symbol.
Considering the example of Huffmann compression, this basically consists of three steps, namely:
1 Counting symbol frequencies in a data stream;
2 Generating the Huffmann tree based on the frequency information for each symbol; and
3 Encoding symbols using the Huffmann tree.
The first and last steps are well understood, and optimal implementations have existed for some time. However, the second step has a significant impact on the efficiency of the Huffmann compression technique, since the algorithms used to build the Huffmann tree are typically relatively slow. For example, often an algorithm having a complexity of order N
2
(referred to as a O(N
2
) algorithm) is used to build the Huffmann tree, where N is the number of symbols. Hence, as N becomes larger, the time taken to build the Huffmann tree increases proportional to N
2
. More advanced algorithms do exist which use a priority queue in order to achieve lower build time of O(N log N).
However, it is still desirable to further decrease the time taken to generate the structured listing of symbols, for example the Huffmann tree in the case of Huffmann compression. It will be appreciated that better compression will be achieved if the structured listing of symbols is generated dynamically based on the particular data stream to be compressed. However, the benefits of doing this have to be weighed up against the overheads involved in generating the structured listing of symbols dynamically. Hence, in certain implementations where the overhead in generating a dynamic structured listing of symbols is considered too great, a structured listing of symbols is instead generated on the basis of a sample data stream, and compression of subsequent data streams is then obtained using that structured listing of symbols. Clearly if the structured listing of symbols could be generated more efficiently, then this would increase the number of implementations in which it would be acceptable to generate the structured listing dynamically based on the actual data to be compressed, thereby enabling significantly improved compression to be obtained.
Accordingly, it is an object of the present invention to provide an improved technique for generating a structured listing of symbols from which encoded data values for those symbols can be determined.
SUMMARY OF THE INVENTION
Viewed from a first aspect, the present invention provides a method of generating a structured listing of symbols from which encoded data values for those symbols can be determined, the method comprising the steps of: (a) for an input stream of symbols, generating a first list having a plurality of entries, each entry identifying a symbol in the input stream and the frequency with which that symbol appears; (b) ordering the entries in the first list by frequency; (c) selecting the two symbols having the lowest frequency; (d) generating a new symbol to represent the two selected symbols, and allocating the new symbol a frequency based on the two selected symbols; (e) storing the new symbol as an entry in a second list along with an indication of the frequency allocated to the new symbol; (f) arranging for the entries for the two symbols selected at said step (c) to be unavailable for subsequent steps in the generation of the structured listing; and (g) repeating the steps (c) to (f) until only one available entry remains, in each iteration the two symbols selected at said step (c) being chosen from all available entries in the first and the second list.
In accordance with the present invention, a first list is generated having entries identifying symbols and the frequency with which those symbols appear, the entries in the list being ordered by frequency. Then the two symbols having the lowest frequency are selected, and a new symbol is generated to represent those symbols, the new symbol being allocated a frequency based on the two selected symbols. Generally the frequency assigned to the new symbol will be equal to the sum of the frequencies of the two selected symbols represented by the new symbol.
Then, in accordance with the invention, the new symbol is stored as an entry in a second list along with an indication of the frequency allocated to that symbol, and the entries for the two symbols that were selected to be represented by that new symbol are then deemed unavailable for subsequent steps in the generation of the structured listing. This process of selecting the two symbols with the lowest frequency, and replacing them with a new symbol stored in the second list is repeated until only one available entry remains. In each iteration, the two symbols with the lowest frequency are chosen from all available entries in the first and the second list.
It will be appreciated by those skilled in the art that whilst the first and second lists may be provided separately, they could be provided by a single list logically partitioned to provide the first and second lists.
In accordance with the above approach, it will be appreciated that the original set of symbols in the input stream need only be ordered once, since the entries in the first list then remain frequency ordered throughout the generation of the structured listing. There are a number of known O(N) sorting algorithms that can be used to perform this initial ordering. Further, in accordance with the present invention, all of the new symbols generated are placed in a separate list, and since each new symbol generated will have a frequency greater than any of the new symbols previously generated (since in each iteration the two symbols having the lowest frequency will have larger frequencies than the two symbols selected during the previous iteration), then it can be seen that the entries in the second list can be readily maintained in frequency order.
Given that the frequency ordering of both lists is readily maintained, the number of entries that need to be reviewed in order to find the symbol having the lowest frequency is independent of the total number of symbols N, and in fact the smallest symbol can be selected in O(
1
) time, i.e. in a single time step.
If there are N symbols initially, then since each iteration removes two symbols, and inserts one new symbol, the method of the invention takes N iterations to complete. Since selecting the two smallest symbols takes O(
1
) time, and inserting the new symbol takes O(
1
) time, it can be seen that in accordance with the present invention, the structured listing of symbols can be generated in O(N) time, and hence as N becomes larger, the time taken to generate

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Data processing system and method for generating a... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Data processing system and method for generating a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Data processing system and method for generating a... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2977015

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.