Data compression method using multiple base number systems

Registers – Records – Particular code pattern

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C235S462100

Reexamination Certificate

active

06196466

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a data compression method that is well-suited to efficient encoding of source data that is a mix of two or more well-defined subsets of the possible message contents, as is commonly found in serial numbers and other messages that are commonly represented in optical codes, such as bar codes, and radio frequency identification (RFID) tags.
BACKGROUND OF THE INVENTION
Prior General-Purpose Compaction Schemes
Many approaches have been devised to efficiently compact a user's source data, based on recognizing redundant information in the source message. Some schemes, such as Huffman coding, work on a character-by-character basis, and define an encoded representation that assigns the fewest bits to the most commonly used characters. However, the message cannot be decoded, unless either the statistical probabilities of the different characters are known a priori to the decoder, or the message contains a “dictionary” that discloses the specific character-to-bit assignments that were used. Neither approach is compatible with bar coding applications or radio frequency identification RFID applications.
Many other schemes, such as those used in file-compression programs like PKZip, inherently include a “dictionary” of compressed strings that is built up as each file is analyzed for compression. However, a very short message (such as a bar-coded serial number) seldom contains enough substring redundancy for this approach to work.
These general-purpose compression schemes are not well-suited to solving the unique problem faced by the bar code industry, which is to encode a relatively short messages (of nearly random character content) in the smallest possible physical space.
Prior Compaction Schemes from Bar Code Symbologies
Efficient compaction of a user's source data has always been a primary goal in the design of bar code symbologies for specific applications. In early bar code symbologies, there was usually a direct correspondence between the application's expected source character set and the bar code's symbol character set (the bar code's predefined set of bar and/or space patterns). For example, the well-known Interleaved-Two-of-Five symbology defines 10 legal patterns, each of five bars or five spaces, and these 10 patterns are directly assigned to the 10 decimal digits which comprise the source character set for the intended application. Another symbology, Code 39, defines 44 legal bar/space patterns in its symbol character set, and these directly correspond to the 44 data characters (digits, capital letters, and some punctuation characters) supported in the source character set. In both of these cases, the bar code symbology inherently supports one fixed subset (numbers, or letters plus numbers) of the possible source character set of 7-bit ASCII characters or 8-bit bytes. In general, the larger the fixed subset of supported data characters, the larger each corresponding bar code character must be. For example, each digit in Interleaved 2-of-5 requires seven unit widths (of the 5 bars or spaces representing each digit, two of the bars or spaces are double-width), whereas each data character in Code 39 requires 12 unit widths (of the 5 bars and 4 spaces representing each character, 3 are double-width).
More recent bar code symbologies, and recent two-dimensional matrix codes, typically have a more flexible correspondence between the source data characters and the resulting bar and space patterns of symbol characters, and usually define support for two or more subsets of the possible 7- or 8-bit source character set. For example, PDF417 is a two-dimensional stacked barcode, which pre-defines several common subsets of the 8-bit source data which can benefit from more efficient encoding. PDF417's symbol characters (bar/space patterns) can, depending upon the values of preceding symbol characters, represent the set of decimal digits, or the set of capital letters, the set of lower-case letters, or the full set of all possible 8-bit values. As another example, Code One is a two-dimensional matrix code, where each black or white cell within its data area represents one bit of encoded data. However, each Code One character (i.e., each group of 8 data bits) does not necessarily directly correspond to a particular group of 8 source data bits. This is because, in addition to a “Byte” code set (which is a full 8-bit representation of each 8-bit source character), Code One defines five different subsets of the source data (ASCII, C40, Decimal, Text, and EDI). Each of these subsets allows source characters to be encoded using fewer than 8 bits each. For example, the Code One Decimal code set can encode a string of three decimal digits from the source message into a single number which can be represented in 10 bits (averaging 3.33 bits per character, rather than the 8 bits per character that would be required if each source byte were directly encoded in the Byte code set). As a second example, Code One also defines a subset of the input values consisting of decimal digits, capital letters, and four punctuation characters. This “EDI” code set encodes three successive source characters (if all are from this predefined 40-character subset) into a Base 40 number which can be represented in 16 bits (averaging 5.33 bits per character). It can be seen from these examples that in general, the larger the predefined subset of source characters, the larger the numerical base of the number that represents them, and thus more bits per character are required to encode them. Similar approaches to efficient encoding of source-message subsets have been designed for most recent bar code and matrix symbologies. The specific choices of subsets may differ, depending on the expected characteristics of source messages, but the encodation mechanisms are quite similar to the examples shown above. Note that, although the average number of bits per character can be a fraction rather than an integer, no current symbology allows the direct representation of a fractional bit. Instead, the encoding algorithm looks for fixed-length groups of source characters (all of which must be members of the same subset), and converts the data values into a fixed number of bits. In general, the larger the group, the more efficient the encoding, but the less likely the source message will contain a homogeneous grouping. For example, a “group” of one digit requires 4 bits to represent it, but a group of two requires only seven bits (averaging 3.5 bits per digit), and a group of three requires only 10 bits (averaging just 3.33 bits per digit). However, this more efficient grouping of three cannot be used if the source message contains runs of only one or 2 digits, separated by letters or other non-members of the digit subset.
In prior symbology designs, the typical approach to this problem has been to include an “escape” character within the predefined subset, so that an occasional non-member source character can be encoded. However, the non-member characters are inevitably encoded with relatively poor efficiency. For example, Code One's C40 code set encodes letters and numbers at an average of 5.5 bits each, and can encode non-member characters, but at an average of 11 bits each (much worse than their source representation in 8 bits).
Finally, these special subset encodation modes usually provide some method of changing to a different pre-defined subset (by making a non-data “latch character” a member of the defined subset). In this way, if the message consists of a string of letters followed by a string of digits, the letters can be encoded using a letter-oriented subset, and then a “code latch” character can be encoded, so that the subsequent source characters can be encoded in a more-efficient digit-oriented code set. If a single number separated two strings of letters, then if the predefined letter subset also includes a non-data “code shift” character, the entire message can be encoded in the letter-oriented subset with fairly good efficiency.
An

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Data compression method using multiple base number systems does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Data compression method using multiple base number systems, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Data compression method using multiple base number systems will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2503813

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.