Data encoding scheme

Electrical computers and digital data processing systems: input/ – Input/output data processing

Reexamination Certificate

Details

C710S012000, C710S033000, C710S070000, C369S047360, C369S047360, C369S059160, C382S244000, C341S058000, C341S059000, C341S061000, C341S065000, 37, 37, 37

active

06378007

ABSTRACT:

TECHNICAL FIELD
The present invention relates to data storage, and in particular, but not exclusively, to methods and apparatus for encoding or formatting data and for storing the data to, for example, a magnetic medium such as tape.
BACKGROUND ART
Taking data storage to tape as an example, a host computer system typically writes data to a storage apparatus, such as a tape drive, on a per Record basis. Further, the host computer may separate the Records themselves using Record separators such as FILE MARKs or SET MARKs. Record length, and the order in which the Records and the Record separators are received, are determined by the host computer.
Typically, Records comprise user data, for example, the data which makes up wordprocessor documents, computer graphics pictures or data bases. In contrast, Record separators, such as FILE MARKs, are used by a host computer to indicate the end of one wordprocessor document and the beginning of the next. In other words, Record separators typically separate groups of related Records.
By way of example, the diagram in FIG. 1(a) illustrates a logical sequence of user data and separators that an existing type of host computer might write to a tape storage apparatus. Specifically, the host computer supplies five fixed-length Records, R1 to R5, in addition to three FILE MARKs, which occur after R1, R2 and R5.
It is known for a storage apparatus such as a tape drive to receive host computer data, arrange the data Records into fixed-sized groups independently of the Record structure, and represent the Record structure, in terms of Record and FILE MARK position, in an index forming part of each group. Such a scheme forms the basis of the DDS (Digital Data Storage) data format standard for tape drives defined in ISO/IEC Standard 10777:1991 E. EP 0 24 542 describes one example of a DDS tape drive, which implements this scheme. Once the groups of data are formed, the tape drive stores the groups to tape, typically after applying some form of error detection/correction coding.
The diagram in FIG. 1(b) illustrates the organisation into DDS groups of the host computer data shown in FIG. 1(a). Typically, the host computer data Records are encoded or compressed to form a continuous encoded data stream in each group. FILE MARKs are intercepted by the tape drive, and information that describes the occurrence and position of the FILE MARKs in the encoded data stream is generated by the tape drive and stored in the index of the respective group. In the present example, Records R1, R2 and a part of Record R3 are compressed into an encoded data stream and are stored in the first group, and information specifying the existence and position in the encoded data stream of the Records and the first and second FILE MARKs is stored in the index of the first group. Then, the remainder of Record R3, and Records R4 and R5, are compressed into a continuous encoded data stream and are stored in the second group, and information specifying the existence and position in the encoded data stream of the Records and the third FILE MARK is stored in the index of the second group.
In such a scheme, a tape drive reading the stored data relies on information in the index to reconstruct the original host computer data for return to a host computer.
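The grouping and reconstruction described above can be illustrated with a minimal Python sketch. It is not the DDS format itself: compression is omitted, and the names `FILE_MARK`, `Group`, `pack` and `unpack`, as well as the layout of the index entries, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

FILE_MARK = object()  # hypothetical sentinel standing in for a host FILE MARK


@dataclass
class Group:
    data: bytearray = field(default_factory=bytearray)  # continuous data stream
    index: list = field(default_factory=list)  # ("REC", n_bytes, complete) or ("FM",)


def pack(items, group_size):
    """Pack Records (bytes) and FILE_MARK sentinels into fixed-capacity groups."""
    groups = [Group()]
    for item in items:
        if item is FILE_MARK:
            # The mark is not written to the data stream, only noted in the index.
            groups[-1].index.append(("FM",))
            continue
        pos = 0
        while True:
            group = groups[-1]
            room = group_size - len(group.data)
            if room == 0:
                groups.append(Group())  # current group is full, start the next one
                continue
            chunk = item[pos:pos + room]
            group.data += chunk
            pos += len(chunk)
            complete = pos == len(item)
            group.index.append(("REC", len(chunk), complete))
            if complete:
                break
    return groups


def unpack(groups):
    """Rebuild the original host sequence using only the data and the indexes."""
    items, partial = [], b""
    for group in groups:
        offset = 0
        for entry in group.index:
            if entry[0] == "FM":
                items.append(FILE_MARK)
                continue
            _, n_bytes, complete = entry
            partial += bytes(group.data[offset:offset + n_bytes])
            offset += n_bytes
            if complete:
                items.append(partial)
                partial = b""
    return items
```

With five 4-byte Records, FILE MARKs after the first, second and fifth, and an assumed group capacity of 10 bytes, the third Record is split across two groups, mirroring the arrangement of FIG. 1(b), and `unpack(pack(items, 10))` returns the original sequence.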
FIG. 2 illustrates very generally the form of the indexes for both groups shown in FIG. 1(b). As shown, each index comprises two main data structures, namely a block access table (BAT) and a group information table (GIT). The number of entries in the BAT is stored in a BAT entry field in the GIT. The GIT also contains various counts, such as a FILE MARK count (FMC), which is the number of FILE MARKs written since the beginning of Recording (BOR) mark, including any contained in the current group, and a Record count (RC), which is the number of Records written since the BOR mark, including any contained in the current group. The values for the entries in this simple example are shown in parentheses. The GIT may contain other information such as the respective numbers of FILE MARKs and Records which occur in the current group only.
The BAT describes, by way of a series of entries, the contents of a group and, in particular, the logical segmentation of the Record data held in the group (that is, it holds entries describing the length of each Record and the position of each separator mark in the group). The access entries in the BAT follow in the order of the contents of the group, and the BAT itself grows from the end of the group inwardly to meet the encoded data stream of the Record data.
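As a rough illustration of how the GIT counts relate to the BAT entries, the following Python sketch derives a GIT for each group from a list of BAT entries. The entry layout, the names `GroupInfoTable` and `build_gits`, and the rule that a Record split across groups is counted in the group where it ends are all assumptions made for this example and are not taken from the DDS standard.

```python
from dataclasses import dataclass


@dataclass
class GroupInfoTable:
    bat_entries: int      # number of entries in this group's BAT
    file_mark_count: int  # FMC: FILE MARKs since BOR, including this group
    record_count: int     # RC: Records since BOR, including this group


def build_gits(bats):
    """Derive one GIT per group.

    bats holds one BAT per group; each BAT is a list of hypothetical entries,
    either ("REC", length, ends_here) or ("FM",), in group order.
    """
    gits, fmc, rc = [], 0, 0
    for bat in bats:
        fmc += sum(1 for entry in bat if entry[0] == "FM")
        # Assumption: a Record is counted in the group in which it ends.
        rc += sum(1 for entry in bat if entry[0] == "REC" and entry[2])
        gits.append(GroupInfoTable(len(bat), fmc, rc))
    return gits


# Two groups laid out like the example: R1, FM, R2, FM, part of R3 | rest of R3, R4, R5, FM
bats = [
    [("REC", 4, True), ("FM",), ("REC", 4, True), ("FM",), ("REC", 2, False)],
    [("REC", 2, True), ("REC", 4, True), ("REC", 4, True), ("FM",)],
]
gits = build_gits(bats)
```

Under these assumptions the first group's GIT holds 5 BAT entries, 2 FILE MARKs and 2 Records, and the second holds 4 BAT entries with running totals of 3 FILE MARKs and 5 Records. These figures follow from the sketch's own counting rule and are not the values printed in FIG. 2.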
The applicant's co-pending patent application “Data Encoding Method and Apparatus”, filed on the same date as the present application, describes an invention wherein the requirement for a BAT is removed by embedding special, reserved codewords representing Record boundaries and Record separators, such as FILE MARKs, into the encoded data stream. Therein, Record boundaries and FILE MARKs can be located by the respective embedded codewords.
Another of the applicant's co-pending patent applications, “Data Encoding Scheme With Switchable Compression” (EP application number 97308778.6), describes an invention, which may be used in addition to the invention of the above-mentioned co-pending application, in which both compressed data and non-compressed data can be encoded into the same continuous, encoded data stream. In that invention, the non-compressed data is preferably simply passed through the encoder and stored in unencoded form.
In order to implement the inventions of the two co-pending applications at the same time, there is a need to encode reserved codewords into an encoded data stream even when data compression is not being applied to the input data.
DISCLOSURE OF THE INVENTION
In addressing the problem of combining the inventions of the two aforementioned co-pending patent applications, the applicants have arrived at a particularly advantageous solution.
In accordance with a first aspect, the present invention provides a method of formatting host data, including the step of:
encoding members of a pre-defined group of data with m-bit codewords and encoding other data with codewords in excess of m-bits long to produce an encoded data stream, wherein all other data are encoded with codewords which have a common m-bit root sequence, and wherein the common m-bit root sequence is not itself representative of any of the members of the pre-defined group of data.
In accordance with the invention, there can be at least $2^m$ codewords, $2^m - 1$ of which are free to represent members of the group. In other words, the m-bit root sequence is not free to represent a member of the group. In a practical embodiment, the m-bit root sequence is detected during data decoding as being the start of a reserved codeword that has a length greater than m bits. The number of bits following the root sequence for reserved codewords determines how many reserved codewords there can be in the format.
In accordance with one embodiment, there are $2^m$ members in the first group of data and the reserved m-bit root sequence forms part of a longer, p-bit codeword, which is reserved to represent the remaining one of the $2^m$ possible members.
The advantage here is that there are $2^m$ codewords available to represent members of the group. For example, where m=8, $2^8 - 1$ (i.e. 255) of the codewords are 8 bits long and the $2^8$th (i.e. 256th) codeword is, for example, 9 bits long. The state of the 9th bit determines whether the 9-bit codeword is the 256th character, or whether the 9-bit codeword is a further root sequence for other reserved codewords, which can be 10 or more bits long.
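The m=8 case can be sketched in Python as follows. The choice of 0xFF as the root sequence, the polarity of the 9th bit (0 for the 256th character, 1 for a reserved codeword), the 4-bit reserved suffix and the names `ROOT`, `RESERVED_BITS`, `encode` and `decode` are assumptions made for illustration only; the text above does not fix them. Bits are held in a character string for readability rather than efficiency.

```python
M = 8              # width of ordinary codewords, as in the m=8 example
ROOT = 0xFF        # assumed root sequence; the text does not name a value
RESERVED_BITS = 4  # assumed number of bits following the 9th bit


def encode(tokens):
    """Encode a token sequence as a bit string.

    A token is an int 0..255 (a data byte) or ("RES", k) for reserved codeword k.
    """
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            _, k = token
            if not 0 <= k < (1 << RESERVED_BITS):
                raise ValueError("reserved codeword index out of range")
            # root sequence + 9th bit set + suffix selecting the reserved codeword
            out.append(f"{ROOT:0{M}b}" + "1" + f"{k:0{RESERVED_BITS}b}")
        elif token == ROOT:
            # The byte that collides with the root gets a 9-bit codeword.
            out.append(f"{ROOT:0{M}b}" + "0")
        else:
            out.append(f"{token:0{M}b}")
    return "".join(out)


def decode(bits):
    """Invert encode(): the root sequence signals that a longer codeword follows."""
    tokens, i = [], 0
    while i < len(bits):
        value = int(bits[i:i + M], 2)
        i += M
        if value != ROOT:
            tokens.append(value)
        elif bits[i] == "0":
            tokens.append(ROOT)
            i += 1
        else:
            tokens.append(("RES", int(bits[i + 1:i + 1 + RESERVED_BITS], 2)))
            i += 1 + RESERVED_BITS
    return tokens
```

In this sketch 255 byte values cost 8 bits each, the byte equal to the root costs 9 bits, and each reserved codeword costs 8 + 1 + 4 = 13 bits, so uncompressed data passes through almost unchanged while reserved codewords remain unambiguous to the decoder.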
In the preferred embodiment to be described, the length of reserved codewords (including the root sequence), n, is 13. Then, the state of the 9th bit after the root sequence either indicates that the 9 bits represent the $2^m$th member of the group, or that the next 4 bits (i.e. 13 bits in total) represent one of 16 possible reserv
