Compression/decompression of tags in markup documents by...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06330574

ABSTRACT:

BACKGROUND OF THE INVENTION
(1) Field of the Invention
The present invention relates to a technique of compressing and decompressing data, particularly, to an apparatus, a method and a recording medium suitable for use when a document (a tag document) structured and described according to control characters (strings) called tags defining a document structure is compressed and decompressed.
(2) Description of the Related Art
A recent trend is to unify the formats of documents handled by computers, the aim being to allow documents whose formats have differed from computer to computer, or from application to application, to be handled in different computer environments.
For example, there is an international standard (ISO 8879) for a document format called SGML (Standard Generalized Markup Language), established by ISO in 1986. An SGML document consists of, as schematically shown in FIG. 31, three portions, that is, SGML declaration 301, document type definition (DTD: Document Type Definition) 302 and document instance 303.
The SGML declaration 301 is a portion declaring a character set and the like necessary to process an SGML document in another system. The DTD 302 is a portion defining the structure of a document, such as chapter, paragraph, title, etc., and is described in a format as shown in FIG. 32, for example. The DTD 302 shown in FIG. 32 is a portion of the DTD of HTML (Hyper Text Markup Language), which is a kind of SGML that has spread as the description format of the WWW (World Wide Web) on the Internet.
The document instance 303 is the body of the SGML document, which is made by a writer (user) using an editor of the computer while referring to the DTD 302. The document instance 303 is described using control characters (strings), generally called tags, that show elements. Each of the tags is defined in the above DTD 302 and represents what an element in the document instance 303 is (for example, whether the element is a title, a chapter, or the like).
FIG. 33 is a diagram showing an example of description of the document instance 303. In FIG. 33, a character string (<TITLE>, </TITLE>, <SECTION>, </SECTION>, etc.) sandwiched between “<” and “>”, or “</” and “>”, is a tag. As shown in FIG. 33, a portion described as:
<TITLE>

</TITLE>
represents that the characters (strings) sandwiched between <TITLE>, which is a start-tag, and </TITLE>, which is an end-tag, are an element (here, the name of a title).
There is now a strong movement to employ SGML. In particular, the National Military Establishment of the U.S.A. requires that documents submitted to it be described in SGML. In Japan, the Patent Office has decided to employ SGML for CD-ROM publications.
Meanwhile, computers handle various types of data such as character codes, vector information, image information, etc., and the quantity of such data has been increasing rapidly in recent years. Accordingly, when handling a large quantity of data, a computer generally eliminates redundant portions of the data to compress it, so as to decrease the storage capacity required for the data or to enable high-speed data transmission.
There are several ways of applying data compression. An archiver and a compressing drive are described here as examples of applications of data compression used in computers.
The archiver compresses one or a plurality of data files and collects them into one file. Using the archiver on a rarely used file or an old file makes it possible to decrease the storage the file occupies. When a server supplies files (data, applications or the like) through personal computer communication or the Internet, using the archiver to collect all the files into one saves communication cost and reduces the labor required for the transfer.
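The archiver behavior described above can be illustrated with a minimal sketch. It uses Python's standard zipfile module purely as a stand-in for an archiver; the file names and contents are hypothetical, and nothing here is taken from the patent itself.

```python
import io
import zipfile


def archive(files):
    """Compress several named byte strings and collect them into one archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def unarchive(blob):
    """Restore every file held in the archive as a name -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Hypothetical, highly repetitive documents (assumption for illustration only).
files = {
    "manual_v1.txt": b"<TITLE>Manual</TITLE>" * 200,
    "manual_v2.txt": b"<SECTION>Revised</SECTION>" * 200,
}
blob = archive(files)
restored = unarchive(blob)
```

The single archive is smaller than the combined input here only because the sample data is repetitive; real savings depend on the files.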
On the other hand, the compressing drive compresses data in units of a disk drive of a computer, such as a hard disk (HD), a floppy disk (FD) or the like. When an arbitrary disk drive is designated, all files in the designated drive are compressed and held. In the compressing drive, the compressing/decompressing process is generally performed in the background of the computer, so that compression/decompression (decompression at the time of reading, and compression at the time of writing) is automatically performed during ordinary operations (read/write) by the user. Therefore, it appears to the user that the capacity of the designated disk has increased, since the user is not at all conscious of the compression/decompression of data.
As the coding system used in these applications, a universal coding system, in which the efficiency of compression does not depend much upon the characteristics of the data, is often used, since various data such as characters, machine language, images, voice, etc. are handled in the computer.
The universal coding is classified into LZ-coding which utilizes repeatability of a character, and statistical coding which codes a probability of occurrence of a character. The LZ-coding stores a character (string) that occurred in the past in a buffer, and outputs a start position in the buffer and a coinciding length as coded data when the same character (string) occurs. The statistical coding calculates a probability (frequency) of occurrence of a character having occurred in the past, and outputs a code according to the probability of occurrence. The LZ-coding can accomplish a high-speed process, whereas the statistical coding can accomplish a high-compression rate.
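The two families of universal coding can be sketched as follows. This is a toy illustration under stated assumptions, not the coding used by any particular product: the LZ part expresses the start position as a backward distance and uses a hypothetical window size and minimum match length, and the statistical part only computes ideal code lengths from static character frequencies rather than producing an actual adaptive code.

```python
import math
from collections import Counter


def lz_encode(text, window=255, min_match=3):
    """Toy LZ coder: emit (distance, length) for repeats, else a literal."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_pos = 0, 0
        # Search the buffer of already seen characters for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, j
        if best_len >= min_match:
            tokens.append((i - best_pos, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz_decode(tokens):
    """Rebuild the text; copying one character at a time allows overlaps."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(token)
    return "".join(out)


def ideal_code_lengths(text):
    """Statistical view: frequent characters deserve shorter codes (in bits)."""
    counts = Counter(text)
    total = len(text)
    return {ch: -math.log2(n / total) for ch, n in counts.items()}


sample = "<TITLE>a</TITLE><TITLE>b</TITLE>"
tokens = lz_encode(sample)
lengths = ideal_code_lengths("aaab")
```

The LZ search is a simple quadratic scan chosen for readability; practical coders index the buffer to find matches quickly.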
The data compressing techniques are ordinarily used to decrease a data amount in the computer or a communication cost. As to a document file, it is possible to compress the whole document so as to manage a large volume of documents.
In the document instance 303 of the SGML document, the quantity of data of the document is increased since tags defining elements in the document are added to the document itself. A study on an SGML document revealed that the proportion of tags in the document exceeds forty percent. Recently, not only documents submitted to public agencies but also manuals attached to products are more and more being changed to SGML documents. Such manuals run from several tens to, sometimes, several hundred pages, and are frequently revised. If the history of the revisions is included, the quantity of data of a manual is enormous.
If the SGML document is compressed using the above universal coding or another coding system, in the same way as ordinary documents or documents in another format, it is possible to decrease the quantity of the data to some extent. However, such approaches are quite inefficient, since in every case a conventional coding system is merely applied to the SGML document, and no consideration is given in the compression to the tags, which occupy a large portion of the document.
SUMMARY OF THE INVENTION
In the light of the above problems, an object of the present invention is to improve a compression rate of a tag document and decrease a quantity of data thereof by compressing and decompressing the document in consideration of tags in the tag document.
The present invention therefore provides a tag document compressing apparatus for coding a tag document, having a document type definition defining a tag showing a document structure and a document instance described using the tag defined in the document type definition, to compress the tag document, the apparatus comprising a tag extracting unit for scanning the document definition of an inputted tag document to extract the tag, a tag code table creating unit for assigning a predetermined code to the tag in the document definition on the basis of the tag extracted by the tag extracting unit to create a tag code table, and a tag coding unit for coding the tag in the document instance on the basis of the tag code table created by the tag code table creating unit.
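The three units named above (tag extraction, tag code table creation, tag coding) can be sketched roughly as follows. This is only an illustration under several assumptions: this excerpt does not specify how codes are assigned, so the sketch arbitrarily maps each start-tag and end-tag to one character from the Unicode private use area; it handles only simple "<!ELEMENT NAME ...>" declarations, ignores attributes, and assumes the instance contains no private use characters. All function names are hypothetical.

```python
import re

# Assumption: each element is declared as "<!ELEMENT NAME ...>" with one name.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z][A-Za-z0-9.-]*)", re.IGNORECASE)


def extract_tags(dtd):
    """Scan the DTD and list a start-tag and an end-tag for each element."""
    names = []
    for name in ELEMENT_DECL.findall(dtd):
        name = name.upper()
        if name not in names:
            names.append(name)
    tags = []
    for name in names:
        tags.extend([f"<{name}>", f"</{name}>"])
    return tags


def create_tag_code_table(tags, base=0xE000):
    """Assign one private use code point per tag (hypothetical code choice)."""
    return {tag: chr(base + i) for i, tag in enumerate(tags)}


def code_tags(instance, table):
    """Replace every known tag in the document instance with its code."""
    if not table:
        return instance
    longest_first = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tag) for tag in longest_first))
    return pattern.sub(lambda m: table[m.group(0)], instance)


def decode_tags(coded, table):
    """Restore the original tags from their codes."""
    reverse = {code: tag for tag, code in table.items()}
    return "".join(reverse.get(ch, ch) for ch in coded)


dtd = (
    "<!ELEMENT DOC - - (TITLE, SECTION+)>\n"
    "<!ELEMENT TITLE - - (#PCDATA)>\n"
    "<!ELEMENT SECTION - - (#PCDATA)>"
)
instance = "<DOC><TITLE>Manual</TITLE><SECTION>Setup</SECTION></DOC>"
table = create_tag_code_table(extract_tags(dtd))
coded = code_tags(instance, table)
```

Because the table is derived from the DTD alone, a decoder that has the same DTD can rebuild it without the table being stored alongside the coded instance; whether the patent relies on this is not stated in this excerpt.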
The present invention also provides a tag document compressing method for coding a tag document having a document type definition defining a tag showing a document structure and a document instance described using
