Structured document and document type definition compression

Data processing: presentation processing of document – operator i – Presentation processing of document – Layout

Reexamination Certificate

Details

C707S793000

active

06635088

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer-readable code for reducing the size of documents (such as XML and DTD documents) through novel compression techniques.
2. Description of the Related Art
Extensible Markup Language, or “XML”, is a standardized formatting notation, created for structured document interchange on the World Wide Web (hereinafter, “Web”). XML is a tag language, where specially-designated constructs referred to as “tags” are used to delimit (or “mark up”) information. In the general case, a tag is a keyword that identifies the data associated with the tag, and is typically composed of a character string enclosed in special characters. “Special characters” means characters other than letters and numbers, which are defined and reserved for use with tags. Special characters are used so that a parser processing the data stream will recognize that this is a tag. A tag is normally inserted preceding its associated data; a corresponding tag may also be inserted following the data, to clearly identify where that data ends. As an example of using tags, the syntax “<email>” could be used as a tag to indicate that the character string appearing in the data stream after this tag is to be treated as an e-mail address; the syntax “</email>” would then be inserted after the character string, to delimit where the e-mail character string ends.
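The “<email>” example above can be illustrated with a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser (neither is named in the source). The address is a made-up placeholder. The sketch only shows how a parser separates the tag keyword from the data the tags delimit.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the address is a placeholder, not from the source.
document = "<email>jane@example.com</email>"

element = ET.fromstring(document)

tag_name = element.tag   # the keyword that identifies the data
value = element.text     # the character string delimited by the two tags

print(tag_name, value)
```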
The syntax of XML is extensible because it provides users the capability to define their own tags. XML is based on SGML (Standard Generalized Markup Language), which is an international standard for specifying document structure. SGML provides for a platform-independent specification of document content and formatting. XML is a simplified version of SGML, tailored to Web document content. (Refer to ISO 8879, “Standard Generalized Markup Language (SGML)”, (1986) for more information on SGML, and to “Extensible Markup Language (XML), W3C Recommendation Feb. 10, 1998” which is available on the World Wide Web at http://www.w3.org/TR/1998/REC-xml-19980210, for more information on XML.)
XML is widely accepted in the computer industry for defining the semantics (that is, by specifying meaningful tags) and content of the data encoded in a file. The extensible, user-defined tags enable the user to easily define a data model, which may change from one file to another. When an application generates the tags (and corresponding data) for a file according to a particular data model and transmits that file to another application that also understands this data model, the XML notation functions as a conduit, enabling a smooth transfer of information from one application to the other. By parsing the tags of the data model from the received file, the receiving application can re-create the information for display, printing, or other processing, as the generating application intended it.
A Document Type Definition, or “DTD”, may be used with an XML file. In general, a DTD is a definition of the structure of an SGML document, and is written using SGML syntax. The DTD is encoded in a file which is intended to be processed, along with the file containing a particular document, by an SGML parser. The DTD tells the parser how to interpret the document which was created according to that DTD. DTDs are not limited to use with XML, and may in fact be used to describe any document type. For example, suppose a DTD has been created for documents of type “memo”. Memos typically contain “To” and “From” information. The DTD would contain definitional elements for these items, telling the parser what to do when it encounters “To” and “From” in an actual memo (such as using bold text for printing or displaying the words “To” and “From”, left-justifying the lines on which they appear, etc). The HyperText Markup Language, or “HTML”, is a popular example of a notation defined using an SGML DTD. HTML is used for specifying the content and formatting of Web pages, where “Web browser” software processes the HTML definition along with a Web page in the same manner an SGML parser is used for other DTDs and document types. When used with XML, a DTD specifies how the tags defined for this particular document type are to be inserted into the XML data stream when the XML file is being created. When a user wishes to print or display a document encoded according to this DTD, the software (i.e. the parser, compiler or other application) uses the DTD file to determine how to process the contents of the XML document file.
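As a rough illustration of the “memo” example, the sketch below embeds a small DTD in a document's DOCTYPE declaration and reads the document with Python's standard parser. The element names (memo, to, from, body) and their content are assumptions made for illustration. Note also that Python's standard library parser is non-validating: it accepts the DTD but does not enforce it.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo document with an internal DTD subset.
# The DTD declares that a memo contains "to", "from", and "body", in that order.
memo_source = """\
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Pat</to>
  <from>Lee</from>
  <body>Meeting moved to 3 pm.</body>
</memo>
"""

memo = ET.fromstring(memo_source)

# List the child elements in document order, as the DTD describes them.
child_tags = [child.tag for child in memo]
print(child_tags)
```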
Because the XML tags are defined by humans, and intended to be human-readable as well as machine-processable, they may become quite long in terms of character length. Each opening tag requires a matching closing (or “end”) tag, so that the number of characters required to express a given tag effectively doubles. As an example of a tag that may be defined, suppose a user wishes to represent names and addresses in a file. The tags used to delimit the name may be simply “<name>” and “</name>”, where the angle brackets are the SGML (and XML) syntax designated as bracketing a tag, and the combination of the “/” symbol with an opening angle bracket further designates that this is the end tag. Alternatively, longer tags could be used such as “<customer_name>” and “</customer_name>”, or separate tags could be used to separate the first name, middle initial, and last name when the name was associated with a person. The longer the tag, the more descriptive it will tend to be. For example, if the data model includes not only one person's name, but perhaps a spouse name and children's names, or an employer's name, then more characters will need to be used in the tags (such as “<employee_name>” and “<company_name>”) to enable a human reader to understand which name is which. The value to be used for the information represented by a tag is then encoded between the opening and closing tag. For example, suppose a company name is “Acme Widget”. According to this example, the string “<company_name>Acme Widget</company_name>” would be used to encode this information in a document. The document could contain many other company names, which would be similarly encoded. Other document types which do not use company names simply define different tags, for the information that is pertinent to those document types.
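To make the size cost of descriptive tags concrete, the following sketch counts the markup characters in the “Acme Widget” example. It is only an illustration in Python; the helper name markup_overhead is hypothetical and does not come from the source.

```python
def markup_overhead(tag, value):
    """Return (markup characters, total characters) for one tagged value."""
    opening = f"<{tag}>"
    closing = f"</{tag}>"
    encoded = opening + value + closing
    return len(opening) + len(closing), len(encoded)


# Descriptive tag from the example above.
long_markup, long_total = markup_overhead("company_name", "Acme Widget")

# The same value with a shorter, less descriptive tag.
short_markup, short_total = markup_overhead("name", "Acme Widget")

print(long_markup, long_total)    # 29 40
print(short_markup, short_total)  # 13 24
```

With the descriptive tag, 29 of the 40 characters are markup and only 11 carry the value itself.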
There is one exception to the requirement for matching end tags for each opening tag. It may be that there is no value for the tag in a particular usage. Suppose, for example, that the person from the data model discussed above has no spouse. In that situation, no value appears between the tags where the spouse name would otherwise be located. A shorthand specification technique has been defined for this null-value case, where a “/” character is inserted into the opening tag preceding the “>” character. If “<spouse>” and “</spouse>” are the tags used for bracketing the spouse string in this model, then the shorthand representation takes the form “<spouse/>”.
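A short sketch, again assuming Python's standard xml.etree.ElementTree parser, shows that the paired form and the shorthand form describe the same empty value. The surrounding person element and the name “Pat” are hypothetical, added only so the fragment is a complete document.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same person, with the empty spouse element
# written in the paired form and in the shorthand form.
paired = "<person><name>Pat</name><spouse></spouse></person>"
shorthand = "<person><name>Pat</name><spouse/></person>"


def spouse_value(source):
    """Return the spouse text, or an empty string when there is no value."""
    spouse = ET.fromstring(source).find("spouse")
    return spouse.text or ""


same_value = spouse_value(paired) == spouse_value(shorthand)
characters_saved = len(paired) - len(shorthand)

print(same_value, characters_saved)  # True 8
```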
The longer the length of the tags in the file, the larger the file becomes. While file size may not be an issue in some computing environments, such as where a server in a network has access to banks of storage devices, there are many situations where file size can become a critical factor in operating a computer. When the file is to be received at a constrained-storage device such as a handheld computer, Personal Digital Assistant (“PDA”), or other pervasive computing device, the larger the size of the file, the more likely it is that problems will arise when trying to store it at the receiver. And, the larger the file, the longer it will take to transmit the file between computers. The popularity of using portable computers such as handheld devices for connecting to the Internet, or other networks of computers, is increasing as user interest in computing becomes pervasive and users are more often wo
