Parallel loading of markup language data files and documents...

06631379

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the arts of data conversion and processing for loading databases, and more specifically to loading text contained in document files written in a markup language such as hypertext markup language (“HTML”) or extensible markup language (“XML”).
2. Description of the Related Art
Markup languages for describing data and documents are well known in the art, especially Hypertext Markup Language (“HTML”). Another well-known markup language is Extensible Markup Language (“XML”). The two languages have many characteristics in common. Markup language documents use tags that bracket information within the document. For example, the title of a document may be preceded by an opening tag <TITLE>, followed by the text of the title, and closed by the closing tag </TITLE>.
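As a minimal illustration of tags bracketing content, the following Python sketch pulls the title out of a small, hypothetical document using the standard library's `xml.etree.ElementTree`. It assumes the markup is well-formed XML; much real-world HTML is not, and a strict XML parser would reject it.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed markup document (hypothetical sample).
doc = "<HTML><HEAD><TITLE>Product Catalog</TITLE></HEAD><BODY>...</BODY></HTML>"

root = ET.fromstring(doc)
# find() locates the element opened by <TITLE> and closed by </TITLE>;
# .text is the data bracketed between the two tags.
title = root.find("HEAD/TITLE").text
print(title)  # Product Catalog
```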
Hypertext documents, such as HTML documents, are used primarily to control the presentation, or visual rendering, of a document, such as in a web browser. Accordingly, many of the tags defined in the HTML standards control the visual appearance of the data or information within the document, such as text, tables, buttons and graphics.
XML is also a markup language, but it is intended primarily not for the visual presentation of documents but for data communications between peer computers. For example, an XML document may be used to transmit catalog information from one server computer to another so that the receiving server can load that data into a database. While XML documents may be viewed or presented, the primary characteristics of the XML language provide for standardized interpretation of the data included in the document, rather than standardized presentation of that data.
As such, XML is a highly flexible way of defining common information formats that can be shared both across computer networks such as the World Wide Web and across intranets. This standard method of describing data allows users and computers to send intelligent “agents,” or programs, to other computers to retrieve data from them. For example, an intelligent agent could be transmitted from a user's web browser or application server system to a plurality of database servers to gather certain information from those servers and return it. Because XML provides a way for the intelligent agent to interpret the data within an XML document, the agent can carry out its function according to the parameters specified by its user.
XML is “extensible” because the markup symbols, or “tags”, are not limited to a predefined set, but rather are self-defining through a companion file or document called a Document Type Definition (“DTD”). As such, additional document data items may be defined by adding them to the appropriate DTD for a class of XML files, thereby “extending” the definition of the class of XML files.
XML is actually a subset of the Standard Generalized Markup Language (“SGML”) standard. The DTD file associated with a particular class of XML documents tells an XML reader or XML compiler how to interpret the data contained within the XML document.
For example, a DTD file may define the contents of an XML document (or class of documents) consisting of catalog page listings for computer products. In this example, the DTD document may describe an element “computer specifications.” Within that element may be several data items bracketed by tags, such as <MODEL> and </MODEL>, <PART_NUMBER> and </PART_NUMBER>, <DESCRIPTION> and </DESCRIPTION>, <PROCESSOR> and </PROCESSOR>, <MEMORY> and </MEMORY>, <OPERATING_SYSTEM> and </OPERATING_SYSTEM>, etc. Thus, the DTD document defines a set or group of data items surrounded by markup tags or symbols for that particular class of XML documents, and it serves as a “key” that other programs use to interpret and extract the data from XML documents in that class.
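The catalog example can be sketched in Python as follows. The element name `COMPUTER_SPECIFICATIONS` (XML names cannot contain spaces), the sample values and the `FIELDS` list are assumptions for illustration. `ElementTree` does not validate against a DTD, so `FIELDS` merely stands in for the list of data items a DTD would declare for this class of documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entry using the data items named in the text.
CATALOG_XML = """
<COMPUTER_SPECIFICATIONS>
  <MODEL>Model 100</MODEL>
  <PART_NUMBER>PN-0001</PART_NUMBER>
  <DESCRIPTION>Desktop computer</DESCRIPTION>
  <PROCESSOR>x86</PROCESSOR>
  <MEMORY>512 MB</MEMORY>
  <OPERATING_SYSTEM>Linux</OPERATING_SYSTEM>
</COMPUTER_SPECIFICATIONS>
"""

# Stand-in for the data items an (assumed) DTD declares for this class.
FIELDS = ["MODEL", "PART_NUMBER", "DESCRIPTION", "PROCESSOR", "MEMORY", "OPERATING_SYSTEM"]

def extract_items(xml_text):
    """Return {tag: text} for each expected data item; missing items map to None."""
    root = ET.fromstring(xml_text.strip())
    return {field: root.findtext(field) for field in FIELDS}

record = extract_items(CATALOG_XML)
print(record["PROCESSOR"], record["MEMORY"])  # x86 512 MB
```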
In this example, an XML reader could be used to view the XML files, interpreting their contents according to the DTD definitions and presenting them visually, somewhat like a catalog page. Unlike an HTML document, however, the XML document may be used for more data-intensive or data-communications purposes. For example, an XML compiler can be used to parse and interpret the data within the document, and to load the data into yet another document or into a database. And, as described earlier, an intelligent agent program may be dispatched to multiple server computers on a computer network to look for XML documents containing certain data, such as computers with a certain processor and memory configuration. The intelligent agent can then report back to its origin the XML documents it has found. This would enable a user to dispatch the intelligent agent to gather and compile XML documents describing a computer the user may be looking to buy.
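The search the agent performs can be approximated by a simple filter over documents it has already gathered. This sketch ignores the networking side entirely; the document names, sample values and the `matching_documents` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def matching_documents(documents, processor, memory):
    """Return names of documents whose PROCESSOR and MEMORY items match.

    `documents` maps a document name to its XML text, a hypothetical stand-in
    for documents an agent has collected from several servers.
    """
    matches = []
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        if root.findtext("PROCESSOR") == processor and root.findtext("MEMORY") == memory:
            matches.append(name)
    return matches

docs = {
    "server-a/item1.xml": "<COMPUTER_SPECIFICATIONS><PROCESSOR>x86</PROCESSOR>"
                          "<MEMORY>512 MB</MEMORY></COMPUTER_SPECIFICATIONS>",
    "server-b/item7.xml": "<COMPUTER_SPECIFICATIONS><PROCESSOR>x86</PROCESSOR>"
                          "<MEMORY>256 MB</MEMORY></COMPUTER_SPECIFICATIONS>",
}
print(matching_documents(docs, "x86", "512 MB"))  # ['server-a/item1.xml']
```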
One common business application of XML is to use it as a common data format for transfer of data from one computer to another, or from one database to another database.
There are several tradeoffs among current XML implementations: performance, ease of use, and extensibility. Typically, performance is inversely related to ease of use, and extensibility is often not an option at all. When loading data from an XML document into a database, currently available systems typically perform the following steps:
(a) parsing of the XML file, which loads all the data contained in the XML file into system memory for use by the program;
(b) generating database commands, such as SQL statements, to execute against the database to load the data from the XML file into the database;
(c) establishing communications or a session with a database or database server; and
(d) issuing the appropriate database commands to accomplish the data loading.
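A minimal Python sketch of steps (a) through (d), assuming a hypothetical `CATALOG`/`PRODUCT` document layout and a `product` table, with an in-memory SQLite database standing in for the database server. It parses with `ElementTree` and issues a parameterized `INSERT` through the standard `sqlite3` module; a real loader would target whatever database API is actually in use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical catalog document holding two product records.
XML_DOC = """<CATALOG>
  <PRODUCT><PART_NUMBER>PN-0001</PART_NUMBER><MODEL>Model 100</MODEL></PRODUCT>
  <PRODUCT><PART_NUMBER>PN-0002</PART_NUMBER><MODEL>Model 200</MODEL></PRODUCT>
</CATALOG>"""

def load_xml_into_db(xml_text, conn):
    # (a) Parse the whole file into memory and pull out the data items.
    root = ET.fromstring(xml_text)
    rows = [(p.findtext("PART_NUMBER"), p.findtext("MODEL")) for p in root.iter("PRODUCT")]
    # (b) Generate the database command: a parameterized SQL INSERT.
    sql = "INSERT INTO product (part_number, model) VALUES (?, ?)"
    # (d) Issue the command for every parsed record over the open session, then commit.
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)

# (c) Establish a session; in-memory SQLite stands in for a database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (part_number TEXT PRIMARY KEY, model TEXT)")
print(load_xml_into_db(XML_DOC, conn))  # 2
```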
Turning to FIG. 1, the well-known process of loading an XML document into a database is shown. First, the entire XML document is loaded (1) into system memory (2). Because some XML documents are quite large, and one computer may be loading several documents simultaneously, this can place a considerable demand on system memory resources. Then, the entire XML file is parsed (3) for specific elements and data items according to the DTD file. This, too, tends to consume considerable system memory resources because XML files can be very large. The most common parsing technology used in this step is the Document Object Model (“DOM”). A DOM parser loads an entire XML file into memory and then processes it until complete.
Next, after the data items and elements have been parsed from the XML file, SQL commands (or other database API commands) are generated (4) to accomplish the data loading into a database.
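Step (4) can be sketched as a function that turns one parsed record into an SQL statement plus its bound parameters. The lowercase tag-to-column naming rule, the `product` table, the `allowed_columns` check and the `?` placeholder style (the one `sqlite3` uses) are assumptions; other database drivers use other placeholder styles. Table and column names cannot be bound as parameters, which is why this sketch checks derived column names against a known set before building the statement.

```python
def generate_insert(table, record, allowed_columns):
    """Build one parameterized INSERT statement from a {tag: text} record.

    Column names are derived by lowercasing element tags (a hypothetical
    convention) and must appear in `allowed_columns`; values travel as bound
    parameters instead of being pasted into the SQL text.
    """
    columns = sorted(record)
    unknown = [c for c in columns if c.lower() not in allowed_columns]
    if unknown:
        raise ValueError(f"unexpected data items: {unknown}")
    column_list = ", ".join(c.lower() for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    params = tuple(record[c] for c in columns)
    return sql, params

sql, params = generate_insert(
    "product", {"PART_NUMBER": "PN-0001", "MODEL": "Model 100"}, {"part_number", "model"}
)
print(sql)     # INSERT INTO product (model, part_number) VALUES (?, ?)
print(params)  # ('Model 100', 'PN-0001')
```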
Last, the SQL commands are executed (5) to effect the loading of the data from the XML document into the database. Any further XML documents to be parsed and loaded into the database are then retrieved and processed one document at a time (6).
Thus, the commonly used process consumes considerable system memory resources. It is also inherently slow: because it executes in a linear, stepwise fashion, it performs only one task at a time, such as loading the XML document, parsing the XML document, or generating SQL commands. Further, because many database servers are remote from the actual XML loading server, the SQL commands may take considerable time to execute. As a result, the XML document content tends to stay resident in system memory for an unacceptably long period of time, and the system cannot start loading additional XML data files until the previous load is completely done.
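The linear flow of FIG. 1 can be made visible with a small Python loop that records each stage as it completes. The document layout, the `product` table and the in-memory SQLite database (standing in for a possibly remote database server) are assumptions; the point is only that each document is parsed, converted and written before the next one is touched.

```python
import sqlite3
import xml.etree.ElementTree as ET

def load_sequentially(documents, conn):
    """Load each (name, xml_text) document completely before starting the next.

    Returns the ordered list of completed stages, showing that only one stage
    of one document is ever in progress at a time.
    """
    trace = []
    for name, xml_text in documents:
        root = ET.fromstring(xml_text)  # (1)-(3) whole document held and parsed in memory
        trace.append(("parse", name))
        rows = [(p.findtext("PART_NUMBER"),) for p in root.iter("PRODUCT")]
        sql = "INSERT INTO product (part_number) VALUES (?)"  # (4) generate the command
        trace.append(("generate", name))
        conn.executemany(sql, rows)  # (5) execute; the loop waits on the database here
        conn.commit()
        trace.append(("execute", name))
        # (6) the next document is retrieved only on the next iteration
    return trace

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (part_number TEXT)")
docs = [
    ("doc1.xml", "<CATALOG><PRODUCT><PART_NUMBER>PN-0001</PART_NUMBER></PRODUCT></CATALOG>"),
    ("doc2.xml", "<CATALOG><PRODUCT><PART_NUMBER>PN-0002</PART_NUMBER></PRODUCT></CATALOG>"),
]
trace = load_sequentially(docs, conn)
print(trace)
```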
Turning to FIG. 2, the linear processing nature of the commonly used process is shown. First, the XML data is loaded in the