Quick difference and update for tree structure data

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C706S013000, C706S046000, C706S047000, C706S059000, C709S221000

Reexamination Certificate

active

06260042

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an efficient mechanism for the differentiation and update of data structured in tree format. The invention has particular application to version management of tree structured data but the tree differentiation process according to the invention can differentiate any two trees, regardless of whether they are successive versions or not.
2. Background Description
The information used by computer programs can be represented in multiple formats. There is, however, a growing trend towards moving much of these data into a standardized tree structured for-mat. The XML (eXtensible Markup Language) and DOM (Document Object Model) standard proposals are two leading efforts in this trend. The prospect of having a preferred data representation format raises the question of the adequacy of the existing mechanisms of data management.
Consider the case of information (source code, textual data, etc.) that varies over time. Version management is then a critical issue. At the core of the version management issue lies the following problem: given two successive states of the information, it is necessary to be able to describe the informational difference between these two versions, and to bring older versions up to date with newer ones.
In the current approach, version management is achieved by managing external representations of the data, rather than the data itself. The information is first converted into some conventional representation, typically a sequence of text lines or bytes, which is then processed. The internal structure of the data is irrelevant in this model; rather, all information is treated as a more or less unstructured sequence of tokens.
While sufficient for many computational purposes, this model has two major drawbacks. The first one arises from the intrinsic ambiguity of the conversion to an external format: the same data can have multiple external representations in a given format. As a result irrelevant difference reports (“false negatives”) are often generated.
Second, and most important, the way differences are reported (e.g., in terms of byte or line mismatches) bears no relation with the intrinsic structure of the data, and requires an additional “interpretative” step to infer the actual informational difference.
It would seem then that there are big advantages in doing version management directly on the internal representation of the data. When tree structured data is considered, however, there is one major obstacle: the high computational cost of the tree differentiation algorithms. As an indication of this, consider the cost of the optimal tree differencing algorithm: for labeled ordered trees, the cost is no less than
Nodes(tree 1)×leaves(tree 1)×Nodes(tree 2)×leaves(tree 2), that is, the cost increases at least quadratically when the size of the tree increases. For unordered labeled trees the situation is still worse. Optimal differentiation has exponential cost in the worst case,
Nodes(tree 1)×Nodes(tree 2)+[leaves(tree 1)]!×3**[leaves(tree 1)1×Nodes(tree 2)×. . .
The consequence is that tree differencing algorithms become impractical for most realistic problems as soon as the size of the trees involved starts to grow, unless large amounts of time and computing power are available.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide an efficient mechanism for the differentiation and update of data structured in tree format.
According to the invention, there is provided an efficient differencing and update mechanisms that operates directly on the native representation of the data. No irrelevant differences are introduced by the process. The difference reports generated are stated in terms of the basic operations that can be performed on a data tree, thus requiring no additional interpretation, and making it amenable to direct human inspection and understanding. This mechanism can become a central part of any version management system.


REFERENCES:
patent: 5911072 (1999-06-01), Simonyi
Zhang et al. “Simple Fast Algorithms For the Editint Distance Between Trees and Related Problems”, SIAM J Comp., vol. 18, No.6 pp 1245-1562, Dec. 1989.*
A.Jadhav et al. “Reduced-tree-based soft decoding for block-coded modulation”, IEE Proc., Comm., vol. 144, No.2, Apr. 1997.*
R. Rivest “The MD5 Message-Digest Algorithm”, MIT Laboratory for Computer Science and RSA data Security, Inc., Apr. 1992.*
Uramoto et al. “Digest Values for DOM (DOMHASH) Proposal”, IBM Research, Tokyo Research Laboratory, Jun. 1998.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Quick difference and update for tree structure data does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Quick difference and update for tree structure data, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Quick difference and update for tree structure data will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2540600

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.