Methods and system for model matching

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


C707S793000


active

06826568

ABSTRACT:

COPYRIGHT NOTICE AND PERMISSION
A portion of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright © 2001, Microsoft Corp.
FIELD OF THE INVENTION
The present invention relates to model or schema matching, or more generally to the matching of separate hierarchical data sets. More particularly, the present invention relates to methods and systems for matching models, or schemas, that discover similarity coefficients between schema elements, including analyses based on one or more of schema names, schema data types, schema constraints and schema structure.
BACKGROUND OF THE INVENTION
Match is a schema manipulation operation that takes two schemas, models or otherwise hierarchically represented data as input and returns a mapping that identifies corresponding elements in the two schemas. Schema matching is a critical step in many applications. For example, in e-business, match helps to map messages between different extensible markup language (XML) formats. In data warehousing, match helps to map data sources into warehouse schemas. In mediators, match helps to identify points of integration between heterogeneous databases. Schema matching thus far has primarily been studied as a piece of other applications. For example, schema integration uses matching to find similar structures in heterogeneous schemas, which are then used as integration points. Data translation uses matching to find simple data transformations. Given the continued evolution and importance of XML and other message mapping, match solutions are likely to become increasingly important as well.
Schema matching is challenging for many reasons. First and foremost, schemas for identical concepts may have structural and naming differences. In addition, schemas may model similar but slightly different content. Schemas may be expressed in different data models. Schemas may use similar words that nonetheless have different meanings, etc.
Given these problems, schema matching today is done manually by domain experts, sometimes using a graphical tool that can depict a first schema according to its hierarchical structure on one side, and a second schema according to its hierarchical structure on the other side. The graphical tool enables a user to select and visually represent a chosen mapping to see how it plays out vis-à-vis the remaining schema elements. At best, some tools can detect exact matches automatically, although even minor name and structure variations may lead them astray. Despite match being such a pervasive, important and difficult problem, model matching has not yet been studied independently except as it applies to other, narrower problems, such as those named above, and thus a generic solution for schema matching that can apply to many different data models and application domains remains to be provided. Moreover, since such a wide variety of tools would benefit from a matching solution, an independent match component or module that can be incorporated into or downloaded for such tools would be of great utility.
For a more detailed definition, a schema consists of a set of related elements, such as tables, columns, classes, XML elements or attributes, etc. The result of the match operation is a mapping between elements of two schemas. Thus, a mapping consists of a set of mapping elements, each of which indicates that certain elements of schema S1 are related to certain elements of schema S2. For example, as illustrated in FIG. 1, a mapping between purchase order schemas PO and POrder may include a mapping element that relates element Lines.Item.Line of S1 to element Items.Item.ItemNumber of S2, as shown by the dotted line. While a mapping element may have an associated expression that specifies its semantics, mappings are treated herein as nondirectional.
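The definition above can be made concrete with a small data structure. The following is a minimal Python sketch, assuming element paths are represented as dotted strings; the names `MappingElement`, `Mapping`, and `related_to` are hypothetical and are not taken from the described system. Because mappings are treated as nondirectional, the lookup works starting from an element of either schema.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class MappingElement:
    """Hypothetical record relating elements of S1 to elements of S2."""
    s1_elements: FrozenSet[str]
    s2_elements: FrozenSet[str]
    expression: Optional[str] = None  # optional semantics of the correspondence


@dataclass
class Mapping:
    """A mapping is simply a set of mapping elements."""
    elements: Set[MappingElement] = field(default_factory=set)

    def add(self, s1, s2, expression=None):
        self.elements.add(MappingElement(frozenset(s1), frozenset(s2), expression))

    def related_to(self, path):
        """Return the elements on the other side of every mapping element
        that mentions `path`; either schema may be the starting point."""
        result = set()
        for e in self.elements:
            if path in e.s1_elements:
                result |= e.s2_elements
            if path in e.s2_elements:
                result |= e.s1_elements
        return result


# The FIG. 1 example: Lines.Item.Line (S1) relates to Items.Item.ItemNumber (S2).
m = Mapping()
m.add({"Lines.Item.Line"}, {"Items.Item.ItemNumber"})
print(m.related_to("Items.Item.ItemNumber"))  # {'Lines.Item.Line'}
```

Storing each side as a frozen set keeps one-to-many and many-to-many correspondences expressible while still letting mapping elements live in an ordinary Python set.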
A model or schema is thus a complex structure that describes a design artifact. Examples of models are Structured Query Language (SQL) schemas, XML schemas, Unified Modeling Language (UML) models, interface definitions in a programming language, Web site maps, make scripts, object models, project models or any hierarchically organized data sets. Many uses of models require building mappings between models. For example, a common application is mapping one XML schema to another, to drive the translation of XML messages. Another common application is mapping a SQL schema into an XML schema to facilitate the export of SQL query results in an XML format, or to populate a SQL database with XML data based upon an XML schema. Today, a mapping is usually produced by a human designer, often using a visual modeling tool that can graphically represent the models and mappings. To reduce the effort of the human designer, it would be desirable to provide a tool that at a minimum provides an intelligent initial mapping as a starting point for the designer. Thus, it would be desirable to provide a robust algorithm that automatically creates a mapping between two given models.
Also, there is a related problem of query discovery, which operates on mapping expressions to obtain queries for actual data translation. Both types of discovery, mapping discovery and query discovery, are needed, and each is a rich and complex problem that deserves independent study. Query discovery is already recognized as an independent problem, where it is usually assumed that a mapping either is given or is trivial. Herein, the problem of schema matching is analyzed.
It is recognized that the problem of schema matching is inherently subjective. Schemas may not completely capture the semantics of the data they describe, and there may be several plausible mappings between two schemas, making the concept of a single best mapping ill defined. This subjectivity makes it valuable to have user input to guide the match and to validate the result. This guidance may come via an initial mapping, a dictionary or thesaurus, a library of known mappings, etc. Thus, the goal of schema matching, one not yet adequately achieved by today's algorithms, is: given two input schemas in any data model, optional auxiliary information and an input mapping, compute a mapping between schema elements of the two input schemas that passes user validation.
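The goal stated above can be read as the signature of a match operator. The sketch below is a hypothetical Python skeleton, not the patented method: it assumes that auxiliary information such as a thesaurus is folded into a caller-supplied `similarity` function, that the input mapping is a set of element-name pairs, and that an illustrative `threshold` parameter decides which candidate pairs are proposed to the user for validation.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]


def match(
    s1: Iterable[str],
    s2: Iterable[str],
    similarity: Callable[[str, str], float],
    input_mapping: Optional[Set[Pair]] = None,
    threshold: float = 0.5,
) -> Dict[Pair, float]:
    """Hypothetical match operator returning scored candidate pairs.

    Pairs from the user-supplied input mapping are kept with score 1.0;
    every other pair is scored by `similarity` and kept only if it
    reaches `threshold`. The result is a proposal for user validation.
    """
    input_mapping = input_mapping or set()
    s2 = list(s2)  # iterated once per element of s1
    result: Dict[Pair, float] = {}
    for e1 in s1:
        for e2 in s2:
            if (e1, e2) in input_mapping:
                result[(e1, e2)] = 1.0
                continue
            score = similarity(e1, e2)
            if score >= threshold:
                result[(e1, e2)] = score
    return result


# Usage with hypothetical element names and a trivial case-insensitive matcher.
def same_name(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0


proposal = match(["PONo", "City"], ["OrderNumber", "city"], same_name,
                 input_mapping={("PONo", "OrderNumber")})
print(proposal)
```

Here the user-supplied pair survives even though the names differ, while the remaining pair is found by the matcher itself.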
The following is a taxonomy of currently known matching techniques. Schema matchers can be characterized by the following orthogonal criteria. With respect to schema-based vs. instance-based criteria, schema-based matchers consider only schema information, not instance data. Schema information includes names, descriptions, relationships, constraints, etc. Instance-based matchers either use metadata and statistics collected from data instances to annotate the schema, or directly find correlated schema elements, e.g., using machine learning.
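As a rough illustration of the instance-based approach, the sketch below annotates each schema element with two statistics computed from sample values and compares the resulting profiles. The chosen statistics, the function names `profile` and `profile_similarity`, and the equal weighting of the two differences are assumptions made for illustration only.

```python
from statistics import mean


def profile(values):
    """Annotate a schema element with simple statistics over sample instance values."""
    strs = [str(v) for v in values]
    if not strs:
        raise ValueError("need at least one instance value")
    return {
        # fraction of values that look like unsigned integers or decimals
        "numeric_ratio": sum(s.replace(".", "", 1).isdigit() for s in strs) / len(strs),
        "avg_length": mean(len(s) for s in strs),
    }


def profile_similarity(p1, p2):
    """1.0 for identical profiles, decreasing toward 0.0 as they diverge."""
    num_diff = abs(p1["numeric_ratio"] - p2["numeric_ratio"])
    # normalize the length difference into [0, 1]
    len_diff = abs(p1["avg_length"] - p2["avg_length"]) / max(p1["avg_length"], p2["avg_length"], 1)
    return 1.0 - (num_diff + len_diff) / 2


# Hypothetical sample data: two zip-code columns and one city-name column.
zips_a = profile(["98052", "10001"])
zips_b = profile(["02139", "94305"])
cities = profile(["Seattle", "Boston"])
print(profile_similarity(zips_a, zips_b), profile_similarity(zips_a, cities))
```

The two zip-code columns look alike even though they share no values, which is the kind of correlation a purely name-based matcher cannot see.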
With respect to element vs. structure granularity, an element-level matcher computes a mapping between individual schema elements, e.g., an attribute matcher. A structure-level matcher compares combinations of elements that appear together in a schema, e.g., classes or tables whose attribute sets only match approximately.
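One simple way to compare structures whose attribute sets match only approximately is to count how many attributes on each side find a partner under some element-level matcher. The Python sketch below assumes that approach; `structure_similarity`, `leaf_sim`, and the default threshold are hypothetical names and values, not the method claimed in this document.

```python
from typing import Callable, Sequence


def structure_similarity(
    attrs1: Sequence[str],
    attrs2: Sequence[str],
    leaf_sim: Callable[[str, str], float],
    threshold: float = 0.5,
) -> float:
    """Score two tables or classes by the fraction of attributes that match.

    An attribute counts as matched if some attribute on the other side
    reaches `threshold` under the element-level matcher `leaf_sim`.
    """
    if not attrs1 and not attrs2:
        return 0.0
    matched1 = sum(any(leaf_sim(a, b) >= threshold for b in attrs2) for a in attrs1)
    matched2 = sum(any(leaf_sim(a, b) >= threshold for a in attrs1) for b in attrs2)
    return (matched1 + matched2) / (len(attrs1) + len(attrs2))


# Hypothetical address tables that overlap only partially.
def case_insensitive(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0


score = structure_similarity(["street", "city", "zip"],
                             ["Street", "City", "PostalCode", "Country"],
                             case_insensitive)
print(score)
```

Two of three attributes on one side and two of four on the other match, giving 4/7; a structure-level matcher would then decide whether that is high enough to relate the two tables as a whole.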
With respect to linguistic-based matching, a linguistic matcher uses names of schema elements and other textual descriptions. Name matching involves putting the name into a canonical form by stemming and tokenization, comparing names for equality, comparing synonyms and hypernyms using generic and domain-specific thesauri, and matching substrings. Information retrieval (IR) techniques can be used to compare descriptions that annotate some schema elements.
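The name-matching steps listed above can be sketched in a few lines. The following Python example is a minimal, hypothetical illustration: it tokenizes camel-case names, applies a crude plural-stripping rule in place of a real stemmer, looks tokens up in a caller-supplied thesaurus dictionary, and gives partial credit for substring matches. All function names, the 0.5 substring score, and the thesaurus entries are assumptions, not part of the described system.

```python
import re
from typing import Dict, List


def tokenize(name: str) -> List[str]:
    """Split camel case, acronyms and digits: 'POShipTo' -> ['po', 'ship', 'to']."""
    return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]


def stem(token: str) -> str:
    # Crude plural stripping; a real matcher would use a proper stemmer.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def canonical_tokens(name: str, thesaurus: Dict[str, str]) -> List[str]:
    """Tokenize, stem, then replace abbreviations or synonyms via the thesaurus."""
    return [thesaurus.get(stem(t), stem(t)) for t in tokenize(name)]


def token_sim(t1: str, t2: str) -> float:
    if t1 == t2:
        return 1.0
    if min(len(t1), len(t2)) >= 3 and (t1 in t2 or t2 in t1):
        return 0.5  # substring match earns partial credit
    return 0.0


def name_similarity(n1: str, n2: str, thesaurus: Dict[str, str]) -> float:
    """Average best token match in both directions, in [0, 1]."""
    toks1, toks2 = canonical_tokens(n1, thesaurus), canonical_tokens(n2, thesaurus)
    if not toks1 or not toks2:
        return 0.0
    best1 = [max(token_sim(a, b) for b in toks2) for a in toks1]
    best2 = [max(token_sim(a, b) for a in toks1) for b in toks2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


# Hypothetical thesaurus entries expanding common abbreviations.
thesaurus = {"no": "number", "qty": "quantity"}
print(name_similarity("ItemNumber", "ItemNo", thesaurus))
print(name_similarity("Qty", "OrderQuantity", thesaurus))
```

The thesaurus lets "ItemNo" match "ItemNumber" exactly, while "Qty" versus "OrderQuantity" scores only partially because "order" has no counterpart.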
With respect to constraint-based matching, a constraint-based matcher uses schema constraints, such as data types and value ranges, uniqueness, requiredness, cardinalities, etc. A constraint-b
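A constraint-based comparison of the kind introduced above can be sketched as a data-type compatibility table combined with agreement on other constraints. In the Python sketch below, the type names, the compatibility scores, the two flags checked (`unique`, `required`), and the 0.7/0.3 weighting are all illustrative assumptions rather than values taken from this document.

```python
from typing import Dict

# Hypothetical compatibility scores between generic data types.
TYPE_COMPATIBILITY: Dict[frozenset, float] = {
    frozenset({"integer", "decimal"}): 0.8,
    frozenset({"integer", "string"}): 0.3,
    frozenset({"decimal", "string"}): 0.3,
    frozenset({"date", "string"}): 0.4,
}

FLAGS = ("unique", "required")


def constraint_similarity(c1: dict, c2: dict) -> float:
    """Compare two elements' constraints: a 'type' plus optional boolean flags."""
    t1, t2 = c1["type"], c2["type"]
    type_score = 1.0 if t1 == t2 else TYPE_COMPATIBILITY.get(frozenset({t1, t2}), 0.0)
    # Agreement on uniqueness and requiredness contributes the remaining weight.
    agree = sum(c1.get(f, False) == c2.get(f, False) for f in FLAGS)
    return 0.7 * type_score + 0.3 * agree / len(FLAGS)


key_int = {"type": "integer", "unique": True, "required": True}
key_dec = {"type": "decimal", "unique": True, "required": True}
note = {"type": "string"}
print(constraint_similarity(key_int, key_dec), constraint_similarity(key_int, note))
```

Because constraints such as data types are shared by many unrelated elements, a score like this is typically most useful to prune or reinforce candidates produced by other matchers rather than as a matcher on its own.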
