Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-06-14
2004-02-24
Mizrahi, Diane D. (Department: 2175)
C707S793000
active
06697818
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to database management systems and, more particularly, to methods and apparatus for extending relational and object-relational database management systems.
BACKGROUND OF THE INVENTION
A database system usually comprises database clients connecting through a network (e.g., Internet, Intranet, etc.) to one or more database servers managed by database management systems (DBMSs). Such an arrangement is shown in FIG. 1. As shown, a database client (computer system) 102 is connected to one or more DBMSs (computer systems) 101-1 through 101-N (each DBMS including one or more database servers) via a communications network 103. It is known that virtually all of the business data handled by database systems, such as data generated by retail sales, falls into the category of structured data. Structured data is data that is present in a structured format, such as data tables in a relational database or spreadsheet. As non-structured data (such as text, time series, images, audio, and video) and semi-structured data (such as HyperText Markup Language (HTML), Extensible Markup Language (XML), and other tagged documents) begin to become prevalent, a database management system has to substantially change its access and search capabilities in order to efficiently manage these types of data.
A taxonomy of different data types and examples is illustrated in FIG. 2. As shown in FIG. 2, data 200 can be categorized as structured 202, non-structured 204 and semi-structured 206. As mentioned, an example of structured data is data in a relational table or spreadsheet, while examples of semi-structured data include XML data and HTML data. Non-structured data can be further categorized as vectors 208, lattice 210 and text 212. Vectors may include lines 214 and polygons 216 used in various Geographic Information Systems (GIS). Lattice data may include 1-dimensional data 218 (such as audio and time-series data), 2-dimensional data 220 (such as images, photos), 2-dimensional plus time data 222 (such as video), 3-dimensional data 224 (such as magnetic resonance image (MRI) data, computed tomography (CT) data, seismic data) and 3-dimensional plus time data 226 (such as climate model simulation output data).
To facilitate accessing and managing both non-structured and semi-structured data, database vendors have begun to use an object-relational approach to enhance and enrich the data types that can be handled and managed. An object-relational data model allows the attributes of a relational table to be an abstract data type, which can include both complex data structures and access methods to these structures. This methodology allows the application builder to store data based on new data types and access methods into a relational table. Modules comprising predefined data types and methods have also been developed to facilitate the access of those data types taxonomized in FIG. 2. Currently, all of the commercially-available major object-relational databases have provided a two-tier architecture: (1) a relational database engine; and (2) extension modules. Examples of extension modules include extenders used in IBM Corporation's DB2 database, data cartridges used by the Oracle database, and datablades used by the Informix database. These extension modules take advantage of both user-defined data types (and abstract data types in some of the newer versions of the databases) and user-defined functions (UDFs) enabled by the database engine to extend the capability of the relational database engine. Most of the extension modules (extenders, datablades, and data cartridges) are in the area of non-structured and semi-structured data management such as images, video, spatial data, text, and the recently emerging XML data.
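As a rough illustration of how a user-defined function extends a relational engine, the following sketch registers a Euclidean-distance UDF and calls it from SQL. It uses Python's built-in sqlite3 module purely as a stand-in; it is not any of the DB2, Oracle, or Informix extension mechanisms named above. The table name, the JSON encoding of the feature vectors, and the sample values are assumptions made for the example.

```python
import json
import math
import sqlite3


def euclidean(a_json, b_json):
    """UDF: Euclidean distance between two JSON-encoded feature vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as euclidean(x, y).
conn.create_function("euclidean", 2, euclidean)

# Hypothetical table: feature vectors stored as JSON text (a stand-in
# for a user-defined data type).
conn.execute("CREATE TABLE images (name TEXT, features TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [
        ("sunset", json.dumps([0.9, 0.4, 0.1])),
        ("forest", json.dumps([0.1, 0.8, 0.2])),
        ("ocean", json.dumps([0.1, 0.3, 0.9])),
    ],
)

query = json.dumps([0.2, 0.7, 0.2])
rows = conn.execute(
    "SELECT name, euclidean(features, ?) AS dist FROM images ORDER BY dist",
    (query,),
).fetchall()

for name, dist in rows:
    print(f"{name}: {dist:.3f}")
```

Note that the engine must evaluate the UDF for every row before it can sort, which hints at the optimization difficulty with similarity queries.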
This approach, however, has the following drawbacks:
(1) Query optimization involving UDFs: Query optimization involving UDFs from extenders/datablades/data cartridges is difficult and sometimes impossible due to the widely varying possibilities for estimating the cost function. In general, sub-queries with high selectivity and low computation cost are prioritized over sub-queries with low selectivity and high computation cost during query optimization. This methodology is applicable to precise constraints (including precise range queries). However, this optimization methodology has difficulties in dealing with fuzzy constraints and, in general, cannot deal with similarity queries where all the objects in the database are candidates. In particular, query optimizers within any of the existing object-relational databases cannot handle queries involving fuzzy Cartesian operators.
(2) Developing new extenders from existing search engines: Currently, each of the object-relational databases has relatively rigid APIs (application programming interfaces), and it takes tremendous effort to develop the necessary “glue” for transforming an existing search engine into an extender/datablade/data cartridge. It is to be understood that the terms “glue” and “wrappers” refer to the software code necessary to transform one set of APIs into another set of APIs. A standard search engine has its own APIs which might not observe the programming models used in a database. As a result, software wrappers or glue are needed to transform the API from the search engine to the software environment needed by a database.
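The kind of “glue” described in drawback (2) can be sketched as an adapter. In the minimal example below, both sides are hypothetical: `StandaloneSearchEngine` stands for an existing engine with its own call style, and `CursorWrapper` exposes it through an open/fetch/close protocol of the sort a database might expect from a row-producing function. Neither class corresponds to a real DB2, Oracle, or Informix interface.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Hit = Tuple[str, float]  # (document id, distance to the query)


class StandaloneSearchEngine:
    """Hypothetical search engine with its own API: one call, one list."""

    def __init__(self, index: Dict[str, Sequence[float]]) -> None:
        self._index = dict(index)

    def find_similar(self, query: Sequence[float], limit: int) -> List[Hit]:
        scored = [
            (doc_id, sum((q - v) ** 2 for q, v in zip(query, vec)) ** 0.5)
            for doc_id, vec in self._index.items()
        ]
        scored.sort(key=lambda hit: hit[1])
        return scored[:limit]


class CursorWrapper:
    """Hypothetical glue: adapts the engine to an open/fetch/close protocol."""

    def __init__(self, engine: StandaloneSearchEngine) -> None:
        self._engine = engine
        self._rows: Iterator[Hit] = iter(())

    def open(self, query: Sequence[float], limit: int) -> None:
        self._rows = iter(self._engine.find_similar(query, limit))

    def fetch(self) -> Optional[Hit]:
        # Return one row per call, or None when the result set is exhausted.
        return next(self._rows, None)

    def close(self) -> None:
        self._rows = iter(())


engine = StandaloneSearchEngine(
    {"a": [0.0, 1.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
)
cursor = CursorWrapper(engine)
cursor.open([0.0, 0.0], limit=2)
results = []
row = cursor.fetch()
while row is not None:
    results.append(row)
    row = cursor.fetch()
cursor.close()
print(results)
```

Even in this toy form, the wrapper has to translate calling conventions and result delivery, and a real extension module would repeat that work for each database's API.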
In the following discussion, we further elaborate on the first problem (query optimization).
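To make the ordering heuristic from drawback (1) concrete, here is a minimal sketch of ranking sub-queries by cost and selectivity. The ranking formula (cost per row divided by the fraction of rows eliminated), the `SubQuery` fields, and all numbers are assumptions chosen for illustration and are not taken from any particular optimizer. A similarity sub-query is modeled as eliminating no rows, since every object is a candidate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubQuery:
    name: str
    pass_fraction: float  # fraction of rows that survive (lower = more selective)
    cost_per_row: float   # assumed evaluation cost for one row


def rank(sq: SubQuery) -> float:
    """Lower rank runs earlier: cheap sub-queries that eliminate many rows."""
    eliminated = 1.0 - sq.pass_fraction
    return float("inf") if eliminated <= 0 else sq.cost_per_row / eliminated


def order_subqueries(subqueries: List[SubQuery]) -> List[SubQuery]:
    return sorted(subqueries, key=rank)


def expected_cost(ordered: List[SubQuery]) -> float:
    """Expected per-row cost when sub-queries run in the given order."""
    total, surviving = 0.0, 1.0
    for sq in ordered:
        total += surviving * sq.cost_per_row
        surviving *= sq.pass_fraction
    return total


subqueries = [
    SubQuery("image_similarity", pass_fraction=1.0, cost_per_row=50.0),
    SubQuery("price < 100", pass_fraction=0.10, cost_per_row=1.0),
    SubQuery("region = 'EU'", pass_fraction=0.50, cost_per_row=1.0),
]
plan = order_subqueries(subqueries)
print([sq.name for sq in plan])
print(expected_cost(subqueries), expected_cost(plan))
```

Under these assumptions the precise predicates are placed first and the similarity sub-query last. The sketch also shows the limitation: the similarity sub-query contributes no pruning of its own, so the heuristic gives no guidance when several fuzzy sub-queries must be combined.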
FIG. 3 illustrates an example of querying non-structured data such as images. The coarsest level of retrieval (coarse grain access) is the entire non-structured document, such as the whole image as shown in block 301. It is also possible to retrieve a sub-region of an image (e.g., facial region) as shown in block 302. Many emerging applications require retrieving at the object (e.g., tree, car, person) level (fine grain access) as shown in block 303. As the size of the document becomes increasingly large, object-level retrieval will also become increasingly important. Retrieval of the document, sub-document, or object based on meta-data other than a conventional data type requires the use of user-defined data types and user-defined functions. Both IBM and Informix have extenders and datablades, respectively, for performing this kind of access. However, the access of non-structured data is usually based on similarity measures such as Euclidean distance. This implies that all the entries within a database can be considered as candidates, and a very different set of criteria (as opposed to those used in a relational database) needs to be adopted to prune search results. Currently, all existing object-relational databases have to go through the following process to combine query results from SQL (Structured Query Language) and from extension modules:
(1) request a pre-determined number of results (say the top 1000) from those extension modules (extenders, data cartridges, or datablades) which access non-structured data;
(2) rank the returned results based on a similarity measure (such as the Euclidean distance between the query and the retrieved result); and
(3) combine the returned results with other sub-queries that are processed through SQL.
However, this strategy may not yield the correct results when results from multiple extenders need to be combined (because of premature pruning by each extension module). Furthermore, the process of joining results from relational operations with those from extension modules encounters similar difficulties in producing the correct results.
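The premature-pruning problem can be reproduced with a small sketch. Two hypothetical extension modules (one for color, one for texture) each return only their top-k objects by Euclidean distance, and the results are then intersected. The object names, one-dimensional feature values, and the use of summed distances as the combined score are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Sequence

Features = Dict[str, Sequence[float]]


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(features: Features, query: Sequence[float], k: int) -> List[str]:
    """Steps (1) and (2): one module returns its k nearest objects, ranked."""
    ranked = sorted(features, key=lambda oid: euclid(features[oid], query))
    return ranked[:k]


def combined(oid: str, query: Sequence[float]) -> float:
    # Assumed combined score: sum of the two per-module distances.
    return euclid(color[oid], query) + euclid(texture[oid], query)


# Hypothetical 1-D features; the query is [0.0] for both modules.
color: Features = {"A": [0.0], "B": [0.1], "C": [0.3], "D": [0.9], "E": [0.8]}
texture: Features = {"A": [0.9], "B": [0.8], "C": [0.3], "D": [0.0], "E": [0.1]}
query = [0.0]


def pruned_best(k: int) -> Optional[str]:
    """Step (3): combine only what each module returned."""
    candidates = set(top_k(color, query, k)) & set(top_k(texture, query, k))
    if not candidates:
        return None
    return min(candidates, key=lambda oid: combined(oid, query))


def exhaustive_best() -> str:
    """Reference answer: score every object against both measures."""
    return min(color, key=lambda oid: combined(oid, query))


print("pruned (k=2):", pruned_best(2))
print("exhaustive:", exhaustive_best())
```

With k=2, the color module returns A and B and the texture module returns D and E, so the intersection is empty even though object C has the best combined score. Raising k avoids the miss here, but in general no fixed k is guaranteed to be large enough.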
FIG. 4 provides a taxonomy of different queries which challenge existing relational query paradigms. Four types of queries are listed here: “Join (denoted as 402),” “Logical Composition (denoted as 404),” “Spatial Composition (denoted as 406)” and “Temporal Composition (denoted as 408).” Existing relational mechanisms based on standard SQL queries can already handle the precise queries 410 in the tables. The fuzzy queries 412
Bergman Lawrence D.
Chang Yuan-Chi
Choy David Mun-Hien
Fuh Gene Y. C.
Hsiao Hui-I
Dang Thu Ann
International Business Machines - Corporation
Mizrahi Diane D.
Ryan & Mason & Lewis, LLP
Wu Yicun