Application-independent generator to generate a database...

Data processing: software development, installation, and management – Software program development tool – Translation of code

Reexamination Certificate


Details

U.S. Class: C707S793000
Type: Reexamination Certificate
Status: active
Patent number: 06321374

ABSTRACT:

BACKGROUND OF INVENTION
1. Field of the Invention
Providing a method to facilitate system integration and application/solution development for heterogeneous information systems is valuable. It is also valuable to have a re-usable tool to generate application-specific programming interfaces (APIs) and utilities for loading and accessing heterogeneous information.
This invention relates to an improved method of handling heterogeneous information.
Except for limited cases, it is almost impossible to design a generic database that is suitable for all digital library applications. Thus, a replicable digital library solution would not be able to offer a generic “library”, and specific data loading and access software would have to be developed for or by each customer.
This invention is directed to a re-usable tool which generates application-specific software for each digital library application. This should significantly reduce costs.
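For concreteness, the following is a minimal, hypothetical sketch (in Python) of the kind of generator contemplated here: it reads a simple description of an application's schema and emits application-specific loader code. The schema format, the db/repo interfaces, and all names are illustrative assumptions, not the patent's actual design.

    # Hypothetical sketch of an application-independent generator: read a
    # simple schema description and emit application-specific loader code.
    APP_SCHEMA = {
        "table": "JOURNAL_ARTICLE",
        "attributes": ["title", "author", "issue_date"],
        "objects": ["page_image", "fulltext"],   # kept in an object repository
    }

    def generate_loader_source(schema):
        # Build the text of a loader function tailored to this schema.
        cols = ", ".join(schema["attributes"])
        params = ", ".join(":" + a for a in schema["attributes"])
        lines = [
            f"def load_{schema['table'].lower()}(db, repo, record, content):",
            "    # Insert the structured metadata into the DBMS.",
            f"    db.execute(\"INSERT INTO {schema['table']} ({cols}) VALUES ({params})\", record)",
        ]
        for obj in schema["objects"]:
            # Store each unstructured part in the object repository.
            lines.append(f"    repo.store('{obj}', content['{obj}'])")
        return "\n".join(lines)

    print(generate_loader_source(APP_SCHEMA))

Generating such code once per application, rather than interpreting a schema description at run time, is what would let the loaders and access utilities remain both efficient and application-specific.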
2. Description of Related Art
System integration and application development are major undertakings for building heterogeneous information systems such as digital libraries. A digital library application typically handles a large amount of both structured information (e.g., bibliographic data, catalog data, structured documents, business data) and unstructured information (e.g., image, text, audio, video). To leverage off-the-shelf technologies, each form of data is usually managed by a separate, specialized resource manager. For example, a database management system (DBMS), such as DB2 (™), may be used to manage structured data; an object repository system, such as ADSM (™), may be used to manage image and text; a stream-data server, such as TigerShark (™), may be used to manage audio and video.
To manage these data properly for a digital library application, a customized data model is frequently required, involving application-specific tables, attributes, structures, relationships, constraints, semantics, and optimization. In many cases, a digital library application is an extension of a customer's existing database and production application. In other cases, it is a component of the customer's overall information technology vision. Thus the data management requirements can be much broader than those of the digital library application alone. For these reasons, the data model requirements are often different even between two similar digital library applications within the same industry.
In the publishing industry, for example, a publisher typically designs its own proprietary database to maintain its bibliography and content data for producing new electronic products. There are also reported cases in which different organizations within a large enterprise require different metadata on the same data. Therefore, it is not possible to pre-design a fixed database that can support all digital library applications, except where a relatively simple and generic model is sufficient, as in VisualInfo (™), for instance.
Without a common data model, software vendors/developers are not able to produce re-usable software, namely applications, middleware, tools, or utilities, that access a large amount of information efficiently. Although it is sometimes possible for an application to dynamically “discover” the data model from a “bootstrap” model, the performance of such an approach would not be acceptable and the restrictions would be severe. Furthermore, for a DBMS that supports query compilation, e.g., DB2 (™), a target database is needed for software compilation, and it must be distributed together with the compiled software.
Even if a common data model were possible, the model would mask the underlying resource managers, thereby preventing full utilization of the resource managers' capabilities: for instance, the version support for objects and the retention-management capability in ADSM (™). In fact, the common data model would “freeze” the data management technologies, preventing further exploitation of new capabilities in the future. In theory, the higher-level data model can be extended whenever an underlying resource manager is enhanced. In practice this is not feasible because of the multitude of resource managers, and it is not always possible, because the higher-level model would not be able to reflect all lower-level capabilities. For this reason, many application developers and system integrators prefer using the application programming interfaces (APIs) of the resource managers directly, especially standardized APIs such as SQL.
Moreover, an essential operation for a digital library (and for many other heterogeneous information systems) is to load information into the library. Typically performed by authorized workers, this operation is frequently high-volume, batch-oriented, and performance-sensitive. It usually requires proper coordination among the separate operations against the underlying resource managers in order to avoid inconsistencies. Such coordination is similar to the data synchronization required for distributed data processing, for which techniques such as “two-phase commit” are well known. However, most resource managers used by a digital library do not have a two-phase-commit capability.
On the other hand, a rigorously synchronized operation that is required for on-line transaction processing (OLTP) is not necessarily appropriate for digital libraries. For example, to protect against failure during batch updates (e.g., loading data), a restart capability relying on redundancy available outside the digital library system (e.g., content source files) can be equally effective but much more efficient than a conventional transaction-rollback followed by a rollforward using a complete transaction log.
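As an illustration, here is a minimal sketch of such a restart capability, assuming the simplest case of file-based loading: progress is checkpointed per source file, so a failed batch load resumes from the original content files instead of being rolled back and rolled forward from a transaction log. The checkpoint format and the function names are assumptions made only for this illustration.

    import json, os

    CHECKPOINT = "load.checkpoint"

    def load_batch(source_files, load_one):
        # Resume from the checkpoint if a previous run failed part-way through.
        done = set()
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                done = set(json.load(f))

        for path in source_files:
            if path in done:
                continue                      # loaded before the failure; skip
            load_one(path)                    # insert metadata / store objects
            done.add(path)
            with open(CHECKPOINT, "w") as f:  # record progress after each file
                json.dump(sorted(done), f)

The redundancy lives entirely in the content source files themselves: re-running the loader with the same file list is sufficient to reach a consistent state.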
Asynchronous operations are not only acceptable but also frequently preferred. The following are a few motivations:
1. The DB2 (Version 2) Load Utility, which does not allow record-level synchronization, is much more efficient than individual insertion of records.
2. Full-text indexing of text objects is usually much more efficient when performed in batch (asynchronously with object insertion) than when performed individually (synchronized with each insertion); the sketch after this list illustrates the batch approach.
3. Synchronous indexing of text objects also leads to long DBMS transactions which degrade DBMS performance due to locking.
4. Recoverable deletion (required to support transaction rollback) of a large object can be very expensive unless the resource manager provides efficient support. Most object repositories, such as ADSM (™), do not. On the other hand, non-recoverable deletion is acceptable for many digital library applications.
5. For ADSM (™), retention management can be used more efficiently and effectively to delete old “versions” of objects than to delete them individually and explicitly.
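The batch-indexing motivations (items 2 and 3 above) can be pictured with a small sketch: insertion only records the object in a pending queue, and a separate batch pass indexes everything that is pending, so DBMS transactions stay short. The queue and the repo/indexer interfaces are illustrative assumptions, not part of the patent.

    pending = []                          # stand-in for a "pending index" queue

    def insert_text_object(repo, obj_id, text):
        repo.store(obj_id, text)          # store the object synchronously
        pending.append(obj_id)            # defer full-text indexing to a batch

    def batch_index(repo, indexer):
        # Run periodically: index every pending object in a single batch pass.
        batch, pending[:] = list(pending), []
        indexer.index_batch([(obj_id, repo.fetch(obj_id)) for obj_id in batch])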
To support asynchronous, but coordinated, operations, a multi-state consistency model is usually a better transaction model for a unit of work than the binary model (“all done” or “all not done”), which is appropriate for OLTP. On the other hand, the “nested transaction” model that is suitable for engineering design and other long-duration applications is not sufficient for digital libraries, since there is often no pre-determined ordering of the coordinated operations, and furthermore, parallelism is preferred when possible.
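One possible reading of such a multi-state model is sketched below: each unit of work records which of the coordinated operations against the underlying resource managers have completed, rather than a single committed/aborted flag, and the operations may finish in any order or in parallel. The operation names and the class interface are assumptions for illustration only.

    # The set of coordinated operations for one library item (illustrative).
    OPERATIONS = {"metadata_inserted", "object_stored", "text_indexed"}

    class UnitOfWork:
        def __init__(self, item_id):
            self.item_id = item_id
            self.completed = set()         # operations finished so far

        def mark_done(self, operation):
            self.completed.add(operation)  # no pre-determined ordering required

        def state(self):
            # A descriptive, multi-valued state instead of binary commit/abort.
            if not self.completed:
                return "not started"
            if self.completed == OPERATIONS:
                return "fully consistent"
            return "partial: " + ", ".join(sorted(self.completed))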
Besides asynchronous operations, many digital library applications actually have special consistency requirements (e.g., whether “orphan” objects are allowed) and operational requirements (e.g., whether inserting an already existing object constitutes an error, and how to handle such a condition). Fitting all these requirements into a fixed paradigm of transactions and constraints, if it is possible at all, would require many artificial work-arounds for the resource managers. Furthermore, data loading is an integral part of the content creation/capture/import process, which undoubtedly varies with each application because of the diverse content sources and creation/capture tools. While some applications load data from files, others prefer data loading from a buffer (e.g., after performing image e
