Reexamination Certificate
2001-07-20
2004-10-12
Kindred, Alford (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C709S223000, C715S252000
active
06804674
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to the Internet, database, and content management systems for the storage of files and data objects. Particularly, this invention is directed to a system and method for the efficient management of access and control over files and data linked to a database system and stored externally in a file system or another object repository. More specifically, the present invention relates to a scalable eContent system and associated method for managing the same.
BACKGROUND OF THE INVENTION
In the past few years, the World Wide Web (“the web”) has become a very important medium for information sharing and distribution. The number of business transactions taking place on the web has also been constantly increasing. This has resulted in significant changes in the way organizations communicate with their customers, employees and business partners.
Although companies still communicate information with sound, pictures, video, and the written word, the size and content of the information and the frequency with which it is accessed are increasing faster than ever before. In addition, companies are making content available and accessible to employees, customers, and even the general public from anywhere on the web, for competitive, contractual, financial, and legal reasons.
Corporate communications span multiple environments and employ diverse platforms and protocols, innumerable applications, WANs, LANs, extranets, intranets, and virtual private networks (VPNs). Efficiently managing the fast-growing information content and keeping up with its constant modification have become major challenges for information technology (IT) organizations in both large and small companies. Web applications that provide gateways to corporate resources access and revise information from multiple legacy systems and repositories.
Traditionally, content management systems are accessed by a small number of clients (or employees) within an enterprise and only a small subset of the business information is stored on-line in these systems. Supporting large numbers of client connections is not a major concern in such an environment. However, when information is accessible to all employees, or even to customers and partners, through the web, client connection scalability becomes yet another major challenge to the IT professional.
In a content management system, three types of information are stored: primary content (also referred to as Data or Object), User Metadata, and System Metadata. Semi-structured and unstructured data, such as text files, images, web pages, and video clips, constitute the primary content in a content management system. Descriptions of, and information about, the stored primary content, which are normally provided by the users, are referred to as user metadata.
The information created by the content management systems for access control, storage management, and content tracking/reference is referred to as system metadata. In contrast to the primary content, both user metadata and system metadata are well structured. Content management systems, in general, use a database system as a persistent repository for storing both user metadata and system metadata.
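The three kinds of stored information described above can be pictured with a minimal sketch. The class and field names below (UserMetadata, SystemMetadata, ContentItem, title, keywords, owner, location) are illustrative assumptions, not names taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserMetadata:
    """Structured description supplied by the user (hypothetical fields)."""
    title: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class SystemMetadata:
    """Structured bookkeeping created by the system (hypothetical fields)."""
    object_id: str   # reference used for content tracking
    owner: str       # basis for access control
    location: str    # where the primary content is stored


@dataclass
class ContentItem:
    """One managed item: unstructured payload plus its two kinds of metadata."""
    primary_content: bytes
    user_metadata: UserMetadata
    system_metadata: SystemMetadata


item = ContentItem(
    primary_content=b"<html><body>Quarterly report</body></html>",
    user_metadata=UserMetadata(title="Quarterly report", keywords=["finance"]),
    system_metadata=SystemMetadata(object_id="obj-1", owner="alice", location="os-1"),
)
```

In this sketch only the two metadata records are well structured; the primary content is an opaque byte string, which mirrors the distinction drawn in the text.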
While semi-structured and unstructured data can also be stored in database tables, in practice they are normally stored outside of databases, for performance and accessibility reasons that include the following:
(a) the size of these types of data tends to be very large and the conventional database systems are not designed to handle them efficiently; and
(b) storing primary content (e.g. files) in a database makes it very difficult for applications to access the content through a native application program interface (API).
Consequently, content management systems normally store and manage primary content and metadata separately. This leads to a distributed system architecture in which the system storing metadata for search and access control becomes the master, commonly referred to as the library server (LS), while one or more systems storing the primary content become the slaves, or object servers (OS). An object server is also known as a resource manager (RM).
The current generation of conventional content management products is based on a self-contained, closed system architecture, and as such these products have poor extensibility. The new generation of content management systems adopts an open system architecture that promises to fix the extensibility problem and to improve system performance. However, such content management systems continue to be based on a distributed system architecture that will face many of the same issues encountered in traditional distributed (database) systems.
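The split between a library server and its object servers can be sketched as follows. This is a toy in-memory model under stated assumptions: the class names, method names (put, get, store, search, retrieve), and the dictionary-based catalog are all hypothetical and are not the patent's design.

```python
class ObjectServer:
    """Holds primary content only (also called a resource manager)."""

    def __init__(self, name):
        self.name = name
        self._blobs = {}  # object_id -> bytes

    def put(self, object_id, data):
        self._blobs[object_id] = data

    def get(self, object_id):
        return self._blobs[object_id]


class LibraryServer:
    """Holds metadata and records which object server stores each object."""

    def __init__(self, object_servers):
        self._servers = {server.name: server for server in object_servers}
        self._catalog = {}  # object_id -> (server name, user metadata dict)

    def store(self, object_id, data, metadata, server_name):
        # Content goes to the object server; the reference stays here.
        self._servers[server_name].put(object_id, data)
        self._catalog[object_id] = (server_name, metadata)

    def search(self, keyword):
        # Search touches metadata only, never the primary content.
        return sorted(
            object_id
            for object_id, (_, metadata) in self._catalog.items()
            if keyword in metadata.get("keywords", ())
        )

    def retrieve(self, object_id):
        server_name, _ = self._catalog[object_id]
        return self._servers[server_name].get(object_id)


library = LibraryServer([ObjectServer("os-1"), ObjectServer("os-2")])
library.store("doc-1", b"report", {"keywords": ["finance"]}, "os-1")
library.store("img-1", b"\x89PNG", {"keywords": ["logo", "finance"]}, "os-2")
```

Here a search is answered from the library server's catalog alone, and only a retrieval follows the stored reference to an object server.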
Conventional content management systems provide a set of functions for content (data and metadata) creation, content management, and content distribution that enable users to manage data, system metadata, and user metadata. These conventional content management systems suffer from numerous shortcomings, among which are the following:
(a) Lack of Scalability and Extensibility:
Since a computer system has limited processing power and storage capacity, a content management system needs to have an architecture that is scalable so as to support future business/content growth. Three areas of scalability of particular interest to content management users are: primary content, metadata, and client/user connections.
For scalability in total content size and in the number of objects, conventional content management systems allow a library server to manage objects stored in multiple distributed object servers. When the total size of the primary content saturates or exceeds the capacity of an object server, an additional object server is installed. On the other hand, current content management systems cannot gracefully handle a significant increase in either metadata size or the number of client connections without a major architecture overhaul.
When the size of the metadata outgrows the capacity of a library server, multiple library servers must be installed, each managing a subset of the objects. As in all distributed systems, this type of partitioning has a major problem, namely the lack of location transparency for clients. Since each library server is autonomous and has no knowledge of remote objects stored in other library servers, clients are forced to keep track of each library server's storage content. In addition, if a client needs to search information in all the library servers, the client is forced to establish individual connections to the library servers and to manage the merging of the results from multiple library servers.
Client connection scalability is also a significant limitation of the conventional content management architecture. When the number of users exceeds the capacity and/or processing power of a content management system, one can either limit the number of concurrent client connections or employ a more powerful computer system. Limiting client connections is very undesirable because it reduces productivity and limits the company's growth potential. Replacing an existing computer system with a new one may require unloading data from the old machine and reloading it into the new machine, which is a cumbersome task because the size of the data is normally huge. In addition, powerful machines are much more expensive and not always available.
The other alternative for providing client connection scalability is to install multiple servers with replicated content (for example, the Yahoo web servers) and a middle-tier server that is responsible for routing requests to one of the replicated servers. Replicating content introduces the additional problem and complexity of synchronizing the replicas.
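The burden this places on the client can be illustrated with a short sketch. Each library server is modelled as a plain dictionary from object identifier to keywords; the function name search_all and the server names are hypothetical.

```python
def search_all(library_servers, keyword):
    """Client-side fan-out: query every library server and merge the hits.

    library_servers maps a server name to its catalog, where each catalog
    maps an object id to a list of keywords (an assumed, simplified model).
    """
    merged = []
    for server_name, catalog in library_servers.items():
        # Stands in for opening a separate connection to this server.
        hits = [oid for oid, keywords in catalog.items() if keyword in keywords]
        # The client must remember which server each hit came from.
        merged.extend((server_name, oid) for oid in hits)
    return sorted(merged)


servers = {
    "ls-a": {"doc-1": ["finance"], "doc-2": ["legal"]},
    "ls-b": {"doc-9": ["finance"]},
}
results = search_all(servers, "finance")
```

The loop and the merge both live in the client, which is the loss of location transparency the passage describes.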
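A minimal sketch of such a middle tier follows. The round-robin routing policy, the class name MiddleTierRouter, and the use of dictionaries as replicas are assumptions made for illustration; the text does not say how requests are routed.

```python
import itertools


class MiddleTierRouter:
    """Routes reads to one replica at a time and writes to every replica."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        # Round-robin is an assumed policy, chosen only for simplicity.
        self._next = itertools.cycle(range(len(self._replicas)))

    def read(self, key):
        index = next(self._next)
        return index, self._replicas[index].get(key)

    def write(self, key, value):
        # Every replica must be updated, which is the synchronization cost.
        for replica in self._replicas:
            replica[key] = value


router = MiddleTierRouter([{}, {}, {}])
router.write("doc-1", "version 1")
reads = [router.read("doc-1") for _ in range(4)]
```

Reads spread across the replicas, but each write has to reach all of them; if any update were missed, later reads could return stale content depending on which replica they were routed to.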
(b) Poor Atomicity and Referential Integrity:
When an object is inserted or updated in a content management system, reference to and description of the object will also need to be created or updated in order to provide data consistency and avoid a refe
Hsiao Hui-I
Williams Robin
Kassatly Samuel A.
Kindred Alford