System and method for rewriting queries

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C709S200000, C707S793000

Reexamination Certificate

active

06490589

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to methods and systems for use in providing services in a distributed information system, and more particularly to methods for formulating queries in a distributed directory service.
BACKGROUND OF THE INVENTION
A directory service component of a distributed computing environment is intended to make it easier to find information. It provides the information necessary to access a person or resource. For example, a directory service permits administrators, users and applications to find information about people, printers, files and other shared resources in a single place even if the resources reside on many different physical servers. The directory may be viewed as a database of information about objects, including people, organizations and processes.
In massively distributed information systems, it can be extremely expensive and in some cases impossible to centrally index all the information from the large number of information sources available. One of two strategies is typically employed to conduct a search that covers the contents of large numbers of information sources. These strategies are ‘query flood’ and ‘query routing.’
Query flood requires the client to talk to each source in some subset of the system, whether or not that source has any relevant data. The subset of the system searched is selected by picking a location in the global hierarchical namespace and searching the sources whose names are ‘children’ of the location picked. The query flood strategy can be prohibitively expensive.
Query routing, on the other hand, uses hierarchies of indexes to allow queries to be directed to only those sources having information that may fulfill the query. For most types of information, this means that the client has to contact far fewer sources than before.
The present invention provides improvements to query routing methods and systems, such as the Common Indexing Protocol (CIP) used in the WHOIS++ directory service. For background on this work, see Deutsch, et al., “Architecture of the WHOIS++ Service,” August 1995; Faltstrom, et al., “How to Interact With a WHOIS++ Mesh”, February 1996; Weider, et al., “Architecture of the WHOIS++ Service,” February 1996; and Weider, et al., “Hierarchical Extensions to the Common Indexing Protocol,” November 1996. Although the background of the present invention is explained below in connection with the WHOIS++ directory service, it should be noted that the invention is by no means limited thereto.
The WHOIS++ Directory Service
With the vast amount of directory information potentially available, e.g., on the Internet, it is not feasible to provide a centralized directory to serve all this information. Instead of being centralized, the WHOIS++ directory service is ‘distributed’ in that it is based on a hierarchy of directory information collection agents. In this architecture, a directory query is delivered to an agent in the tree, and then handed up or down, as appropriate, so that the query is delivered to the agent that holds the information that fills the query.
One of the primary assumptions made by recent implementations of distributed directory services is that every entry in the directory resides in some location in a hierarchical name space. While this arrangement is ideal for reading the entry once its location is known, it is not as good when searching for the location of those entries that meet some set of criteria. If the only criteria known about a desired entry are items that do not appear in the namespace, a global query is required. Whenever a global query (at the root of the namespace) or a query at the top of a subtree in the namespace is issued, that query is replicated to all subtrees of the starting point. The replication of the query to all subtrees is not necessarily a problem, since the queries may be inexpensive. However, every server to whom the query has been replicated must process that query, even if it has no entries matching the specified criteria. This part of global query processing is quite expensive.
The WHOIS++ directory service addresses these problems by a combination of two techniques: directory meshes and forward knowledge. Although every entry in WHOIS++ does have a unique identifier (i.e., it resides in a specific location in the namespace), the navigational algorithms to reach a specific entry do not necessarily depend on the identifier assigned to the entry. The WHOIS++ service gets around the namespace and hierarchy problems by creating a directory mesh on top of the entries. Each layer of the mesh has a set of ‘forward knowledge’ that indicates the contents of the various servers at the next lower layer of the mesh. Thus, when a server in a particular layer of the mesh receives a query, it can prune the search tree and hand the query off to only those lower level servers that have indicated they might be able to answer it. Searching thus becomes feasible at all levels of the mesh.
Most directory services use a basic ‘template-based’ information model, in which each entry consists of a set of attribute-value pairs. To participate in the index service, the underlying database should be able to generate a ‘centroid’ or some other type of forward knowledge for the data it serves. The so-called centroid of a server may include a list of the templates and attributes used by that server, and a word list for each attribute. The word list for a given attribute could contain, e.g., one occurrence of every word appearing at least once in that attribute in some record in that server's data.
An index server collects and collates the centroids of either a number of base data servers or a number of other index servers. It is able to generate a centroid for the information it contains. In addition, when it receives a query, the index server searches its collections of centroids, determines which servers hold records that may fill that query, and then notifies the user's client of the identities of the servers to which the query should be submitted next. An index server can also contain primary data and thus act as both an index server and a base level server. In this case, the index server's response to a query may be a mix of records and referral pointers.
Shortcomings of the Prior Art
Both the query flood and query routing search strategies, in their present form, have shortcomings in both time and resource efficiencies. For example, the protocols used to create and query the hierarchies of indexes implicitly assume that the query issued to each index (and to the data sources) is the same in each case. The present disclosure addresses this problem by disclosing ways to permit index servers to ‘rewrite’ the query to be sent to the next source down in the hierarchy. In contrast to the present invention, previous referral-based systems simply told the client which server to contact next when resolving a query, but not how to rewrite the query.
Moreover, the present invention is directed to methods and systems for generating the forward knowledge, or indexes, employed in a distributed directory service. As indicated above, the concept of using forward knowledge to efficiently route queries in a distributed directory service was disclosed in the work documenting the WHOIS++ directory service. However, particular methods and systems for generating such forward knowledge were not developed in the prior work.
SUMMARY OF THE INVENTION
Accordingly, the present invention is directed to a method and system for operating a distributed computer system such as, e.g., a distributed directory service. The system preferably includes a plurality of index servers and a plurality of data sources. The inventive method involves generating information indicative of a mapping of the schema of each data source to the schema of its corresponding index server. The mapping information for each data source is provided to its index server. Upon receiving a query

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for rewriting queries does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method for rewriting queries, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for rewriting queries will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2922224

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.