Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-03-23
2002-07-23
Amsbury, Wayne (Department: 2171)
C707S793000, C707S793000
active
06424973
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to computer database systems and more specifically to distributed computer database systems.
BACKGROUND OF THE INVENTION
The basis for communication, whether between people or between computer systems, is a shared background that allows the parties to understand each other. This involves sharing both of the following: (1) a language for communication; and (2) a domain conceptualization that defines the shared vocabulary along with the relationships that may hold between the concepts denoted by the terms in the vocabulary.
The problem of translation between different languages is important, and many computer systems have been developed for this purpose. Translation between different domain conceptualizations is also important. Domain conceptualizations are also called ontologies, and translation between them is called mediation. For example, the vocabulary of Americans differs from that of the British even though they share a common language. In the UK, one would say “lift” for what is called an “elevator” in the US. Mediation is required in order to understand what is meant by these terms.
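The lift/elevator example can be illustrated with a minimal sketch. It assumes that each ontology maps its terms onto shared concept labels and that mediation is a lookup through those labels. The table TERM_TO_CONCEPT, the ontology identifiers "en-GB" and "en-US", and the function mediate are hypothetical names chosen for illustration; they are not taken from the patent.

```python
from typing import Optional

# Hypothetical term table: (ontology id, term) -> shared concept label.
TERM_TO_CONCEPT = {
    ("en-GB", "lift"): "ELEVATOR",
    ("en-US", "elevator"): "ELEVATOR",
}


def mediate(term: str, source: str, target: str) -> Optional[str]:
    """Translate a term from the source ontology into the target ontology."""
    concept = TERM_TO_CONCEPT.get((source, term.lower()))
    if concept is None:
        return None  # the term is unknown in the source ontology
    for (ontology, candidate), candidate_concept in TERM_TO_CONCEPT.items():
        if ontology == target and candidate_concept == concept:
            return candidate
    return None  # the concept has no term in the target ontology


print(mediate("lift", "en-GB", "en-US"))  # elevator
```

A term that is absent from the source ontology yields None rather than a guess, which is one simple way of surfacing a failed mediation instead of silently matching the wrong word.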
For a more complex example, the domain of medicine has a large vocabulary of terms for chemicals, genes, laboratory procedures, diseases, etc. Within medicine there are many subdomains that use different terminology for the same concept. Terminology can also vary from one company to another, and even small groups within a single company can have their own specialized vocabulary. Some will use the term “Munchausen Syndrome” while others prefer “Chronic factitious illness with physical symptoms”. Some might even prefer to expand the term “factitious illness” to “intentional production or feigning of symptoms or disabilities, either physical or psychological” to make it understandable to someone with minimal medical background.
The problem of mediation between domain conceptualizations is especially difficult for computer systems because they generally have no mechanism for dealing with miscommunication as a result of misunderstood terminology. For example, conventional search engines simply match words in a query with words in documents. Some search engines consider the possibility of synonymous words, but the fact that the words might belong to different domains is not considered.
For example, suppose that one wishes to find occurrences of “Job” in the Bible. Job is one of the persons mentioned in the Bible, and one of the books in the Bible is named after him. However, modern search engines do not generally understand this, and they will make errors such as matching “Job” with “work” because they regard these two words as synonymous.
Current search engines support only a very limited ontology with just a few concepts. Moreover, the ontology is inflexibly built into the search engine and only one ontology is supported. In general, indexes of current database systems are thus limited to a single ontology.
A collection of documents, data or other kinds of information objects will be called an object database. Information objects can be images, sound and video streams, as well as data objects such as text files and structured documents. Each information object is identified uniquely by an object identifier (OID). An OID can be an Internet Universal Resource Locator (URL) or some other form of identifier such as a local object identifier.
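As a rough sketch of the terminology above, an object database can be modeled as a mapping from OIDs to information objects. The class names InformationObject and ObjectDatabase, the media_type field, and the example identifiers are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationObject:
    oid: str         # unique object identifier, e.g. a URL or a local id
    media_type: str  # hypothetical field: "text", "image", "video", ...
    content: bytes


class ObjectDatabase:
    """A collection of information objects keyed by OID."""

    def __init__(self) -> None:
        self._objects: dict[str, InformationObject] = {}

    def add(self, obj: InformationObject) -> None:
        # Each OID must identify exactly one object.
        if obj.oid in self._objects:
            raise KeyError(f"duplicate OID: {obj.oid}")
        self._objects[obj.oid] = obj

    def get(self, oid: str) -> InformationObject:
        return self._objects[oid]

    def __len__(self) -> int:
        return len(self._objects)


db = ObjectDatabase()
db.add(InformationObject("http://example.com/report.txt", "text", b"..."))
db.add(InformationObject("local:42", "image", b"\x89PNG"))
```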
To assist in finding information in an object database, special search structures called indexes are employed. Current technology generally requires a separate index for each attribute or feature. Even the most sophisticated indexes currently available are limited to a very small number of attributes. Since each index can be as large as the database itself, this technology does not function well when there are hundreds or thousands of attributes, as is often the case when objects such as images, sound and video streams are directly indexed. Furthermore, there is considerable overhead associated with maintaining each index structure, which limits the number of attributes that can be indexed. Current systems are unable to scale up to support databases for which there are: many object types, including images, sound and video streams; millions of features; queries that involve many object types and features simultaneously; and new object types and features being continually added.
Further information can be had regarding some of the concepts discussed herein with reference to the following publications:
1 L. Aiello, J. Doyle, and S. Shapiro, editors. Proc. Fifth Intern. Conf. on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Mateo, Calif., 1996.
2 K. Baclawski. Distributed computer database system and method, December 1997. U.S. Pat. No. 5,694,593. Assigned to Northeastern University, Boston, Mass.
3 K. Baclawski and D. Simovici. An abstract model for semantically rich information retrieval. Technical report, Northeastern University, Boston, Mass., March 1994.
4 A. Campbell and S. Shapiro. Algorithms for ontological mediation. Technical report, State University of New York at Buffalo, Buffalo, N.Y., 1998.
5 A. Del Bimbo, editor. The Ninth International Conference on Image Analysis and Processing, volume 1311. Springer, September 1997.
6 N. Fridman Noy. Knowledge Representation for Intelligent Information Retrieval in Experimental Sciences. PhD thesis, College of Computer Science, Northeastern University, Boston, Mass., 1997.
7 R. Jain. Content-centric computing in visual systems. In The Ninth International Conference on Image Analysis and Processing, Volume II, pages 1-13, September 1997.
8 Y. Ohta. Knowledge-Based Interpretation of Outdoor Natural Color Scenes. Pitman, Boston, Mass., 1985.
9 G. Salton. Automatic Text Processing. Addison-Wesley, Reading, Mass., 1989.
10 G. Salton, J. Allan, and C. Buckley. Automatic structuring and retrieval of large text files. Comm. ACM, 37(2):97-108, February 1994.
11 A. Tversky. Features of similarity. Psychological Review, 84(4):327-352, July 1977.
The disclosures of the publications referenced in this “Background of the Invention” are incorporated herein by reference.
It would be desirable to provide an information retrieval system that can retrieve information from a database, including documents, images and other forms of multimedia, taking into account ontologies and using a single indexing system, and that otherwise overcomes many disadvantages and limitations of current systems.
SUMMARY OF THE INVENTION
The invention resides in performing, preferably in parallel over a distributed network of computer nodes, ontology mediation and information retrieval in response to a user query in order to retrieve information objects conforming to target ontologies specified in the query.
Briefly, the invention can provide an information retrieval system for processing a query for word-based and non-word-based retrieval of information from a database by extracting a number of features from the query according to its ontology, fragmenting each of the features into feature fragments, and hashing each of the feature fragments into hashed feature fragments. The hashed feature fragments can be used in accessing a hash table for obtaining object identifiers therefrom that can be used for obtaining information from the database relevant to the query and to its target ontologies.
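The query-side steps (extract features, fragment them, hash the fragments, look up object identifiers) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: it assumes that comma-separated phrases are features, that each word of a feature is one fragment, and that a 64-bit prefix of SHA-256 serves as the hash. Ontology handling is omitted for brevity, and all function names are hypothetical.

```python
import hashlib
from collections import Counter


def extract_features(query: str) -> list[str]:
    # Assumption: each comma-separated phrase of the query is one feature.
    return [part.strip().lower() for part in query.split(",") if part.strip()]


def fragment(feature: str) -> list[str]:
    # Assumption: each word of a feature is one feature fragment.
    return feature.split()


def hash_fragment(frag: str) -> int:
    # Assumption: a 64-bit prefix of SHA-256 stands in for the hash function.
    digest = hashlib.sha256(frag.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def retrieve(query: str, hash_table: dict[int, list[str]]) -> list[str]:
    """Return OIDs ordered by how many hashed query fragments they match."""
    hits: Counter[str] = Counter()
    for feature in extract_features(query):
        for frag in fragment(feature):
            for oid in hash_table.get(hash_fragment(frag), ()):
                hits[oid] += 1
    # Most matches first; ties are broken alphabetically for determinism.
    return sorted(hits, key=lambda oid: (-hits[oid], oid))


# Hypothetical pre-built hash table mapping hashed fragments to OIDs.
table = {
    hash_fragment("job"): ["oid:bible/job"],
    hash_fragment("book"): ["oid:bible/job", "oid:library/catalog"],
}
results = retrieve("book of Job", table)
```

Ranking by match count is only one plausible way of combining the per-fragment lookups; the text above does not state how the retrieved object identifiers are combined or ordered.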
In another aspect, the invention resides in an information indexing system for indexing information for facilitated retrieval from a database, by extracting a number of features from the information, fragmenting each of the features into feature fragments, and hashing each of the feature fragments into hashed feature fragments. The hashed feature fragments are used in accessing a hash table for storing object identifiers at locations determined by the hashed feature fragments and the ontology identifiers. The information retrieval apparatus can be implemented in a distributed computer database system.
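The indexing side can be sketched in the same spirit. The sketch assumes that a storage location is derived by hashing the ontology identifier together with the fragment, and that the hash value also selects one of several in-memory tables standing in for the nodes of a distributed system. The class OntologyIndex, the function hash_key, the node count, and the word-per-fragment rule are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def hash_key(ontology_id: str, frag: str) -> int:
    # Assumption: the ontology id and the fragment are hashed together, so
    # the same word lands in different locations under different ontologies.
    data = f"{ontology_id}\x00{frag}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class OntologyIndex:
    """Hash index over feature fragments, partitioned across simulated nodes."""

    def __init__(self, num_nodes: int = 4) -> None:
        # One table per node; a real deployment would place these on
        # separate machines.
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]

    def _locate(self, ontology_id: str, frag: str):
        key = hash_key(ontology_id, frag)
        return self.nodes[key % len(self.nodes)], key

    def add(self, oid: str, ontology_id: str, features: list[str]) -> None:
        for feature in features:
            # Assumption: each word of a feature is one feature fragment.
            for frag in feature.lower().split():
                node, key = self._locate(ontology_id, frag)
                node[key].add(oid)

    def lookup(self, ontology_id: str, frag: str) -> set[str]:
        node, key = self._locate(ontology_id, frag.lower())
        return set(node.get(key, ()))


index = OntologyIndex()
index.add("oid:bible/job", "bible", ["Book of Job"])
index.add("oid:ads/vacancy", "employment", ["job vacancy"])
```

Because the ontology identifier is part of the hashed key, a lookup for "job" under the hypothetical "bible" ontology does not return the employment document, which mirrors the “Job” example given in the background.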
In general, the term “feature” as used herein means any informat
Amsbury Wayne
Jarg Corporation
Kudirka & Jobse LLP