Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-07-23
2003-01-07
Corrielus, Jean M. (Department: 2172)
C002S005000, C002S010000, C002S004000
active
06505191
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to computer database systems and more specifically to distributed computer database systems.
BACKGROUND OF THE INVENTION
The World Wide Web (WWW) is much more than just a collection of Web pages. Each page contains references to other pages. Such references are called links, and one of the most important features of a Web browser is the ability to follow a link and display the page that is being referenced. A collection of documents linked together in this way is called a hypertext.
The link structure of a hypertext is a rich source of knowledge about the content of the hypertext. In the field of bibliometrics, links in the form of citations have been used for understanding documents by using citation analysis techniques. The link structure of the WWW is now being exploited as a means of categorization and knowledge extraction. This is being done in two ways:
1. General hypertext query languages.
2. Cluster analysis algorithms.
A Web query language, such as WebSQL, is a query language for extracting information from the Web, based on hypertext structure as well as content. For example, one might be interested in a job opportunity for a librarian. One can query the Web using WebSQL to find all pages containing the keywords “employment” or “job opportunities” and then list all the pages referenced by such a page and containing the keyword “librarian.”
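The librarian query can be sketched in a few lines of Python. This is not WebSQL syntax; it is a minimal illustration over a hypothetical in-memory collection of pages. The `Page` type, the `find_referenced` helper, and the sample URLs are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A hypothetical page: its text and the URLs it links to."""
    text: str
    links: list = field(default_factory=list)


def find_referenced(pages, source_keywords, target_keyword):
    """Return the URLs of pages that contain target_keyword and are linked
    from some page containing at least one of source_keywords."""
    results = set()
    for page in pages.values():
        text = page.text.lower()
        if not any(keyword in text for keyword in source_keywords):
            continue  # this page is not a matching source page
        for target_url in page.links:
            target = pages.get(target_url)
            if target is not None and target_keyword in target.text.lower():
                results.add(target_url)
    return sorted(results)


# Illustrative data only; these URLs and texts are made up.
PAGES = {
    "http://example.org/jobs": Page(
        "Employment at the city",
        ["http://example.org/lib", "http://example.org/it"],
    ),
    "http://example.org/lib": Page("Librarian wanted"),
    "http://example.org/it": Page("Programmer wanted"),
    "http://example.org/news": Page("Local news", ["http://example.org/lib"]),
}

matches = find_referenced(PAGES, ["employment", "job opportunities"], "librarian")
```

The query only ever follows links outward from the source pages, which is the direction in which Web links are defined.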
Cluster analysis algorithms make use of Web query languages to find specific patterns in the link structure of the WWW. The most common cluster analysis pattern is the authority/hub pattern. To compute this pattern, one first specifies a topic area using one or more keywords. For example, one might be interested in the topic “knowledge management”. A page is potentially relevant if it contains one or more keywords of the topic. An authority page for a topic is a page that is referenced by a large number of pages potentially relevant to the topic. Note that an authority page need not contain any of the keywords of the topic. Authority is conferred on it by virtue of being referenced frequently by potentially relevant pages. A hub page for a topic is one that references a large number of pages potentially relevant to the topic. An authority page for knowledge management is one that is highly referenced by pages that mention knowledge management. If one is interested in knowledge management, then it seems natural to look first at the authority pages.
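The definitions of authority and hub pages given here amount to two counts per page, which the following Python sketch computes directly. It is a simplified illustration of those definitions, not the iterative algorithm of the cited publications; the `authority_and_hub_counts` function, the page names, and the sample texts are assumptions made for the example.

```python
def authority_and_hub_counts(pages, topic_keywords):
    """pages maps a page name to (text, list of linked page names).

    Returns (relevant, authority, hub), where:
      relevant  - pages containing at least one topic keyword
      authority - for each page, how many relevant pages reference it
      hub       - for each page, how many relevant pages it references
    """
    relevant = {
        name for name, (text, _) in pages.items()
        if any(keyword in text.lower() for keyword in topic_keywords)
    }
    authority = {name: 0 for name in pages}
    hub = {name: 0 for name in pages}
    for name, (_, links) in pages.items():
        for target in set(links):  # count each distinct link once
            if target not in pages:
                continue
            if name in relevant:
                authority[target] += 1
            if target in relevant:
                hub[name] += 1
    return relevant, authority, hub


# Illustrative data only.
PAGES = {
    "a": ("Knowledge management tips", ["handbook", "b"]),
    "b": ("More on knowledge management", ["handbook"]),
    "directory": ("Useful links", ["a", "b"]),
    "handbook": ("A popular handbook", []),
}

relevant, authority, hub = authority_and_hub_counts(PAGES, ["knowledge management"])
```

In this data, "handbook" has the highest authority count even though it never mentions the topic, and "directory" has the highest hub count.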
Web query languages in general, and Web cluster analysis algorithms in particular, are limited in an important respect. They can only evaluate outgoing links, not incoming links. This is due to the way that Web links are defined. A link within one page specifies the page to which it links, not the other way around. For example, suppose that one was interested in all the pages that refer to one's own home page. WebSQL cannot answer such a query.
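The asymmetry can be made concrete with a small Python sketch. Answering "which pages refer to this page?" requires a reverse map, and that map can only be built by already holding the outgoing links of every candidate page. The `invert_links` helper and the page names are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def invert_links(outgoing):
    """Build a map from each page to the set of pages that link to it.

    outgoing maps a page name to the list of page names it links to.
    Every source page must be visited once, because a page does not
    record which other pages link to it.
    """
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target].add(source)
    return incoming


# Illustrative data only.
OUTGOING = {
    "a": ["home", "b"],
    "b": ["a"],
    "c": ["home"],
    "home": [],
}

incoming = invert_links(OUTGOING)
referrers_of_home = incoming["home"]
```

Here `OUTGOING["home"]` is available directly from the page itself, whereas `referrers_of_home` exists only after the whole collection has been scanned.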
The WWW is not just a hypertext. Pages can contain images, sound and video streams, and the structure of the WWW is continually changing. For these reasons, the WWW is called a hypermedia environment. Web resources are located by a Uniform Resource Locator (URL) which uniquely identifies the resource. More generally, a hypermedia environment consists of information objects that are uniquely identified by an object identifier (OID) and that can contain links to other information objects. A hypermedia environment is also called an object database.
To assist in finding information in an object database, special search structures called indexes are employed. Large databases require correspondingly large index structures to maintain pointers to the stored data. Such an index structure can be larger than the database itself. Current technology requires a separate index for each attribute or feature. This technology can be extended to allow for indexing a small number of attributes or features in a single index structure, but it does not function well when there are hundreds or thousands of attributes. Furthermore, there is considerable overhead associated with maintaining an index structure. This limits the number of attributes or features that can be indexed. Current systems are unable to scale up to support databases for which there are: many object types; millions of features; queries that involve many object types and features simultaneously; and new object types and features being continually added.
Further information can be had regarding the foregoing concepts with reference to the following publications:
1 L. Aiello, J. Doyle, and S. Shapiro, editors. Proc. Fifth Intern. Conf. on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Mateo, Calif., 1996.
2 G. Arocena, A. Mendelzon, and G. Mihaila. Applications of a web query language. In Proc. 6th Intern. World Wide Web Conf., 1997.
3 K. Baclawski. Distributed computer database system and method, December 1997. U.S. Pat. No. 5,694,593. Assigned to Northeastern University, Boston, Mass.
4 S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic resource list compilation by analyzing hyperlink structure and associated text. In Proc. 7th Intern. World Wide Web Conf., 1998.
5 A. Del Bimbo, editor. The Ninth International Conference on Image Analysis and Processing, volume 1311. Springer, September 1997.
6 N. Fridman Noy. Knowledge Representation for Intelligent Information Retrieval in Experimental Sciences. PhD thesis, College of Computer Science, Northeastern University, Boston, Mass., 1997.
7 D. Gibson, J. Kleinberg, and P. Raghavan. Inferring Web communities from link topology. In Proc. 9th ACM Conf. on Hypertext and Hypermedia, 1998.
8 R. Jain. Content-centric computing in visual systems. In The Ninth International Conference on Image Analysis and Processing, Volume II, pages 1-13, September 1997.
9 J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. ACM-SIAM Sympos. on Discrete Algorithms, 1998.
10 Y. Ohta. Knowledge-Based Interpretation of Outdoor Natural Color Scenes. Pitman, Boston, Mass., 1985.
11 P. Pirolli, J. Pitkow, and R. Rao. Silk from a sow's ear: Extracting usable structures from the web. In CHI '96 Proceedings: Conference on Human Factors in Computing Systems: Common Ground, pages 118-125, Vancouver, BC, 1996.
12 E. Rivlin, R. Botafogo, and B. Shneiderman. Navigating in hyperspace: Designing a structure-based toolbox. Comm. of the ACM, 37(2):87-96, February 1994.
13 G. Salton. Automatic Text Processing. Addison-Wesley, Reading, Mass., 1989.
14 G. Salton, J. Allan, and C. Buckley. Automatic structuring and retrieval of large text files. Comm. ACM, 37(2):97-108, February 1994.
15 E. Spertus. ParaSite: Mining structural information on the web. In Proc. 6th Intern. World Wide Web Conf., 1997.
16 A. Tversky. Features of similarity. Psychological Review, 84(4):327-352, July 1977.
17 R. Weiss, B. Velez, M. Sheldon, C. Namprempre, P. Szilagyi, and D. Gifford. HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering. In Proc. Seventh ACM Conf. on Hypertext, pages 180-193, 1996.
18 H. White and K. McCain. Bibliometrics. Ann. Rev. Info. Sci. and Technology, pages 119-186, 1989.
The disclosures of the publications referenced in this “Background of the Invention” are incorporated herein by reference.
It would be desirable to provide an information retrieval system that can retrieve link and other information from a unified database of word and non-word based information, including documents, images and other forms of multimedia, using a single indexing system, and otherwise overcome many of the performance and other problems and limitations of current systems. Such information retrieval systems preferably would be highly scalable, versatile, robust and economical.
SUMMARY OF THE INVENTION
The present invention resides in an indexing and search engine for extraction of information based on the content of information objects in a database as well as links be
Jarg Corporation
Kudirka & Jobse LLP