Method and apparatus for creating an index in a database system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06240407

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to a method and apparatus for creating an index in a database system for efficient execution of structured queries.
2. Description of the Related Art
HyperText Markup Language (HTML) has been the standard format for delivering information on the World Wide Web (WWW). However, HTML has only a limited set of tags for specifying document structures, and these tags are mainly for the purposes of browser presentation. Automated information processing on these documents for data exchange and interoperability has been difficult. Extensible Markup Language (XML), which is a subset of Standard Generalized Markup Language (SGML), has been proposed to be the next standard format that allows user-defined tags for better describing nested document structures and associated semantics.
People are still learning how to make effective use of the flood of information available on the Internet and intranets. With current search engines, a query may yield very extensive results that may contain the needed information from sites all over the world. A search engine has several main functions: information gathering, indexing, categorization, and searching. Information gathering usually uses Web crawlers to send visited pages to the index engine. The index mechanism normally uses some form of inverted files and, when given a word, returns a list of references to the pages that contain the word. Categorization, or clustering, tries to categorize the pages according to some attributes, such as topics. The searching function allows the user to ask content-based queries and get ranked result sets.
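The inverted-file lookup described above can be sketched as follows. This is a minimal illustration, not the indexing scheme of any particular search engine: the function names `build_inverted_index` and `lookup`, the whitespace tokenizer, and the two sample pages are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase word to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Assumed tokenizer: lowercase the text and split on whitespace.
        for word in text.lower().split():
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


def lookup(index, word):
    """Return the references (page ids) for a word, or an empty list."""
    return index.get(word.lower(), [])


# Hypothetical sample pages, made up for illustration.
pages = {
    "page1": "structured documents need structured queries",
    "page2": "search engines index documents",
}
index = build_inverted_index(pages)
```

Note that such an index records only which pages contain a word; it keeps nothing about where in a page's structure the word occurs.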
While HTML documents serve very well for Web browsing, automated information processing on them can be difficult, because there are few semantics associated with the documents. For example, without human understanding or a sophisticated program, it is difficult to know what the number “1991” means in an HTML document; it could be a year, a quantity, or anything else. In a programming language, program semantics are defined by a standardized set of keywords. HTML, by comparison, has only a limited set of keywords (i.e., tags), and they serve mainly presentation purposes rather than the semantics of document contents.
To automate Web information processing, in particular for data exchange and interoperability, XML has been proposed to the World Wide Web Consortium (W3C) as a new markup language that supports user-defined tags and encourages the separation of document contents from presentation. XML is a meta language that allows the user to define a language for composing structured documents. With XML, the user can define any desired tags for better structuring of documents (although adding misleading tags is also possible). For interoperability, domain-specific tags, called a vocabulary, can be standardized, so that applications in that domain understand the meaning of the tags. Various vocabularies for different domains have been proposed in the SGML community, such as Electronic Data Interchange (EDI) for banking exchange, Standard Music Description Language (SMDL) for music, or Chemical Markup Language (CML) for chemistry. Recently, vocabularies have been proposed in the XML community, for example the Channel Definition Format (CDF) for channels.
Structured documents are documents that can have nested structures. Assuming structured documents will be abundant, in particular within intranets and extranets (between businesses), where documents are more likely to be regularly structured, there is clearly a need for a search engine that understands document structures and allows a user to ask structured queries. Current search engines either flatten out the structure of a document (i.e., remove nested structures), or have limited, predefined structures (such as paragraphs and sentences, according to some predefined punctuation marks), and thus are not capable of evaluating general ad hoc structured queries. Structured documents also enable comparisons among numeric values, for example, to get the references published after the year 1991 from a structured paper (which is not possible with a search engine based on inverted files).
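The numeric comparison mentioned above (references published after the year 1991) can be sketched against a small XML paper. This is an illustrative sketch only: the tag names `paper`, `references`, `reference`, `title`, and `year`, the sample entries, and the helper `references_after` are all hypothetical and are not taken from the specification. It uses Python's standard `xml.etree.ElementTree` module and reads the document source directly, which is the simplest way to show what the query means.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured paper; tags and entries are made up for illustration.
PAPER = """
<paper>
  <title>A structured paper</title>
  <references>
    <reference><title>Ref A</title><year>1989</year></reference>
    <reference><title>Ref B</title><year>1994</year></reference>
    <reference><title>Ref C</title><year>1997</year></reference>
  </references>
</paper>
"""


def references_after(xml_text, year):
    """Return titles of <reference> elements whose <year> is greater than `year`."""
    root = ET.fromstring(xml_text)
    hits = []
    for ref in root.iter("reference"):
        year_text = ref.findtext("year")
        # Because the tag marks the value as a year, it can be compared as a number.
        if year_text is not None and int(year_text) > year:
            hits.append(ref.findtext("title"))
    return hits
```

Calling `references_after(PAPER, 1991)` selects the entries whose `year` element holds a value above 1991. A word-only inverted file cannot answer this, since it does not know that a given number is a year.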
A successful search engine for a large repository of structured documents relies on good indexing schemes. Therefore, there is a need in the art for designing indexes that support structured queries and execute the queries without resorting to the structured documents.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to solve various problems that will become apparent upon reading and understanding of the present specification, it is one object of the present invention to provide a method, apparatus and article of manufacture for computer-implemented creation of an index in a database system to provide efficient execution of structured queries.
In accordance with the present invention, a general framework is disclosed for manipulating structured documents based on document abstractions. This general framework is applied to the area of indexing and searching structured documents, but it is to be understood that it can be applied to other functions, such as document summarization or categorization.
In order to handle any structured query, an index created by the general framework must possess some form of document structures, sufficient to enable it to evaluate the query without resorting to document sources. To create the structured index, a structured document, interactively entered by an operator or already stored in the database system, is parsed into at least one corresponding element, then abstracted using a predefined abstraction procedure to obtain a set of abstracted values, the set of abstracted values being stored in the index for efficient execution of structured queries.
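The parse, abstract, and store steps described above might look like the following sketch. It is not the abstraction procedure of the invention, which this section does not specify. As an assumption for illustration, each element is abstracted to pairs of (tag path, lowercase word), and all names here (`parse_elements`, `abstract`, `add_to_index`, `query`) and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_elements(xml_text):
    """Yield (path, text) per element; path is the chain of tag names from the root."""
    root = ET.fromstring(xml_text)
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        # Only the text directly inside the element is used (tail text is ignored).
        yield path, (node.text or "").strip()
        for child in node:
            stack.append((child, path + "/" + child.tag))


def abstract(path, text):
    """Assumed abstraction: keep the tag path and the set of lowercase words."""
    return {(path, word) for word in text.lower().split()}


def add_to_index(index, doc_id, xml_text):
    """Parse a document, abstract each element, and store the abstracted values."""
    for path, text in parse_elements(xml_text):
        for value in abstract(path, text):
            index[value].add(doc_id)


def query(index, path, word):
    """Answer a structured query from the index alone, without the document source."""
    return sorted(index.get((path, word.lower()), set()))


index = defaultdict(set)
add_to_index(index, "doc1", "<paper><title>Structured indexing</title><year>1991</year></paper>")
add_to_index(index, "doc2", "<paper><title>Year 1991 in review</title><year>1998</year></paper>")
```

Because the stored values retain the element path, `query(index, "paper/year", "1991")` and `query(index, "paper/title", "1991")` return different documents. A coarser or finer abstraction would change what the index can answer and how much it costs to store.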
An object of the present invention is to apply the abstract interpretation technique to structured documents. This is accomplished through the process of abstracting the elements of a previously parsed document.
Another object of the invention is to provide a framework, based on document abstractions, that generalizes many existing techniques and provides a systematic approach to experimenting with the tradeoff between cost and capability for application on structured documents.
Yet another object of the invention is to apply the framework to the functions of indexing and searching structured documents, and to describe the design space in structural and content abstractions.


