Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-06-19
2004-01-27
Metjahic, Safet (Department: 2171)
C707S793000
active
06684204
ABSTRACT:
FIELD OF INVENTION
The present invention relates generally to searching a network and particularly to searching a network which includes documents that have a plurality of tags.
BACKGROUND OF THE INVENTION
Computer networking systems such as the Internet are exploding in popularity all over the world. There are many reasons for this phenomenal growth, not the least of which is the ability to discover and access needed information in an efficient manner. The power of the Internet enables the average person with very little technical training to search for information in minutes instead of days, weeks, or even months of searching libraries, telephone books, directories or other conventional research means. To better understand conventional Internet search technology, refer now to FIG. 1.
FIG. 1 represents a flowchart of how an Internet user performs a conventional web search. First, the Internet user accesses a web search engine, via step 10. Next, the Internet user enters a search term(s) into the web search engine, via step 12. The web search engine then identifies the web pages that contain the search term(s), via step 14. Finally, the web pages containing the search term(s) are listed by the search engine, via step 16.
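The conventional flow of FIG. 1 can be illustrated with a minimal sketch. The page contents, the example.com URLs, and the function name keyword_search are hypothetical and serve only to show steps 12 through 16; a real search engine would rely on a prebuilt index rather than scanning every page.

```python
# Hypothetical in-memory "web": URL -> page text (illustrative data only).
PAGES = {
    "http://example.com/a": "Pine mountain lake cabins for rent",
    "http://example.com/b": "Lake fishing report",
    "http://example.com/c": "Mountain bike trails",
}


def keyword_search(pages, query):
    """Return the URLs of pages that contain every search term."""
    terms = query.lower().split()            # step 12: the entered search term(s)
    hits = []
    for url, text in pages.items():          # step 14: identify pages containing the terms
        words = set(text.lower().split())
        if all(term in words for term in terms):
            hits.append(url)
    return sorted(hits)                      # step 16: list the matching pages


if __name__ == "__main__":
    for url in keyword_search(PAGES, "lake"):
        print(url)
```

Because the match is made purely on the words of a page, every page that mentions a term is listed, which is the behavior the following paragraphs identify as a problem.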
However, as more and more information comes online, at accelerating rates, today's search engine interfaces and features are not keeping pace. Searches that would previously have produced fewer than a dozen relevant documents now produce hundreds of documents. This makes it very difficult and time consuming for the Internet user to evaluate and investigate the results. More sophisticated searches, sometimes beyond the grasp of a non-professional researcher, are not always the answer, as narrower searches introduce a greater risk of eliminating relevant and useful information. The severity of this problem is growing day by day at an ever-increasing rate.
One of the circumstances greatly exacerbating this problem is the tendency of web page developers to add large numbers of keywords to each and every page of their web site as a strategy to boost their standings with the Internet search engines. Thus, a single web site, which an Internet user may decide is not relevant after accessing the web site home page, may produce dozens or even hundreds of result pages listed in the search results.
FIG. 2 shows a typical web search results list. The search term(s) 20 appears on multiple web pages of the “www.pinemountainlake.com” 22 and “www.pmlr.com” 24 web sites. Even with enhanced bandwidth and greater network speeds, wading through hundreds of these “hits” to move to the next interesting web site is inefficient, cumbersome and annoying. An Internet user may actually lose patience after viewing dozens of pages of results with redundant information and terminate his search prematurely, missing the relevant page buried deep down in the list.
However, the Standard Generalized Markup Language (SGML) working group of the W3 Consortium has proposed a new standard, called XML (Extensible Markup Language), which is a subset of SGML. The goal of XML is to provide many of SGML's benefits that are not available with current HTML (Hypertext Markup Language).
One of XML's benefits is its simplicity. FIG. 3 shows a typical XML document. An XML document is a sequence of tags. Data along with the associated tag is referred to as an element. For example, a book has a title, an author, a publisher, and a price.
FIG. 4 accordingly illustrates the tag structure associated with a book entitled “Presenting XML”.
The only restriction is that tag elements must match, e.g. each <ADDRESS> must have a matching </ADDRESS>, and must nest properly. An XML document that has matching and properly nested tags is called well-formed. The elements in XML loosely correspond to objects in object oriented or object-relational databases. For example, a <PERSON> ... </PERSON> would correspond to an object of type class PERSON { ... }. Nested XML elements correspond to an object's fields, e.g., <NAME>, <PHONE> and <ADDRESS> elements in <PERSON> would correspond to the name, phone, and address fields of a PERSON object.
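The well-formedness rule and the element-to-object correspondence can be sketched as follows, using the parser in Python's standard library. The Person class, the helper names, and the sample values (name, phone number, address) are hypothetical placeholders and do not come from the figures.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Object counterpart of a <PERSON> element (hypothetical class)."""
    name: str
    phone: str
    address: str


def is_well_formed(text):
    """True if every tag is matched and properly nested."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def person_from_xml(text):
    """Map the nested elements of <PERSON> onto the fields of a Person."""
    root = ET.fromstring(text)
    return Person(
        name=root.findtext("NAME", default=""),
        phone=root.findtext("PHONE", default=""),
        address=root.findtext("ADDRESS", default=""),
    )


# Illustrative documents; all values are placeholders.
GOOD = ("<PERSON><NAME>Ann</NAME><PHONE>555-0100</PHONE>"
        "<ADDRESS>1 Main St</ADDRESS></PERSON>")
BAD_NESTING = "<PERSON><NAME>Ann</PERSON></NAME>"      # tags overlap
BAD_UNCLOSED = "<PERSON><ADDRESS>1 Main St</PERSON>"   # no </ADDRESS>

if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(BAD_NESTING))
    print(person_from_xml(GOOD))
```

Note that no schema is consulted at any point: the parser only checks matching and nesting, which is what allows XML data to be produced before any structure is declared.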
This simplicity allows users to produce XML data with complex structure without having to first define a schema. It can be useful, however, to have some specification of XML data's structure, especially for a user community to define its own ontology for data exchange. In this case DTDs (Document Type Definitions) can be used to specify the data's known structure.
FIG. 5 shows a typical DTD schema. While DTDs are similar to schemas in object-oriented or object-relational databases, they are less restrictive and permit more variation in the data. For example, DTDs can specify that some fields are optional and that others may occur multiple times, and DTDs do not require that the type of a reference be specified.
Given its flexibility, it is likely that XML will facilitate the exchange of huge amounts of data on the Web. Dozens of applications of XML already exist, including a Chemical Markup Language for exchanging data about molecules and the Open Financial Exchange for exchanging financial data between banks or between banks and customers. With huge amounts of XML data available, a problem arises when data must be extracted from these documents. The problem is that conventional search engines, although equipped to search HTML documents, are not able to effectively search XML documents. This is because conventional search engines are not equipped to handle documents comprising the element tags that the XML format utilizes.
Accordingly, what is needed is an effective method for searching XML documents. The method should be simple, cost effective and capable of being easily adapted into existing technology. The present invention addresses such a need.
SUMMARY OF THE INVENTION
A method and system for conducting a search on a network is disclosed. The network has a plurality of sites. One or more of the sites has a plurality of documents wherein at least one of the documents comprises a plurality of tags. The method and system comprise identifying at least one of the plurality of tags, receiving a query, parsing the query, and matching the parsed query with at least one of the plurality of tags of the at least one of the plurality of documents.
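The four operations named above (identifying tags, receiving a query, parsing the query, and matching the parsed query against the tags) can be illustrated with a minimal sketch. This is not the claimed implementation: the TAG:term query syntax, the function names, the document names, and every sample value other than the title “Presenting XML” are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents on a site; contents are placeholders.
DOCS = {
    "doc1.xml": "<BOOK><TITLE>Presenting XML</TITLE>"
                "<AUTHOR>Example Author</AUTHOR></BOOK>",
    "doc2.xml": "<ARTICLE><TITLE>Gardening basics</TITLE>"
                "<ABSTRACT>Not about XML at all</ABSTRACT></ARTICLE>",
}


def identify_tags(xml_text):
    """Map each tag name to the text values found directly under it."""
    index = {}
    for elem in ET.fromstring(xml_text).iter():
        value = (elem.text or "").strip()
        if value:
            index.setdefault(elem.tag.upper(), []).append(value)
    return index


def parse_query(query):
    """Split an assumed 'TAG:term' query into (tag, term)."""
    tag, sep, term = query.partition(":")
    if not sep:
        raise ValueError("expected a query of the form TAG:term")
    return tag.strip().upper(), term.strip().lower()


def search(docs, query):
    """Return the documents in which the term occurs under the given tag."""
    tag, term = parse_query(query)
    hits = []
    for name, text in docs.items():
        values = identify_tags(text).get(tag, [])
        if any(term in value.lower() for value in values):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(search(DOCS, "title:xml"))
```

In this sketch a plain keyword search for "xml" would return both documents, whereas restricting the match to the TITLE tag returns only the first, which is the kind of added precision described in the next paragraph.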
Accordingly, through the use of a method and system in accordance with the present invention, the extraction of information from networks comprising XML documents is done in a more precise fashion.
Nguyen Cam Linh
Sawyer Law Group LLP