Web page connectivity server construction

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C709S201000, C709S218000, C709S224000

active

06701317

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to techniques for collecting, arranging, and coordinating information pertaining to the connectivity of Web pages and, more particularly, to the construction of a connectivity server, including a data structure incorporating a URL Database, a Host Database and a Link Database, the connectivity server for facilitating efficient and effective representation and navigation of Web pages.
2. Description of the Related Art
The World Wide Web (Web) is constituted from the entire set of interlinked hypertext documents that reside on Hypertext Transfer Protocol (HTTP) servers that are globally connected by the Internet. Documents resident on the Web (Web pages) are generally written in a mark-up language such as HTML (Hypertext Markup Language) and are identified by URLs (Uniform Resource Locators). In general, URLs correspond to addresses of Internet resources and serve to specify the protocol to be used in accessing a resource, as well as the particular server and pathname by which the resource may be accessed.
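As an illustration only (not part of the patent text), the three components of a URL named above can be separated with Python's standard `urllib.parse` module. The helper name `url_parts` and the example address are assumptions introduced for this sketch.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the protocol, server, and pathname it specifies.

    `url_parts` is a hypothetical helper for this sketch; it wraps the
    standard-library `urlsplit` function.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "server": parts.netloc,     # host name (and port, if given)
        "pathname": parts.path,     # location of the resource on the server
    }


# Example address chosen for illustration only.
print(url_parts("http://www.example.com/docs/index.html"))
```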
Files are transmitted from a Web server to an end user under HTTP. Codes, called tags, that are embedded in an HTML document associate particular words and images in the document with URLs, so that an end user can access other Web resources, regardless of where they are physically located, upon the activation of a key or mouse.
Users of client computers use Web browsers to locate Web pages that, as indicated above, are identified by URLs. Specialized servers, called search engines, maintain indices of the contents of Web pages. The browsers may be used to pose textual queries. In response, the search engines return result sets of URLs that identify Web pages that satisfy the queries. Usually, the result sets are rank ordered according to relevance.
In this regard, information related to the connectivity of Web pages, such as the number of links to or from a page, can be used as a tie-breaking mechanism in ranking the result sets or as an input in deciding the relative importance of result pages.
The URL names of the result sets may then be used to retrieve the identified Web pages, as well as other pages connected by “hot links.”
However, many users are interested in more than merely the content of the Web pages. Specifically, users may be interested in the manner in which Web pages are interconnected. In other words, users may be interested in exploring the connectivity information embedded within the Web for practical, commercial, or other reasons.
The connectivity information provided by search engines exists largely as a byproduct of their paramount function. Although an unsophisticated user may easily follow a trail between connected Web pages, the extraction of a global view of connectivity quickly becomes tedious. The connectivity representation in the search engines serves a single purpose: to provide answers to queries. However, determination of all pages that are, for example, two links removed from a particular page may require thousands of queries, and a substantial amount of processing by the user. Without a separate representation of the Web, it is very difficult to provide linkage information. In fact, most search engines fail to provide access to any type of connectivity information.
This is a significant drawback, because linkage information between Web pages is a valuable resource for Web visualization and page ranking. Several ongoing research projects use such information. Most connectivity information is obtained from ad-hoc Web “crawlers” that build relatively small databases of local linkage information.
A database may be constructed on the fly or statically. When constructed on the fly, each new page is parsed as it is accessed in order to identify links. The linked neighboring pages are retrieved until the required connectivity information is gathered. When statically constructed, a connectivity database is essentially rebuilt from scratch whenever updates are required. For example, the service designated Linalert™ provided by Lycos uses static databases specifically designed to offer linkage information for particular Web sites. Earlier implementations of both on-the-fly and static approaches have proven inefficient and clumsy to use, and do not scale to the entire Web or to a large number of clients. Consequently, prior-art implementations of connectivity databases generally perform poorly and/or are limited in scope.
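The on-the-fly approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of any prior-art system: the `PAGES` dictionary is a hypothetical in-memory stand-in for the Web, `fetch` is whatever callable returns the HTML for a URL, and the names `LinkExtractor`, `extract_links`, and `build_on_the_fly` are invented for the example. Link parsing uses the standard-library `html.parser` module.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of anchor tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Parse a page and return the URLs it links to, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def build_on_the_fly(start_url, fetch, max_depth):
    """Breadth-first construction: parse each page as it is accessed and
    retrieve its neighbors until pages max_depth links away are covered.

    Returns {url: [linked urls]} for every page that was retrieved.
    """
    graph = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        links = extract_links(html) if html is not None else []
        graph[url] = links
        if depth < max_depth:
            for target in links:
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return graph


# Hypothetical four-page "Web" used only for this illustration.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">B</a> '
                         '<a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://d.example/">D</a>',
    "http://c.example/": "",
    "http://d.example/": '<a href="http://a.example/">A</a>',
}

graph = build_on_the_fly("http://a.example/", PAGES.get, max_depth=1)
print(graph)
```

Every page in the neighborhood must be fetched and parsed again for each new request, which is one way to see why this approach is costly at the scale of the whole Web.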
Accordingly, U.S. Pat. No. 6,073,135, entitled “Connectivity Server for Locating Linkage Information Between Web Pages,” hereby incorporated by reference, is directed to a server that enables convenient and efficient representation and navigation of connectivity information of Web pages. The server described therein (hereinafter “CS1”) maintains accurate linkage information for a significant portion of the Web and supports a large number of client users that desire numerous variants of connectivity information. In addition, the system dynamically updates the connectivity information so that the linkage information is current.
FIGS. 1 through 9 of the Drawings depict the implementation of CS1 in accordance with U.S. Pat. No. 6,073,135.
As depicted in FIG. 1, the Web is shown to comprise a widely distributed network of computers 100 that include numerous client computers 110 connected to server computers 120 by a network 130. Generally, servers 120 provide information, products, and services to users of the clients 110.
Client computers 110 may be personal computers (PCs), workstations, or laptops. Typically, clients are equipped with input/output devices 115, such as a keyboard, mouse, and display device 115. Software in the form of a Web browser 111 interacts with devices 115 to provide an interface between the user and the Web.
The server computers 120 are usually larger computer systems, although this does not always need to be so. Some of the servers, also known as “Web sites,” maintain a database (DB) 121 of Web pages 122. Each Web page 122 is identified and can be located by its URL 123. Web pages are usually formatted using HTML, which establishes links to other pages. A user is afforded the opportunity to “click” on a link within a page viewed with the browser in order to access a “pointed to” page.
Search engines, in the form of servers 140, maintain an index 141 of the contents of Web pages. Using a search engine application programming interface (API) 142, client users may locate pages having specific content of interest to the users. The user specifies pages of interest to the API of the search engine 140 by composing queries that are processed by the search engine's API 142.
A specialized, “connectivity” server 150 is also provided. Connectivity server 150 maintains a connectivity database 151. Using a connectivity server API 152, users may locate pages (URLs) according to the definition of the interconnection between pages.
As shown in FIG. 2, a graph 200 is built to represent the connectivity of Web pages. In the graph 200, each node (A, . . . , G) 210 represents a Web page 122. Each edge, for example an edge (AB) 220, represents a link from one page to another, with edge AB representing a link from page A to page B. The connectivity API 152, in various forms, enables client users to “explore” or “navigate” graph 200 to extract connectivity information.
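A directed graph of this kind, together with the sort of navigation query a connectivity API might answer, can be sketched as follows. This is an illustrative sketch only and does not reproduce the data structures of CS1: the class `ConnectivityGraph`, its method names, and the particular edges among pages A through E are assumptions made for the example, since FIG. 2 is not reproduced here.

```python
from collections import defaultdict


class ConnectivityGraph:
    """Directed graph of pages; an edge (A, B) is a link from A to B.

    Hypothetical structure for illustration: both link directions are
    indexed so that "links from" and "links to" queries are equally cheap.
    """

    def __init__(self):
        self.out_links = defaultdict(set)  # page -> pages it links to
        self.in_links = defaultdict(set)   # page -> pages linking to it

    def add_link(self, source, target):
        self.out_links[source].add(target)
        self.in_links[target].add(source)

    def successors(self, page):
        """Pages that `page` links to."""
        return sorted(self.out_links.get(page, ()))

    def predecessors(self, page):
        """Pages that link to `page`."""
        return sorted(self.in_links.get(page, ()))

    def pages_at_distance(self, page, distance):
        """Pages whose shortest forward path from `page` is exactly
        `distance` links long."""
        seen = {page}
        frontier = {page}
        for _ in range(distance):
            next_frontier = set()
            for node in frontier:
                for target in self.out_links.get(node, ()):
                    if target not in seen:
                        seen.add(target)
                        next_frontier.add(target)
            frontier = next_frontier
        return sorted(frontier)


# Assumed edges, chosen only to exercise the queries.
graph = ConnectivityGraph()
for source, target in [("A", "B"), ("A", "C"), ("B", "D"),
                       ("C", "D"), ("D", "E"), ("E", "A")]:
    graph.add_link(source, target)

print(graph.successors("A"))            # pages A links to
print(graph.predecessors("D"))          # pages linking to D
print(graph.pages_at_distance("A", 2))  # pages two links removed from A
```

With a separate graph representation, the "two links removed" question becomes a single in-memory traversal rather than a long series of search-engine queries.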
It is readily appreciated that the data representation of graph 200 in memory must be carefully designed to minimize memory storage requirements. Assuming the graph contains approximately 100 M Web pages with an average outdegree of seven, then the graph will have about 700 M edges. A rudimentary implementation would store two pointers per edge. Furthermore, given that the average size of a URL is about 80 bytes, the uncompressed URLs of the nodes depicted in the rudimentary implementation will occupy about 8 Gb (Gigabytes). From another perspective, storage of 1 B (uncompressed) edges will similarly require 8 Gb of stora
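The storage estimates in this paragraph can be checked with simple arithmetic. In the sketch below, the page count, outdegree, URL size, and two-pointers-per-edge figures come from the text; the 4-byte pointer size is an assumption (it is not stated in the text, but it is the value that makes 1 B edges come to 8 gigabytes), and the constant and function names are invented for the example.

```python
PAGE_COUNT = 100_000_000   # approximately 100 M Web pages (from the text)
AVG_OUTDEGREE = 7          # average outdegree (from the text)
AVG_URL_BYTES = 80         # average URL size in bytes (from the text)
POINTERS_PER_EDGE = 2      # rudimentary implementation (from the text)
POINTER_BYTES = 4          # ASSUMPTION: 32-bit pointers, not stated in the text


def edge_storage_bytes(edge_count):
    """Bytes needed to store edges as pairs of uncompressed pointers."""
    return edge_count * POINTERS_PER_EDGE * POINTER_BYTES


edges = PAGE_COUNT * AVG_OUTDEGREE             # about 700 M edges
url_bytes = PAGE_COUNT * AVG_URL_BYTES         # uncompressed URL storage
billion_edge_bytes = edge_storage_bytes(1_000_000_000)

print(f"edges: {edges:,}")
print(f"URL storage: {url_bytes / 1e9:.0f} gigabytes")
print(f"1 B edges: {billion_edge_bytes / 1e9:.0f} gigabytes")
```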
