System and method employing random walks for mining web page...


Filed: 2000-11-09
Issued: 2003-04-15
Examiner: Choules, Jack (Department: 2177)
Classification: Data processing: database and file management or data structures – Database design – Data structure types
US class: C707S793000
Type: Reexamination Certificate
Status: active
Patent number: 06549896

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates, generally, to content delivery networks and, in preferred embodiments, to systems and methods employing random walks for mining web page associations and usage (data mining), and for optimizing user-oriented web page refresh and pre-fetch scheduling.
2. Description of the Related Art
Web performance is a key point of differentiation among content providers. Snafus and slowdowns at major Web sites demonstrate the difficulties companies face when trying to handle large volumes of Web traffic. As Internet backbone technologies develop, many innovations, such as quality of service management, have been used to improve network bandwidth and Web content retrieval time. These improvements to infrastructure, however, cannot solve traffic problems occurring at any one point in the Internet. For example, in FIG. 1, an end-user 10 in a network 12 in Japan wants to access a page in a content provider original Web site 14 in a network 16 in the U.S. The request will pass through several Internet Service Provider (ISP) gateways 18, 20, and 22 before it reaches the content provider original Web site 14. Because of gateway bottlenecks and other delay factors along the Internet paths between the end-user and the content provider original Web site 14, a content pre-fetching and refreshing methodology utilizing a proxy server on the end-user side of the gateways could provide faster response time.
FIG. 2 illustrates a typical Web content delivery and caching scheme 24 which includes a caching system 26 connected to multiple non-specific Web sites 28 and 30. The caching system 26 comprises a proxy server or cache server 32, and cache 34. It should be understood that the cache 34 may be proxy cache, edge cache, front end cache, reverse cache, and the like. Alternatively, the caching system 26 of FIG. 2 can be replaced by a content delivery services provider and mirror sites, which would be connected to Web sites that have entered into subscriber contracts with the content delivery services provider. These subscriber Web sites will deliver content to the content delivery services provider for mirroring, but will not necessarily notify the content delivery services provider when the content has changed.
In FIG. 2, when content is delivered from a Web site to cache 34, a header called a meta-description or meta-data is delivered along with the content. The meta-data may be a subset of the content, or it may indicate certain properties of the content. For example, the meta-data may contain a last-modified date, an estimate that the content will expire at a certain time, or an indication that the content is to expire immediately or is not to be cached. After the content and meta-data are delivered, if storing the content in cache 34 is indicated by the meta-data, the content will be stored in cache 34.
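The storage decision described above can be sketched as follows. This is a minimal illustration, not the method of the patent: the MetaData class, its field names, and the deliver function are hypothetical, and the sketch assumes that only a "not to be cached" indication prevents storage (content marked to expire immediately is still stored, as the later discussion of such content in the cache implies).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MetaData:
    """Hypothetical meta-data record; field names are assumptions."""
    last_modified: Optional[float] = None   # epoch seconds
    expires_at: Optional[float] = None      # estimated expiration time
    expire_immediately: bool = False
    no_cache: bool = False                  # "not to be cached"


# The cache maps a URL to the stored content and its meta-data.
cache: Dict[str, Tuple[str, MetaData]] = {}


def should_store(meta: MetaData) -> bool:
    """Return True when the meta-data permits storing the content."""
    return not meta.no_cache


def deliver(url: str, content: str, meta: MetaData) -> None:
    """Receive content from a Web site and store it if permitted."""
    if should_store(meta):
        cache[url] = (content, meta)
```

For example, deliver("/index.html", "<html>...</html>", MetaData(no_cache=True)) leaves the cache unchanged, while the same call with MetaData(expire_immediately=True) stores the page.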
When a user 36 (user 1) requests access to a page (e.g., index.html) from a Web site 28 (Web site 1), the Web browser of user 1 will first send a request to a domain name server (DNS) to find the Internet Protocol (IP) address corresponding to the domain name of Web site 1. If, as in the example of FIG. 2, a caching system 26 is employed, the Web browser may be directed to the proxy server 32 rather than Web site 1. The proxy server 32 will then determine if the requested content is in cache 34.
However, even though the requested content may be found in cache 34, it must be determined whether the content in cache 34 is fresh. This problem can be described as database synchronization. In other words, it is desirable for the cache 34 and Web site 1 to have content that is the same. As described above, however, subscriber Web sites may not notify the proxy server 32 when their content has changed. Thus, the proxy server 32 may examine the meta-data associated with the requested content stored in cache 34 to assist in determining if the content is fresh.
If the requested content is found in the cache 34 and the meta-data indicates that the estimated time for expiration has not yet occurred, some caching systems will simply deliver the content directly to user 1. However, more sophisticated caching systems may send a request to Web site 1 for information on when the desired content was last updated. If the content was updated since the last refresh into cache 34, the content currently in the cache 34 is outdated, and fresh content will be delivered into the cache 34 from Web site 1 before it is delivered to user 1. It should be understood, however, that this process of checking Web sites to determine if the content has changed will also increase bandwidth or system resource utilization.
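The two behaviors described above, direct delivery versus validation against the Web site, can be sketched as follows. The Entry and Origin classes and the serve_hit function are hypothetical names introduced for illustration; the requests counter is an assumption added only to make the extra cost of validation visible.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    content: str
    expires_at: float    # estimated expiration time from the meta-data
    last_refresh: float  # when the content was last loaded into the cache


class Origin:
    """Hypothetical stand-in for Web site 1."""

    def __init__(self, content: str, last_updated: float) -> None:
        self.content = content
        self.last_updated = last_updated
        self.requests = 0  # round trips made to the Web site

    def get_last_updated(self) -> float:
        self.requests += 1
        return self.last_updated

    def fetch(self) -> str:
        self.requests += 1
        return self.content


def serve_hit(entry: Entry, origin: Origin, now: float, validate: bool) -> str:
    """Serve a cached entry whose estimated expiration has not passed."""
    if now >= entry.expires_at:
        raise ValueError("entry has passed its estimated expiration")
    # A simple cache skips this step; a more sophisticated one asks the
    # Web site when the content was last updated and refreshes if needed.
    if validate and origin.get_last_updated() > entry.last_refresh:
        entry.content = origin.fetch()
        entry.last_refresh = now
    return entry.content
```

With validate=False the cached copy is returned without contacting the Web site; with validate=True an outdated copy costs two round trips in this sketch (one to check, one to fetch), which illustrates the added bandwidth and resource utilization noted above.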
Similarly, if the requested content is found in the cache 34 but the content was set to expire immediately, some caching systems will simply fetch the content from Web site 1 and deliver it to user 1. However, if the end-user requests a validation of data freshness, some caching systems may send a request to Web site 1 for information on when the desired content was last updated. If the content was last updated prior to the last refresh into cache 34, the content is still fresh and the caching system will deliver the content to user 1, notwithstanding the “expired immediately” status of the content.
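The handling of content marked to expire immediately can be sketched as a single decision. The function name, its parameters, and the (content, fetched) return shape are assumptions made for illustration; the comparison is strict because the text says "prior to".

```python
from typing import Callable, Tuple


def serve_expire_immediately(
    cached: str,
    last_refresh: float,
    origin_last_updated: Callable[[], float],
    fetch: Callable[[], str],
    validate_requested: bool,
) -> Tuple[str, bool]:
    """Return (content, fetched_from_origin) for expire-immediately content.

    origin_last_updated and fetch are hypothetical callables standing in
    for requests to Web site 1.
    """
    # With validation requested, the cached copy is still fresh if the
    # Web site last updated the content prior to the last cache refresh.
    if validate_requested and origin_last_updated() < last_refresh:
        return cached, False
    # Otherwise the content is fetched from the Web site.
    return fetch(), True
```

For instance, with last_refresh=10.0 and a Web site reporting a last update at 5.0, a validated request is answered from the cache; without validation the content is always fetched.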
If the requested content is not in the cache 34, the proxy server 32 will send the request to Web site 1 to fetch the text of the desired Web page (e.g., index.html). After user 1's Web browser receives index.html, the browser will parse the html page and may issue additional requests to Web site 1 to fetch any embedded objects such as images or icons. However, if a caching system 26 is employed, the proxy server 32 will first determine if the embedded objects are available in the cache 34. All traffic (i.e., data flow) is recorded in a log file 38 in the proxy server 32. The log file 38 may include the IP addresses of the location from which requests are issued, the URLs of objects fetched, the time stamp of each action, and the like. Note that a proxy server 32 is usually shared by many end-users so that the content in the cache 34 can be accessed by end-users with similar interests. That is, if user 1 accesses a page and the page is stored in the cache 34, when another user 40 (user 2) requests the same page, the proxy server 32 can simply provide the content in the cache 34 to user 2.
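A shared proxy that logs every request, as described above, can be sketched as follows. The ProxyServer and LogRecord names are hypothetical; the hit field is an assumption added for illustration and is not among the log contents listed in the text, and freshness checks are omitted to keep the sketch short.

```python
import time
from typing import Callable, Dict, List, NamedTuple


class LogRecord(NamedTuple):
    client_ip: str    # IP address from which the request was issued
    url: str          # URL of the object fetched
    timestamp: float  # time stamp of the action
    hit: bool         # assumption: whether the cache already held the object


class ProxyServer:
    """Minimal shared proxy: one cache and one log for all end-users."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.cache: Dict[str, str] = {}
        self.log: List[LogRecord] = []
        self._fetch = fetch_from_origin
        self._clock = clock

    def request(self, client_ip: str, url: str) -> str:
        hit = url in self.cache
        if not hit:
            # Cache miss: fetch from the Web site and keep a copy.
            self.cache[url] = self._fetch(url)
        self.log.append(LogRecord(client_ip, url, self._clock(), hit))
        return self.cache[url]
```

When user 1 and then user 2 request the same URL through one ProxyServer instance, the Web site is contacted only once, and the log holds one record per request.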
In some caching systems a refresh may be performed even when there is no end user request for content. Without any user request being received, the cache will send a request to the Web site that delivered content into the cache to determine when the content in the Web site was last updated. If the content has changed, the content will be refreshed from the Web site back into cache. Thus, when a request for content is received from an end user, it is more likely that the content in cache will be fresh and transmitted directly back to the end user without further delay.
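A refresh pass that runs without any end user request can be sketched as follows. The refresh_cache function, the dictionary layout of each cache entry, and the last_updated and fetch callables are all hypothetical; how often such a pass runs, and over which entries, is left open here.

```python
from typing import Callable, Dict, List


def refresh_cache(
    cache: Dict[str, dict],
    last_updated: Callable[[str], float],
    fetch: Callable[[str], str],
    now: float,
) -> List[str]:
    """Refresh every cached entry whose Web site copy has changed.

    Each cache value is a dict with "content" and "last_refresh" keys
    (an assumed layout). Returns the URLs that were refreshed.
    """
    refreshed = []
    for url, entry in cache.items():
        # Ask the Web site when the content was last updated.
        if last_updated(url) > entry["last_refresh"]:
            entry["content"] = fetch(url)
            entry["last_refresh"] = now
            refreshed.append(url)
    return refreshed
```

Every entry costs at least one request to its Web site per pass, whether or not the content has changed.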
Network bandwidth resources and system resources are important for end users and proxy servers connected to the Internet. The end users and proxy servers can be considered to be “competing” with each other for bandwidth and connection resources, although their goals are the same: to provide users with the fastest response time.
FIG. 3 illustrates the connections available for a typical proxy server 42. The fastest response time for an individual request can be achieved when the requested content is located in the proxy server cache and is fresh, so that the proxy server 42 does not need to fetch the content from the Web site through the Internet. This situation is known as a cache “hit.” System-wide, the fastest response times are achieved with a very high cache hit ratio. Thus, it would seem clear that more pre-fetching 44, refreshing, and pre-validation will lead to more fresh content, a higher cache hit ratio, and faster response

Affiliated with

Candan Kasim Selcuk (Inventor)
Li Wen-Syan (Inventor)

Also associated with

Choules Jack (Examiner)
Foley & Lardner (Law Firm)
NEC USA Inc. (Corporate Assignee)
