Methods and apparatus for populating a network cache

Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area

Reexamination Certificate

Details

C711S130000, C711S136000, C709S224000, C709S225000, C709S212000, C709S216000

active

06499088

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates generally to networking technology. More specifically, the present invention relates to the caching of data objects to accelerate access to, for example, the World Wide Web. Still more specifically, the present invention provides methods and apparatus by which a network cache may be populated when initially deployed.
Generally speaking, when a client platform communicates with some remote server, whether via the Internet or an intranet, it crafts a data packet which defines a TCP connection between the two hosts, i.e., the client platform and the destination server. More specifically, the data packet has headers which include the destination IP address, the destination port, the source IP address, the source port, and the protocol type. The destination IP address might be the address of a well known World Wide Web (WWW) search engine such as, for example, Yahoo, in which case, the protocol would be TCP and the destination port would be port 80, a well known port for http and the WWW. The source IP address would, of course, be the IP address for the client platform and the source port would be one of the TCP ports selected by the client. These five pieces of information define the TCP connection.
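As an illustration only, the five header fields described above can be modeled as a small immutable record. The class name, field names, helper function, and the addresses (taken from ranges reserved for documentation) are assumptions made for this sketch and do not appear in the specification.

```python
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    """The five header fields that together identify one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def is_web_request(key: ConnectionKey) -> bool:
    """Return True when the connection targets port 80 over TCP."""
    return key.protocol == "TCP" and key.dst_port == 80


# Hypothetical client and server addresses, used only for illustration.
first = ConnectionKey("192.0.2.10", 49152, "198.51.100.7", 80, "TCP")
second = ConnectionKey("192.0.2.10", 49153, "198.51.100.7", 80, "TCP")

# Same hosts, different source port: two distinct connections.
distinct_connections = len({first, second})
```

Because the two records differ only in the client-selected source port, they compare as different connections even though both reach the same destination on port 80.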
Given the increase of traffic on the World Wide Web and the growing bandwidth demands of ever more sophisticated multimedia content, there has been constant pressure to find more efficient ways to service data requests than opening direct TCP connections between a requesting client and the primary repository for the desired data. Interestingly, one technique for increasing the efficiency with which data requests are serviced came about as the result of the development of network firewalls in response to security concerns. In the early development of such security measures, proxy servers were employed as firewalls to protect networks and their client machines from corruption by undesirable content and unauthorized access from the outside world. Proxy servers were originally based on Unix machines because that was the prevalent technology at the time. This model was generalized with the advent of SOCKS which was essentially a daemon on a Unix machine. Software on a client platform on the network protected by the firewall was specially configured to communicate with the resident daemon which then made the connection to a destination platform at the client's request. The daemon then passed information back and forth between the client and destination platforms acting as an intermediary or “proxy”.
Not only did this model provide the desired protection for the client's network, it gave the entire network the IP address of the proxy server, therefore simplifying the problem of addressing of data packets to an increasing number of users. Moreover, because of the storage capability of the proxy server, information retrieved from remote servers could be stored rather than simply passed through to the requesting platform. This storage capability was quickly recognized as a means by which access to the World Wide Web could be accelerated. That is, by storing frequently requested data, subsequent requests for the same data could be serviced without having to retrieve the requested data from its original remote source. Currently, most Internet service providers (ISPs) accelerate access to their web sites using proxy servers.
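The store-and-reuse behavior described above might be sketched as follows. This is a minimal illustration, not the design of any particular proxy server: the class name, the injected `fetch_from_origin` callable, and the fetch counter are all hypothetical, and the sketch ignores expiry and cacheability rules.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy intermediary that keeps a copy of every object it retrieves."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is a stand-in for a real request to the remote server.
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_fetches = 0  # how many times the remote source was contacted

    def get(self, url: str) -> bytes:
        """Serve from local storage when possible, otherwise go to the origin."""
        if url not in self._store:
            self._store[url] = self._fetch(url)
            self.origin_fetches += 1
        return self._store[url]


# Usage with a fake origin that simply echoes the URL as bytes.
proxy = CachingProxy(lambda url: url.encode())
first_body = proxy.get("http://example.com/index.html")
second_body = proxy.get("http://example.com/index.html")
```

The second request for the same URL is answered from local storage, so the origin is contacted only once.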
A similar idea led to the development of network caching systems. Network caches are employed near the router of a network to accelerate access to the Internet for the client machines on the network. An example of such a system is described in commonly assigned, copending U.S. patent application Ser. No. 08/946,867 for METHOD AND APPARATUS FOR FACILITATING NETWORK DATA TRANSMISSIONS filed on Oct. 8, 1997, the entire specification of which is incorporated herein by reference for all purposes. Such a cache typically stores the data objects which are most frequently requested by the network users and which do not change too often. Network caches can provide a significant improvement in the time required to download objects to the individual machines, especially where the user group is relatively homogeneous with regard to the type of content being requested. The efficiency of a particular caching system is represented by a metric called the “hit ratio” which is a ratio of the number of requests for content satisfied by the cache to the total number of requests for content made by the users of the various client machines on the network. The hit ratio of a caching system is high if its “working set”, i.e., the set of objects stored in the cache, closely resembles the content currently being requested by the user group.
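The hit ratio defined above is a simple quotient, and a short sketch can make the role of the working set concrete. The function names and sample URLs below are hypothetical; returning 0.0 when no requests have been made is a convention chosen for this example, not something the specification states.

```python
from typing import Iterable, Set


def hit_ratio(cache_hits: int, total_requests: int) -> float:
    """Requests satisfied by the cache divided by all requests made."""
    if total_requests == 0:
        return 0.0  # assumed convention for an idle cache
    return cache_hits / total_requests


def measure_hit_ratio(requests: Iterable[str], working_set: Set[str]) -> float:
    """Count how many requested objects are already in the working set."""
    requests = list(requests)
    hits = sum(1 for url in requests if url in working_set)
    return hit_ratio(hits, len(requests))


# Hypothetical request stream: two of the four requests are in the working set.
sample_requests = ["/news", "/sports", "/news", "/weather"]
populated = measure_hit_ratio(sample_requests, {"/news"})
empty = measure_hit_ratio(sample_requests, set())
```

An empty working set yields a hit ratio of zero for any request stream, which is the situation a newly installed cache starts from.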
Unfortunately, with currently available caching systems, the performance improvement promised by providers of such systems is not immediate due to the fact that when a cache is initially connected to a router it is unpopulated, i.e., empty. Given the size of the typical cache, e.g., >20 gigabytes, and depending upon the frequency of Internet access of a given user group, it can take several days for a cache to be populated to a level at which an improvement in access time becomes apparent. In fact, while the cache is being populated additional latency is introduced due to the detour through the cache.
From the customer's perspective, this apparent lack of results in the first few days after installing a caching system can be frustrating and often leads to the assumption that the technology is not operating correctly. To address this problem, providers of caching systems have attempted to populate the cache before bringing the system on line by using previous caching logs, i.e., “squid” logs, to develop the working set for the system. However, this presents the classic “chicken and egg” conundrum in that the first time a caching system is deployed for a particular network there are no previous caching logs for that network.
Another method of populating a caching system employs a web scavenging robot which polls the client machines on the network to determine what content has been previously requested. Unfortunately, this can be a relatively slow process which consumes network resources to an undesirable degree. This process also requires a good knowledge of what type of content the users of interest typically browse.
It is therefore apparent that there is a need for techniques by which caching systems may be quickly and transparently populated when they are initially deployed.
SUMMARY OF THE INVENTION
According to the present invention, methods and apparatus are provided by which a caching system may be populated quickly before its deployment. The techniques described herein employ a capability inherent in most routers to develop a working set of data objects which are then retrieved to populate the cache. The router to which the caching system is to be connected is configured to log information regarding the destinations from which network users are requesting information, i.e., net flow statistics. According to a specific embodiment, this information is then parsed to get a list of destinations corresponding to a specific port, e.g., port 80, or a group of IP addresses. These destinations are then sorted according to the frequency with which they are requested. The top N destinations are then selected for populating the cache. Cacheable objects from those destinations are then retrieved and stored in the cache. The process of retrieving and storing this data takes only a few hours. Moreover, a system administrator can configure the network router to collect the necessary traffic flow data in advance of purchasing the caching system so that, once the system is delivered, it can be populated and deployed immediately.
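The parse, sort, and select steps described above might look like the following sketch. The specification does not give a log format, so the whitespace-separated record layout (source IP, source port, destination IP, destination port, protocol) is an assumption, as are the function name and the sample addresses. Retrieval of the cacheable objects themselves is not shown.

```python
from collections import Counter
from typing import Iterable, List


def top_destinations(flow_lines: Iterable[str], port: int = 80, n: int = 10) -> List[str]:
    """Return the n most frequently requested destination IPs for one port.

    Each record is assumed to look like:
        "<src_ip> <src_port> <dst_ip> <dst_port> <protocol>"
    """
    counts: Counter = Counter()
    for line in flow_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip records that do not match the assumed layout
        _src_ip, _src_port, dst_ip, dst_port, protocol = fields
        if protocol.upper() == "TCP" and dst_port == str(port):
            counts[dst_ip] += 1
    # most_common sorts destinations by descending request frequency.
    return [dst_ip for dst_ip, _count in counts.most_common(n)]


# Hypothetical flow records; addresses come from documentation ranges.
sample_log = [
    "192.0.2.10 49152 198.51.100.7 80 TCP",
    "192.0.2.11 49153 198.51.100.7 80 TCP",
    "192.0.2.12 49154 203.0.113.5 80 TCP",
    "192.0.2.10 49155 203.0.113.9 25 TCP",
    "malformed record",
]
working_set_sources = top_destinations(sample_log, port=80, n=2)
```

The resulting list holds the destinations from which cacheable objects would then be fetched; filtering by a group of IP addresses instead of a port would only change the condition inside the loop.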
According to another embodiment, before beginning operation as a cache, the caching system automatically configures the router to log the traffic flow data after which it analyzes the data and retrieves the appropriate data objects. Once populated it enables itself to perform t
