Electrical computers and digital processing systems: multicomput – Computer-to-computer protocol implementing – Computer-to-computer data transfer regulating
Reexamination Certificate
2000-04-19
2004-01-20
El-Hady, Nabil (Department: 2154)
C709S230000, C709S231000, C709S233000, C709S234000, C709S235000
active
06681255
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to spider engines and, in particular, to regulating the rate of data retrieval by a spider engine.
2. Related Art
“Web crawlers”, “robots”, or “spider engines” are programs used to automatically search the Internet for web pages or documents of interest. The information found by the spider engine may be collected, cataloged, and otherwise used by search engines. For example, a spider engine may be directed to search for and collect particular types of data, such as product catalog information, or may randomly search and catalog all found web pages to create a web index. The spider engine may enter a particular web site, and search one or more web pages of the web site for information of interest. The web site being searched may maintain a large number of web pages. Hence, searching with a spider engine may entail downloading, via the Internet, hundreds, thousands, and even more pages of information in a relatively short amount of time, from a single web site server.
Searching a web site in this manner with a spider engine may cause a web site server to become heavily loaded with web page requests. A web site server may be physically limited to supporting a particular number of web page requests at any one time. The load due to requests from a single spider engine may approach this web page request limit, and impair the web server's ability to respond to other requests for information during this period. This overloading may be detrimental to the web site provider's goal of making information available to interested parties, and may discourage interested parties from visiting the web site because they receive denials of service. Hence, what is needed is a method and system for limiting the requests a spider engine makes to a web server, while still yielding acceptable search results.
SUMMARY OF THE INVENTION
The present invention prevents a spider engine from overloading a web site server with web page requests. The present invention includes a timing module that is coupled to the spider engine. The timing module monitors data transfer between the web site server and the spider engine, and provides the spider engine with information to adjust the data transfer rate accordingly. The timing module can insert a “wait” state of a calculated length of time between data requests by the spider engine. By controlling this wait time inserted between data requests, the timing module is able to adjust the overall data transfer rate between the web site server and the spider engine to a desired level.
The present invention is directed to a system for retrieving web-site based information using a spider engine at a target bandwidth. A timing module is coupled to or otherwise associated with the spider engine. The timing module includes a data receiver, a bytes accumulator, a current time determiner, a wait time calculator, and a wait time transmitter. The data receiver receives a target bandwidth, B_T, and at least one bytes count from the spider engine. The bytes accumulator accumulates the at least one bytes count received from the spider engine to create an aggregate bytes count, bytes_AGG. The current time determiner determines a start time, T_START, and current time, T_NOW, for the at least one received bytes count. The wait time calculator calculates a wait time as a function of bytes_AGG, B_T, and an elapsed time (T_NOW − T_START). The wait time is the amount of time the spider engine should wait to initiate a next web-site data retrieval to reach the target bandwidth. A wait time transmitter transmits the wait time, T_WAIT, calculated by the wait time calculator to the spider engine.
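The five components listed above can be pictured as one small class. The following is only a minimal sketch under stated assumptions, not the patented implementation: the class name `TimingModule`, the method `report_bytes`, the injectable `clock` parameter, the choice of bytes per second as the unit of B_T, the choice to take T_START when the module is created, and the clamping of negative wait times to zero are all hypothetical. The wait time uses the equation given later in this summary.

```python
import time
from typing import Callable


class TimingModule:
    """Hypothetical sketch of the timing module's five components."""

    def __init__(self, target_bandwidth: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Data receiver: accept the target bandwidth B_T.
        # Assumption: B_T is expressed in bytes per second.
        if target_bandwidth <= 0:
            raise ValueError("target bandwidth must be positive")
        self.target_bandwidth = target_bandwidth
        self._clock = clock
        # Bytes accumulator state: bytes_AGG starts at zero.
        self.bytes_agg = 0
        # Current time determiner: record T_START.
        # Assumption: T_START is taken when the module is created.
        self.start_time = clock()

    def report_bytes(self, byte_count: int) -> float:
        """Receive one bytes count and return T_WAIT in seconds."""
        # Bytes accumulator: add the new count to bytes_AGG.
        self.bytes_agg += byte_count
        # Current time determiner: read T_NOW and form the elapsed time.
        elapsed = self._clock() - self.start_time
        # Wait time calculator: bytes_AGG / B_T - (T_NOW - T_START).
        wait = self.bytes_agg / self.target_bandwidth - elapsed
        # Wait time transmitter: hand T_WAIT back to the spider engine.
        # Assumption: a negative value (already below target) becomes 0.
        return max(0.0, wait)
```

A spider engine would call `report_bytes` after each retrieval and pause for the returned number of seconds before issuing its next request. Passing a fake `clock` makes the calculation easy to check without real delays.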
The present invention is further directed to a method of retrieving web site based information at a target bandwidth. A target bandwidth, B_T, is received. The target bandwidth, B_T, defines a desired information transfer rate with the web site. A wait time, T_WAIT, is calculated. Data retrieval from the web site is delayed by the calculated wait time so that the data is retrieved at the desired target bandwidth, B_T.
A start time, T_START, is calculated. Retrieval of data is initiated from a remote web-site across a network. A number of bytes received is detected. An aggregate bytes count, bytes_AGG, is incremented by the number of bytes received. A current time, T_NOW, is calculated. The wait time, T_WAIT, is calculated. T_WAIT may be calculated according to the equation:
T_WAIT = (bytes_AGG / B_T) − (T_NOW − T_START)
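As a worked illustration of the equation, suppose 50,000 bytes have been received, the target bandwidth is 10,000 bytes per second, and 2 seconds have elapsed: the wait time is 50,000 / 10,000 − 2 = 3 seconds, after which the average rate over the full 5 seconds equals the target. The method steps above can be sketched as a retrieval loop. This is a sketch under assumptions rather than the patent's own code: the names `wait_time` and `throttled_retrieve`, the caller-supplied `fetch` function, the injectable `clock` and `sleep` parameters, the bytes-per-second unit, and the decision to skip the delay when the computed wait is not positive are all hypothetical.

```python
import time
from typing import Callable, Iterable, List


def wait_time(bytes_agg: int, target_bandwidth: float,
              t_now: float, t_start: float) -> float:
    """Return T_WAIT = bytes_AGG / B_T - (T_NOW - T_START), in seconds."""
    return bytes_agg / target_bandwidth - (t_now - t_start)


def throttled_retrieve(urls: Iterable[str],
                       fetch: Callable[[str], bytes],
                       target_bandwidth: float,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> List[bytes]:
    """Retrieve each URL, pausing so the average rate tracks B_T."""
    pages: List[bytes] = []
    bytes_agg = 0            # aggregate bytes count, bytes_AGG
    t_start = clock()        # start time, T_START
    for url in urls:
        body = fetch(url)            # initiate retrieval of data
        pages.append(body)
        bytes_agg += len(body)       # increment by the bytes received
        t_now = clock()              # current time, T_NOW
        t_wait = wait_time(bytes_agg, target_bandwidth, t_now, t_start)
        # Assumption: no delay is inserted when the rate is already
        # at or below the target (T_WAIT <= 0).
        if t_wait > 0:
            sleep(t_wait)
    return pages
```

Because the calculation uses the aggregate byte count and the total elapsed time, a slow retrieval is followed by a shorter wait and a fast one by a longer wait, so the long-run average converges on the target rather than each request being paced in isolation.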
Cooper Jeremy S.
Foulger Michael G.
El-Hady Nabil
icPlanet Corporation
Sterne Kessler Goldstein & Fox P.L.L.C.