System and method for efficient filtering of data set...

Electrical computers and digital processing systems: multicomput – Computer network managing – Computer network access regulating

Reexamination Certificate

Details

C709S217000, C709S219000, C709S224000

active

06952730

ABSTRACT:
A web crawler stores fixed-length representations of document addresses in a buffer and a disk file, and optionally in a cache. When the web crawler downloads a document from a host computer, it identifies URLs (document addresses) in the downloaded document. Each identified URL is converted into a fixed-size numerical representation. The numerical representation may optionally be compared to the contents of a cache containing web sites that are likely to be found during the web crawl, for example previously visited web sites. The numerical representation is then compared to the numerical representations in the buffer, which stores numerical representations of recently identified URLs. If the representation is not found in the buffer, it is stored in the buffer. When the buffer is full, it is ordered and then merged with the numerical representations stored, in order, in the disk file. In addition, the document corresponding to each representation not found in the disk file during the merge is scheduled for downloading. The disk file may be a sparse file, indexed to correspond to the numerical representations of the URLs, so that only a relatively small fraction of the disk file must be searched and rewritten in order to merge each numerical representation in the buffer.
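
The flow described in the abstract (fingerprint, optional cache check, buffer check, then sort and merge into an ordered store when the buffer fills) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `UrlFilter` class, the `fingerprint` helper, the use of SHA-1 truncated to 64 bits as the fixed-size representation, and an in-memory sorted list standing in for the disk file. The sparse, indexed disk file variant is not modeled.

```python
import hashlib


def fingerprint(url: str) -> int:
    """Map a URL to a fixed-size (64-bit) integer.

    Assumption: SHA-1 truncated to 8 bytes; the abstract does not name a hash.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class UrlFilter:
    """Hypothetical sketch of the cache / buffer / ordered-file filter."""

    def __init__(self, capacity, cache=()):
        self.capacity = capacity                      # buffer size that triggers a merge
        self.cache = {fingerprint(u) for u in cache}  # optional: likely-seen URLs
        self.buffer = {}                              # fingerprint -> URL, recently identified
        self.disk = []                                # sorted list standing in for the disk file
        self.scheduled = []                           # URLs scheduled for downloading

    def add(self, url):
        fp = fingerprint(url)
        if fp in self.cache or fp in self.buffer:
            return                                    # already known, nothing to store
        self.buffer[fp] = url
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        """Order the buffer, then merge it with the ordered store in one pass."""
        pending = sorted(self.buffer)
        merged = []
        i = j = 0
        while i < len(self.disk) or j < len(pending):
            if j == len(pending) or (i < len(self.disk) and self.disk[i] < pending[j]):
                merged.append(self.disk[i])           # entry only in the store
                i += 1
            elif i == len(self.disk) or pending[j] < self.disk[i]:
                merged.append(pending[j])             # new entry: schedule its document
                self.scheduled.append(self.buffer[pending[j]])
                j += 1
            else:
                merged.append(self.disk[i])           # present in both: not new
                i += 1
                j += 1
        self.disk = merged
        self.buffer.clear()
```

In this sketch, a URL is scheduled only when its fingerprint is absent from the ordered store at merge time, so URLs are scheduled in fingerprint order rather than in the order they were discovered. The example URLs below are placeholders.

```python
f = UrlFilter(capacity=2)
f.add("http://a.example/1")
f.add("http://a.example/2")   # buffer is full, so a merge runs
print(f.scheduled)            # both URLs, in fingerprint order
```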

REFERENCES:
patent: 5564037 (1996-10-01), Lam
patent: 5893086 (1999-04-01), Schmuck et al.
patent: 5913208 (1999-06-01), Brown et al.
patent: 5953729 (1999-09-01), Cabrera et al.
patent: 5974455 (1999-10-01), Monier
patent: 6094649 (2000-07-01), Bowen et al.
patent: 6301614 (2001-10-01), Najork et al.
patent: 6321265 (2001-11-01), Najork et al.
patent: 6490658 (2002-12-01), Ahmed et al.
patent: 6547829 (2003-04-01), Meyerzon et al.
Brin and Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Database (Online), Available Web Site: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm Last Update: Feb. 3, 2000.
Heydon and Najork, Mercator: A Scalable, Extensible Web Crawler, [No Info].
