Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-08-03
2004-06-01
Mizrahi, Diane D. (Department: 2175)
C707S793000
active
06745194
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to indexing a database, and, more particularly, to a technique for deleting duplicate records referenced in an index of a database.
BACKGROUND OF THE INVENTION
It has been well known that computer systems can be used to index records of a database. In recent years, a unique distributed database has emerged in the form of the World-Wide-Web (Web). The database records of the Web are in the form of pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet.
The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest. The pages can be expressed in any number of different character sets such as English, French, German, Spanish, Cyrillic, Katakana, and Mandarin. In addition, the pages can include specialized components, such as embedded “forms,” executable programs, Java applets, and hypertext.
Moreover, the pages can be constructed using various formatting conventions, for example, ASCII text, PostScript files, HTML files, and Acrobat files. The pages can include links to multimedia information content other than text, such as audio, graphics, and moving pictures. Adding to the complexity, the Web can be characterized as an unpredictable random update, insert, and delete database with a constantly changing morphology.
One characteristic of the World-Wide-Web is that it is relatively easy to copy Web pages from one site to another. Web users frequently incorporate pages created by others into their own pages to streamline access. It is estimated that as much as 25% of the Web is composed of duplicate pages. If all the duplicate pages were fully indexed, the amount of storage required for the index would greatly increase. Therefore, there is a need for a technique which minimizes the likelihood that duplicate pages are indexed.
SUMMARY OF THE INVENTION
Briefly, according to the present invention, a technique is provided for deleting duplicate records referenced in an index of a database. In one embodiment, the technique may be realized by receiving a record; determining a fingerprint for the record; comparing the fingerprint of the record with fingerprints of previously indexed records; and, when the comparing act determines that the fingerprint of the currently received record is the same as at least one of the fingerprints of the previously indexed records, identifying the current record as a record to be deleted.
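The receive, fingerprint, compare, and identify steps of this embodiment can be illustrated with a minimal Python sketch. The fingerprint function is not specified in this text, so SHA-256 is used purely as a stand-in, and the names fingerprint and find_duplicates are hypothetical.

```python
import hashlib


def fingerprint(record: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified fingerprint function.
    return hashlib.sha256(record).hexdigest()


def find_duplicates(records):
    """Return the positions of records whose fingerprint matches
    the fingerprint of a previously seen (already indexed) record."""
    seen = set()        # fingerprints of previously indexed records
    to_delete = []      # positions identified as records to be deleted
    for position, record in enumerate(records):
        fp = fingerprint(record)
        if fp in seen:
            to_delete.append(position)
        else:
            seen.add(fp)
    return to_delete
```

For example, given the records b"a", b"b", b"a", the third record has the same fingerprint as the first and would be the one identified for deletion. Keeping only fingerprints in memory, rather than full records, is one way such a comparison might stay inexpensive.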
The present invention will now be described in more detail with reference to exemplary embodiments thereof as shown in the appended drawings. While the present invention is described below with reference to preferred embodiments, it should be understood that the present invention is not limited thereto. Those of ordinary skill in the art having access to the teachings herein will recognize additional implementations, modifications, and embodiments, as well as other fields of use, which are within the scope of the present invention as disclosed and claimed herein, and with respect to which the present invention could be of significant utility.
The location entries of the first and second index entries are searched subject to one or more constraints which must be satisfied. The constraints are expressed in the general form as C(a)≦C(b)+K, where C(a) means a current location of the first index entry, C(b) means a current location of the second index entry, and K is a predetermined constant.
The constraints are satisfied by reading locations of the second index entry until the current location of the second index entry is at least equal to the current location of the first index entry plus the predetermined constant.
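As a rough illustration only, the following Python sketch reads locations of the second index entry forward until the general-form inequality C(a)≦C(b)+K holds. Taking that inequality exactly as written as the stopping condition is an assumption, as are the sorted-list representation of locations and the name advance_until_satisfied.

```python
def advance_until_satisfied(locs_a, locs_b, i, j, k):
    """Advance j through locs_b until locs_a[i] <= locs_b[j] + k.

    locs_a, locs_b: ascending location lists of the first and second
    index entries (assumed representation).
    i, j: current positions in each list; k: the predetermined constant.
    Returns the new position in locs_b, or None if locs_b is exhausted
    before the constraint can be satisfied.
    """
    while j < len(locs_b) and locs_a[i] > locs_b[j] + k:
        j += 1  # read the next location of the second index entry
    return j if j < len(locs_b) else None
```

With locations [10] for the first entry, [2, 5, 8, 12] for the second, and K = 3, the scan stops at location 8, since 10 ≦ 8 + 3.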
REFERENCES:
patent: 4719642 (1988-01-01), Lucas
patent: 4839853 (1989-06-01), Deerwester et al.
patent: 5235578 (1993-08-01), Baas et al.
patent: 5264848 (1993-11-01), McGuffin
patent: 5265065 (1993-11-01), Turtle
patent: 5270712 (1993-12-01), Iyer et al.
patent: 5274805 (1993-12-01), Ferguson et al.
patent: 5278980 (1994-01-01), Pedersen et al.
patent: 5280610 (1994-01-01), Travis, Jr. et al.
patent: 5321833 (1994-06-01), Chang et al.
patent: 5414838 (1995-05-01), Kolton et al.
patent: 5418951 (1995-05-01), Damashek
patent: 5440730 (1995-08-01), Elmasri et al.
patent: 5440744 (1995-08-01), Jacobson et al.
patent: 5450580 (1995-09-01), Takada
patent: 5467134 (1995-11-01), Laney et al.
patent: 5485611 (1996-01-01), Astle
patent: 5544352 (1996-08-01), Egger
patent: 5550965 (1996-08-01), Gabbe et al.
patent: 5581758 (1996-12-01), Burnett et al.
patent: 5594899 (1997-01-01), Knudsen et al.
patent: 5598557 (1997-01-01), Doner et al.
patent: 5619709 (1997-04-01), Caid et al.
patent: 5640553 (1997-06-01), Schultz
patent: 5640558 (1997-06-01), Li
patent: 5649186 (1997-07-01), Ferguson
patent: 5652880 (1997-07-01), Seagraves
patent: 5652882 (1997-07-01), Doktor
patent: 5664172 (1997-09-01), Antoshenkov
patent: 5668988 (1997-09-01), Chen et al.
patent: 5678041 (1997-10-01), Baker et al.
patent: 5685003 (1997-11-01), Peltonen et al.
patent: 5696962 (1997-12-01), Kupiec
patent: 5724571 (1998-03-01), Woods
patent: 5745890 (1998-04-01), Burrows
patent: 5745900 (1998-04-01), Burrows
patent: 5970497 (1999-10-01), Burrows
patent: 6105019 (2000-08-01), Burrows
patent: 6230158 (2001-05-01), Burrows
patent: 6317741 (2001-11-01), Burrows
Business Wire, Open Text's Web Search for OEM's; Offers Unique Intelligent Search Capabilities, p. 9181355.
Information Intelligence Inc., World Wide Web Search Engines: Alta Vista & Yahoo, DR LINK, Accession No. 3168688, May 1996.
Yuwono et al, Wise: A World Wide Web Resource Database System, IEEE Transactions on Knowledge and Data Engineering, vol. 8, No. 4, Aug. 1996, pp. 548-554.
Steinberg, Seek and Ye Shall Find (Maybe), WIRED May 1996, p. 108 et al.
Automated Patent System Manual APS-TR-03.07, Operators and Symbols, sundry pages, Dec. 31, 1991.
Alta Vista Company
Hunton & Williams LLP
Mizrahi Diane D.
Mofiz Apu