Method for indexing duplicate records of information of a...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06230158

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to indexing records of a database, and more particularly to indexing a database which has duplicate records.
BACKGROUND OF THE INVENTION
In the prior art, it has been well known that computer systems can be used to index records of a database. In recent years, a unique distributed database has emerged in the form of the World-Wide-Web (Web). The database records of the Web are in the form of pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet.
The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest. The pages can be expressed in any number of different languages and character sets, such as English, French, German, Spanish, Cyrillic, Katakana, and Mandarin. In addition, the pages can include specialized components, such as embedded “forms,” executable programs, Java applets, and hypertext.
Moreover, the pages can be constructed using various formatting conventions, for example, ASCII text, PostScript files, HTML files, and Acrobat files. The pages can include links to multimedia information content other than text, such as audio, graphics, and moving pictures. As a further complication, the Web can be characterized as a database subject to unpredictable, random updates, insertions, and deletions, with a constantly changing morphology.
One characteristic of the World-Wide-Web makes it relatively easy to copy Web pages from one site to another. Web users frequently incorporate pages created by others into their own pages to streamline access. It is estimated that as much as 25% of the Web is composed of duplicate pages. If all the duplicate pages are fully indexed, the amount of storage required for the index would greatly increase.
Therefore, it is desired to provide a technique which minimizes the likelihood that duplicate pages are indexed. The technique should also allow for reindexing as duplicate pages are deleted.
SUMMARY OF THE INVENTION
The invention provides a computer implemented method for indexing duplicate information stored as records having different unique addresses in a database. The method generates a fingerprint for each record. The fingerprint is a singular value derived from all of the information of the record according to a predetermined combination of the information of the record.
The fingerprint is stored in the index as a unique fingerprint if it differs from every fingerprint previously stored in the index. A reference to the unique address of the record is associated with the stored unique fingerprint.
If the fingerprint is identical to a previously stored unique fingerprint, then a reference to the unique address of the record is stored together with the reference to the unique address already associated with that previously stored unique fingerprint.
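The method summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation. SHA-256 stands in for the "predetermined combination" of the record's information, which the summary does not name. The class `DuplicateAwareIndex`, its methods, and the example URLs are all hypothetical names introduced for illustration. The `remove` method is only one assumed way to support reindexing when a duplicate page is deleted.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Derive a single value from all of the record's content.

    Assumption: SHA-256 is used here only as a stand-in; the source does
    not specify which function combines the record's information.
    """
    return hashlib.sha256(content).hexdigest()


class DuplicateAwareIndex:
    """Hypothetical index mapping each unique fingerprint to its addresses."""

    def __init__(self) -> None:
        # fingerprint -> list of unique addresses holding identical content
        self._addresses: dict[str, list[str]] = {}

    def add(self, address: str, content: bytes) -> bool:
        """Record an address; return True if the content needs full indexing.

        A new fingerprint is stored as a unique fingerprint. A fingerprint
        already present only gains one more address reference.
        """
        fp = fingerprint(content)
        is_new = fp not in self._addresses
        refs = self._addresses.setdefault(fp, [])
        if address not in refs:
            refs.append(address)
        return is_new

    def remove(self, address: str, content: bytes) -> bool:
        """Drop an address; return True if other copies of the content remain.

        Assumption: the caller still knows the deleted page's content (or
        its fingerprint) at deletion time.
        """
        fp = fingerprint(content)
        refs = self._addresses.get(fp, [])
        if address in refs:
            refs.remove(address)
        if not refs:
            # Last copy is gone, so the fingerprint leaves the index.
            self._addresses.pop(fp, None)
            return False
        return True

    def addresses_for(self, content: bytes) -> list[str]:
        """Return every address known to hold exactly this content."""
        return list(self._addresses.get(fingerprint(content), []))


index = DuplicateAwareIndex()
first = index.add("http://a.example/page", b"same text")    # new content
second = index.add("http://b.example/copy", b"same text")   # duplicate
third = index.add("http://c.example/other", b"other text")  # new content
```

In this sketch only the first address with a given fingerprint triggers full indexing; later duplicates cost one extra address reference each, which reflects the storage saving the background section is after. Because a hash function can in principle collide, a production system might confirm a match by comparing content, a step the summary does not discuss.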


REFERENCES:
patent: 5745900 (1998-04-01), Burrows
patent: 5960449 (1999-09-01), Nagaoka et al.
patent: 5970497 (1999-10-01), Burrows
patent: 5974238 (1999-10-01), Chase, Jr.
