Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-05-11
2003-03-25
Coby, Frantz (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06539373
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to data storage and retrieval techniques, and more particularly to a system and method for rapidly identifying the existence and location of an item in a file.
2. Description of Background Art
In many computer-related applications, it is useful to rapidly identify whether or not a particular item exists in a stored file, database, or table. For example, one such application involves an implementation of a content directory of World Wide Web sites, including listings of Uniform Resource Locators (URLs) identifying on-line documents. It may be useful for a user or automated software application to identify whether or not a particular URL is listed in a particular content directory. Mechanisms for searching multiple pieces of text-based information in a document space such as the World Wide Web often take one of two types. The first type of search mechanism involves providing a text string to a search engine, which then retrieves a descriptor or identifier for any document containing the specified text string. Various combinations of text-strings and Boolean operators may be provided to implement more complex searches. However, the literal nature of such text-based searches often results in retrieval of documents that are unrelated to the intended meaning or context of the search terms. For example, a search for information on lions using the word “lion” as a search term may result in retrieval of documents describing the motion picture “The Lion King”, community service clubs such as “Lion's Club”, and other documents unrelated to the intended object of the search.
The second type of search mechanism is a category search, which employs a category directory describing a hierarchy of information categories. The search is performed by traversing the hierarchy to successively narrower categories until the desired set of documents is reached. Therefore, a search for information on lions might begin with a broad category of “science”, then proceed down the hierarchy to “biology”, “zoology”, “mammals”, and so forth. This approach tends to lessen or eliminate the above-described problem endemic to literal text-based searches. However, if one desires to search for information on lions within the subcategory “science/biology/zoology/mammals”, and if no explicit “lions” subcategory exists, one must manually search through all document titles under that subcategory looking for documents related to lions.
What is needed is a mechanism for rapidly determining, for each result of a text-based search, whether the indicated result is listed in a particular category directory representing a desired subject area.
Alternatively, there may be other applications in which it is useful to rapidly determine whether or not an item exists in a stored file. In some situations, the existence of the item is known, but the location may be unknown. In other situations, it may be unknown whether or not the item exists.
Several search techniques exist in the prior art for determining whether a particular record is stored in a master file, and obtaining the address of location where the record is stored. For example, the master file may be traversed in its entirety, or it may be sorted, or a binary tree search may be performed. Such techniques are time-consuming, and may involve excessive overhead in maintaining the master file.
One known technique for reducing search time is hashing, as described in D. Knuth,
The Art of Computer Programming
vol. 3, Addison-Wesley: 1973. Referring now to
FIG. 2
, there is shown a block diagram of a hash table architecture according to the prior art. Master file
205
, which is typically stored in a data storage device such as a hard drive or other long-term storage, contains a number of data records
223
,
224
,
225
,
226
,
227
,
228
. Records
223
,
224
,
225
,
226
,
227
,
228
contain any type of information that may be retrieved for use by a user or by a computer system. Each record
223
,
224
,
225
,
226
,
227
,
228
is stored at a particular location having a specific address, so that a record may be retrieved from master file
205
in a conventional manner by reference to the address of the record. Any number of records
223
,
224
,
225
,
226
,
227
,
228
may be included in master file
205
.
Hash table
204
is constructed and stored, for example in data storage such as a hard drive or other storage device. Hash table
204
can be of arbitrary size, and contains some number of hash buckets
211
,
212
,
213
, each bucket containing some number of entries. Each entry contains a fixed-length, for example 32-bit, pointer
217
,
218
,
219
,
220
,
221
,
222
to an address indicating a particular location in master file
205
. In the example of
FIG. 2
, pointer
219
points to the address of a location in master file
205
containing record
225
, while pointer
220
points to the address of a location in master file
205
containing record
227
.
Any number of hash buckets
211
,
212
,
213
may be provided in hash table
204
, and any number of entries, or pointers
217
,
218
,
219
,
220
,
221
,
222
can be provided in each hash bucket
211
,
212
,
213
. For example, 65,536 buckets
211
,
212
,
213
may be included, each bucket containing up to
32
entries.
Each hash bucket
211
,
212
,
213
is associated with a hash key
214
,
215
,
216
that can be obtained by applying hash function
202
to an item to be stored or retrieved. Hash function
202
may be any operation that can be performed on the item, and preferably is an operation that results in a relatively even distribution of items among buckets
211
,
212
,
213
in hash table
204
. For example, one such hash function
202
involves performing successive exclusive-OR operations on the characters forming the character string of the item. This results in a 16-bit hash key that is capable of uniquely identifying 2
16
, or 65,536 different hash buckets
211
,
212
,
213
.
When a new record containing an item is added to master file
205
, a pointer to the record is added to hash table
204
. The pointer is added to the appropriate hash bucket, determined by applying hash function
202
to the value of the new item. The new pointer in the hash bucket contains an address indicating the location in master file
205
of the new item.
In order to determine whether a particular item exists in master file
205
, a search term
201
is supplied containing a text string or other identifier for the desired record. In the example of
FIG. 2
, search term
201
indicates the data represented by record
227
. Hash function
202
is applied to search term
201
in order to obtain hash key
203
. Hash bucket
212
containing the identical key
215
to the obtained hash key
203
is identified.
Bucket
212
is then traversed. For each item in bucket
212
, the referenced location in master file
205
is consulted and the stored item is compared to search term
201
. If a match is found, the traversal ends and a positive result is returned. If the location of the item is desired, it may also be returned. If all items in bucket
212
are checked without finding a match, a negative result is returned.
Thus, in the example of
FIG. 2
, pointer
219
is dereferenced and the corresponding record
225
in master file
205
is consulted. Record
225
is compared with search term
201
, and no match is found. Pointer
220
is then dereferenced and the corresponding record
227
in master file
205
is consulted. Record
227
is compared with search term
201
, and a match is found. A positive result is returned, along with the address of record
227
or the data contained therein, as desired.
The prior art technique of
FIG. 2
for identifying the existence and location of an item in a file is relatively slow because it requires a relatively large number of reads from hash table
204
and from master file
205
. For a worst-case positive result, all pointers in the identified bucket must be dereferenced
Apple Computer Inc.
Coby Frantz
Fenwick & West LLP
LandOfFree
Contextual searching by determining intersections of search... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Contextual searching by determining intersections of search..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Contextual searching by determining intersections of search... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3067993