System and method for flexible indexing of document content

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000, C715S252000

active

06741979

ABSTRACT:

BACKGROUND OF THE INVENTION
Incorporation by Reference
This patent application discloses an invention which may optionally form a portion of a larger system. Other portions of the larger system are disclosed and described in the following co-pending patent applications, all of which are subject to an obligation of assignment to the same person. The disclosures of these applications are herein incorporated by reference in their entireties.
METHOD AND SYSTEM FOR AUTOMATIC HARVESTING AND QUALIFICATION OF DYNAMIC DATABASE CONTENT, William J. Bushee, Thomas W. Tiahrt, and Michael K. Bergman, filed Jul. 24, 2001, application Ser. No. 09/911,522, now pending.
SYSTEM AND METHOD FOR EFFICIENT CONTROL AND CAPTURE OF DYNAMIC DATABASE CONTENT, William J. Bushee and Thomas W. Tiahrt, filed Jul. 24, 2001, application Ser. No. 09/911,434, now pending.
1. Field of the Invention
The present invention relates to radix search tries and more particularly pertains to a new system and method for flexible indexing of document content for facilitating the rapid search and retrieval of large collections of documents.
2. Description of the Prior Art
The use of lexicographic (digital) search trees is known in the prior art. A radix search trie is a digital search tree with a fixed alphabet size. Each edge in the trie represents a character in the alphabet. Each internal node represents a string prefix. Each external node represents a string. The tree records the minimal prefix set of characters required to differentiate all strings in the string set. Strings are found by following an access path defined by the string's characters.
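The prior-art structure described above can be illustrated with a minimal sketch. It is not the claimed invention, only one assumed way to build a generic radix search trie: internal nodes are Python dicts keyed by character, external nodes hold the full string, and a terminator character (the hypothetical END constant) is appended so that no stored string is a prefix of another. All class and method names are illustrative.

```python
END = "\0"  # assumed terminator; input words must not contain it


class RadixSearchTrie:
    """Internal nodes are dicts (char -> child); external nodes are strings."""

    def __init__(self):
        self.root = {}

    def insert(self, word):
        key = word + END
        node, depth = self.root, 0
        while True:
            ch = key[depth]
            child = node.get(ch)
            if child is None:
                node[ch] = key  # new external node holding the string
                return
            if isinstance(child, str):
                if child == key:
                    return  # already stored
                # Split: extend the shared path until the two strings differ.
                other = child
                depth += 1
                node[ch] = {}
                node = node[ch]
                while other[depth] == key[depth]:
                    node[key[depth]] = {}
                    node = node[key[depth]]
                    depth += 1
                node[other[depth]] = other
                node[key[depth]] = key
                return
            node, depth = child, depth + 1

    def distinguishing_prefix(self, word):
        """Follow the access path; return the prefix used, or None if absent."""
        key = word + END
        node = self.root
        for depth, ch in enumerate(key):
            child = node.get(ch)
            if child is None:
                return None
            if isinstance(child, str):
                return key[: depth + 1].rstrip(END) if child == key else None
            node = child
        return None

    def contains(self, word):
        return self.distinguishing_prefix(word) is not None


trie = RadixSearchTrie()
for w in ["car", "cat", "dog"]:
    trie.insert(w)
```

In this sketch "dog" is resolved after a single character, while "car" and "cat" need three characters each, which reflects the minimal-prefix property described above.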
Trie variations have developed into three broad categories: array based tries, where arrays of pointers are used to access subtrees; binary search tree based tries, where a binary tree is used to traverse the trie; and list based tries, where linked lists provide access linkage.
Array lookup is fast, but is typically limited to small alphabets, since with a large alphabet most of the pointer slots are null. Binary search trees are compact, but each bit must be examined, so they are slower than arrays. Linked lists are more storage efficient than arrays, but have slower access times.
When extremely large numbers of strings are to be indexed, it is desirable to have greater storage efficiency than an array trie and faster access than a linked list trie or binary search trie.
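The storage trade-off between array based and list based tries can be made concrete with a small sketch. It assumes a 26-letter lowercase alphabet and hypothetical class names (ArrayNode, ListNode); the binary search tree variant is omitted for brevity. The array node finds a child by direct indexing but stores one slot per alphabet symbol, while the list node stores only the children that exist and must scan its siblings.

```python
ALPHABET_SIZE = 26  # assumption: lowercase a-z only


class ArrayNode:
    """Array based trie node: one pointer slot per alphabet symbol."""

    def __init__(self):
        self.slots = [None] * ALPHABET_SIZE

    def add(self, ch):
        i = ord(ch) - ord("a")
        if self.slots[i] is None:
            self.slots[i] = ArrayNode()
        return self.slots[i]  # direct index, no scanning


class ListNode:
    """List based trie node: children chained through sibling links."""

    def __init__(self, ch=None):
        self.ch = ch
        self.first_child = None
        self.next_sibling = None

    def add(self, ch):
        node = self.first_child
        while node is not None:  # sequential scan of the siblings
            if node.ch == ch:
                return node
            node = node.next_sibling
        node = ListNode(ch)
        node.next_sibling = self.first_child
        self.first_child = node
        return node


def build(root, words):
    for word in words:
        node = root
        for ch in word:
            node = node.add(ch)
    return root


def count_array(node):
    """Return (number of nodes, number of null slots) in an array trie."""
    nodes, nulls = 1, 0
    for slot in node.slots:
        if slot is None:
            nulls += 1
        else:
            n, z = count_array(slot)
            nodes, nulls = nodes + n, nulls + z
    return nodes, nulls


def count_list(node):
    """Return the number of nodes in a list trie."""
    total, child = 1, node.first_child
    while child is not None:
        total += count_list(child)
        child = child.next_sibling
    return total


words = ["car", "cat", "dog"]
array_nodes, null_slots = count_array(build(ArrayNode(), words))
list_nodes = count_list(build(ListNode(), words))
```

For these three words both tries have 8 nodes, but the array trie carries 201 null slots out of 208, which illustrates why array tries become wasteful as the alphabet grows.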
SUMMARY OF THE INVENTION
In view of the foregoing disadvantages inherent in the known types of radix search tries now present in the prior art, the present invention provides a new system for flexible indexing of document content wherein the same can be utilized for facilitating the rapid search and retrieval of large collections of documents.
The invention contemplates a method for flexible indexing of document content, and includes obtaining a collection of documents to be indexed, storing said collection of documents in a single document information stream, parsing each one of said documents into constituent words to facilitate indexing, creating a plurality of stem words to be indexed by stemming each word into a standard prefix, and indexing each stem word.
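The steps of the method can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the summary does not say how stemming produces a "standard prefix", so the sketch simply truncates each word to a fixed length (the hypothetical STEM_LENGTH), and a plain dictionary stands in for the index structure. All function names are invented for the example.

```python
import re
from collections import defaultdict

STEM_LENGTH = 5  # assumption: the source does not specify a stemming rule


def build_stream(documents):
    """Store all documents in one stream; return it with per-document offsets."""
    parts, offsets, pos = [], [], 0
    for text in documents:
        offsets.append((pos, pos + len(text)))
        parts.append(text)
        pos += len(text) + 1  # +1 for the newline separator
    return "\n".join(parts), offsets


def parse_words(text):
    """Split a document into lowercase constituent words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Placeholder stemmer: truncate the word to a fixed-length prefix."""
    return word[:STEM_LENGTH]


def index_documents(documents):
    """Map each stem to the set of document numbers that contain it."""
    stream, offsets = build_stream(documents)
    index = defaultdict(set)
    for doc_id, (start, end) in enumerate(offsets):
        for word in parse_words(stream[start:end]):
            index[stem(word)].add(doc_id)
    return index


docs = ["Indexing documents quickly", "The indexer indexes a document"]
index = index_documents(docs)
```

Here "indexing", "indexer", and "indexes" all collapse to the stem "index", so a query on that stem would retrieve both documents.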
There has thus been outlined, rather broadly, the more important features of the invention in order that the detailed description thereof that follows may be better understood, and in order that the present contribution to the art may be better appreciated. There are additional features of the invention that will be described hereinafter and which will form the subject matter of the claims appended hereto.
In this respect, before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.
As such, those skilled in the art will appreciate that the conception, upon which this disclosure is based, may readily be utilized as a basis for the designing of other structures, methods and systems for carrying out the several purposes of the present invention. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the present invention.


