Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-03-22
2002-05-21
Coby, Frantz (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06393427
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to organizing documents retrieved from the world wide web (WWW) or intranets. In particular, the present invention relates to organizing such documents under a classification scheme for efficient access.
2. Discussion of the Related Art
Two approaches to document organization are clustering (i.e., non-supervised learning) and classification (i.e., supervised learning). The major difference between clustering and classification is that clustering does not rely on a training set but classification does.
In one clustering technique, documents are dynamically clustered based on similarity. However, such an approach suffers from several shortcomings. First, the classification accuracy depends heavily on the number of documents in the database. Second, choosing good labels for categories generated based on clustering is difficult since the labels selected may not be meaningful to the users. To choose good labels for generated categories, many techniques based on word frequency analysis have been proposed. In general, however, these techniques have not been found effective. Consequently, for navigation purpose, clustering techniques are inferior to manual classification and labeling.
Classification is a method for both organizing documents in a document database and facilitating navigation of such a document database. Existing classifiers, such as Library of Congress Classification (LCC), can be used to organize local collections of documents. However, LCC's classification and category labels are usually too fine (e.g. six to seven levels) for organizing relatively smaller local collections of documents.
For client side document categorization, such as organizing bookmarks and electronic mail (“emails”) for individual users, the clustering approach is mainly chosen because a large document set is not available at the client side to train the classifier. On the other hand, at the server side, since abundant training data are available, the classification approach is often chosen.
Using the clustering approach to organize client documents (e.g., bookmarks and emails) suffers from many shortcomings resulting from the small document set at the client side. A small document set can generate clusters of no statistical significance and thus, when a small number of documents is added, which is proportionally large to the document set, the clusters can be easily changed.
SUMMARY OF THE INVENTION
The present invention provides a method for providing, on the client side, a navigation tree using an external classifier. The method comprises a maintenance method including a method for merging a parent internal node and leaf nodes, and a method for splitting an internal node in a parent internal node. In one embodiment, each leaf node represents a document in the navigation tree and each internal node is associated with a label representing a category of classification of the child internal nodes and leaf nodes associated with the parent internal node.
According to one aspect of the invention, a document insertion method is provided which inserts a document into the navigation tree according to a classification obtained from an external classifier using keywords in the document. The method also provides a document deletion method for deleting a document from the navigation tree. The method for splitting an internal node of a parent internal node is invoked by the document insertion method when a predetermined criterion is met. Similarly, the method for merging a parent internal node is invoked by the document deletion method when another predetermined criterion is met.
In one embodiment, the document insertion method and the document deletion method each include a step tending to maintain a preferred breadth of an internal node of the navigation tree to a predetermined value &agr;, being a desired number of child internal nodes and leaf nodes of a parent internal node.
The method of splitting an internal node of a parent internal node assigns leaf nodes to a new internal node, such that the total number of internal nodes and leaf nodes of the parent node is kept at a minimum after splitting. The predetermined criterion is met in the method for splitting a parent internal node when the total number of leaf nodes and internal nodes associated with the parent internal node is greater the sum of a predetermined value &agr; and a predetermined value &dgr;
split
. The method selects a minimum number of internal nodes for splitting, subject to a constraint (“constrained minimum”). In one embodiment, the constraint minimum applies when multiple internal nodes can be selected for splitting to result in the same net change in the total number of leaf nodes and internal nodes.
The method of merging a parent internal node assigns leaf nodes of an internal node to the parent internal node, such that the total number of internal nodes and leaf nodes of the parent node after merging is minimized. The predetermined criterion is met in the method of merging a parent internal node when the total number of leaf nodes and internal nodes associated with the parent internal node is less than the difference of a predetermined value &agr; and a predetermined value &dgr;
merge
. The method of merging a parent internal node selects a constrained maximum number of leaf nodes for merging. In one embodiment, the constraint minimum applies when multiple internal nodes can be selected for splitting to result in the same net change in the total number of leaf nodes and internal nodes.
In one embodiment, the predetermined value &agr;, the predetermined value &dgr;
split
and the predetermined value &dgr;
merge
are each user independently selectable.
In one embodiment, the method of the present invention assigns a document (i.e., a leaf node) to internal nodes according to an access frequency of the document. The internal node selected for each document is intended to minimize the number of steps necessary to reach a frequently accessed document.
The present invention is better understood upon consideration of the detailed description below and the accompanying drawings.
REFERENCES:
patent: 5430869 (1995-07-01), Ishak et al.
patent: 5870735 (1999-02-01), Agrawal et al.
patent: 5899992 (1999-05-01), Iyer et al.
patent: 5978795 (1999-11-01), Poutanen et al.
patent: 6061675 (2000-05-01), Wical
patent: 6144991 (2000-11-01), England
pharos.alexandria.ucb.edu/pharos-bin/LCC?LCC ID=(Root) 1997.
lcweb.loc.gov/catdir/cpso/lcco/lcco.html Aug. 31, 2000.
Foltz, Peter W., “Using Latent Semantic Indexing for Information Filtering,” psych.nmsu.edu/~pfoltz/cois/filtering-cois.html 1990.
Li, Wen-Syan et al., “WebDB Hypermedia Database System,”IEICE Transactions on Information Systems, 82(1), Jan. 1999.
Li, Wen-Syan et al., “PowerBookmarks: An Advanced Web Bookmark Database System and its Information Sharing and Management,” InProceedings of 5thInternational Conference on Foundations of Data Organization(FODO '98), Kobe, Japan, Nov. 1998.
Deerwester, Scott et al, “Indexing by Latent Semantic Analysis,”Journal of the American Society for Information Science, 41(6):391-407, Sep. 1990.
Honkela, Timo, “Self-Organizing Maps in Natural Language Processing,” Thesis for the degree of Doctor of Philosophy, Helsinki University of Technology, Department of Computer Science and Engineering, Public defense at 12thof Dec., 1997.
Yahoo Communications Corporation. Information available at http: 2001.
Dolin, R., et al., “Using Automated Classification for Summarizing and Selecting Heterogeneous Information Sources,”D-Lib Magazine, Jan. 1998.
Abrams, David, et al., “How People Use WWW Bookmarks,”ACM CHI 1997 Electronic Publications(Late-Breaking/Short Talks), 1997. Information available at http://.
Georgia Tech Research Corporation. GVU's 7thWWW User Survey, 1997. Information available at-1997-04/.
Miller, George A., et al., “Introduction to WordNet: An On-line Lexical Database,” Aug. 1993, 9 pp.
Takano, Hajime, et al., “Dynamic Bookmarks for the WWW—Ma
Chang Edward
Li Wen-Syan
Vu Quoc
Coby Frantz
NEC USA Inc.
Sughrue & Mion, PLLC
LandOfFree
Personalized navigation trees does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Personalized navigation trees, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Personalized navigation trees will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2831161