Reexamination Certificate
2001-07-17
2004-03-02
Alam, Shahid (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06701333
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of categorization of items in general. More specifically, the present invention relates to the categorization of cases, such as documents, in a topic category within a hierarchical organization of cases.
2. Prior Art
With the increased amounts of data being generated, stored, and processed today, it is increasingly important for organizations to maintain their databases (e.g., collections of documents such as a customer support knowledge base) in an orderly manner. Many organizations rely on hierarchical schemes to organize their databases. A hierarchical organization of data utilizes successive levels of sub-categories which further narrow the scope of a category until a particular case (e.g., a document, file, program, etc.) is identified in the hierarchy. The advantage of such a system is that the hierarchy is easily navigated, even by users who are not experts in a particular database.
One problem with such a system is the reclassification of cases in a database hierarchy after changes to such a hierarchy. Organizations may decide that it is necessary to change their classification scheme to better suit their needs. For example, this can be the result of wanting to make the database easier to navigate, creating new categories, merging categories, splitting old categories, or moving cases between categories. The reclassification of cases afterwards can often require as much effort as the original classification process and may be complicated by the fact that a single case can belong in multiple categories.
The worst-case approach to coping with these hierarchy changes would be to classify items into the new hierarchy without leveraging any information about the old classification of items. Another method is to manually reclassify only those cases that are affected by the changes to the hierarchy (e.g., moving batches of cases from one category to another). If the changes are simple enough, such as renaming a category or creating a new category with no items in it, no reclassification is needed. However, most changes require much more effort and are difficult to implement. If this reclassification is performed manually, misclassification can be a problem. The database hierarchies can contain millions of cases, and anyone reclassifying cases would need expert-level knowledge of the entire new hierarchy to perform the task correctly. There is a need for a solution that facilitates the migration of cases when changes are made to a hierarchy.
These problems are magnified in organizations that maintain multiple hierarchical databases containing similar information. These organizations may maintain separate hierarchies for a variety of reasons. For example, an organization may find accessing particular data more efficient when a variety of hierarchical schemes are employed rather than just one. In one hierarchy, cases may be organized according to the operating system they pertain to. Another hierarchy may organize cases according to what application they reference. Although such hierarchies are separate, there may be relationships among them. The same solution that helps with changes to a hierarchy can also be applied to use classification in one hierarchy to facilitate classification in another.
FIGS. 1A and 1B illustrate exemplary data hierarchies 100 and 101 used to organize data (e.g., business metrics, transformed data, and raw data) and information in an organization utilizing separate hierarchies. In FIG. 1A, a hierarchical database 100 has a root level directory 105 containing two sub-categories: operating system 1 (110) and operating system 2 (115). Operating system 1 has further sub-categories of hardware 120 and software 125.
In FIG. 1B, database 101 has a root level directory 150 containing two sub-categories: hardware 160 and software 165. Hardware 160 has been further sub-categorized with a category for printers 170. Software 165 has been further divided into categories for applications 175 and operating systems 176. Application 180 is a sub-category of applications 175, while operating system 1 and operating system 2 (190 and 195, respectively) are sub-categories of operating systems 176.
Hierarchy 100 represents a hierarchical scheme currently used by an organization. Hierarchy 101 represents a new hierarchical scheme that the organization is moving to, or one of a number of hierarchies used simultaneously by an organization. The data in both hierarchies is organized utilizing successive levels of sub-categories which further narrow the scope of a category until a particular case is identified in the hierarchy. For example, the user can navigate through hierarchical organization 101 by selecting an item from the top-level menu (e.g., either “hardware” or “software”). The user can then make further selections at each subsequent level of hierarchical organization 101. After selecting “software,” a user can then select “applications” or “operating systems.” The user can move backwards or forwards (up or down) in hierarchical organization 101; for example, from “operating systems,” the user can move back up to “software,” or down to “operating system 1” or “operating system 2.”
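The navigation described above can be sketched with a parent-to-children map of hierarchy 101. This is a minimal illustration, assuming the category names from FIG. 1B serve as node keys and "root" stands for root level directory 150; the names `HIERARCHY_101`, `move_down`, `move_up`, and `path_to` are hypothetical and not part of the patent.

```python
from typing import Dict, List, Optional

# Hierarchy 101 (FIG. 1B) as parent -> children; layout is an illustrative assumption.
HIERARCHY_101: Dict[str, List[str]] = {
    "root": ["hardware", "software"],
    "hardware": ["printers"],
    "software": ["applications", "operating systems"],
    "applications": ["application"],
    "operating systems": ["operating system 1", "operating system 2"],
}

# Derive child -> parent links so a user can move back up the hierarchy.
PARENT: Dict[str, str] = {
    child: parent
    for parent, children in HIERARCHY_101.items()
    for child in children
}


def move_down(category: str) -> List[str]:
    """Sub-categories selectable from `category` (empty list at a leaf)."""
    return HIERARCHY_101.get(category, [])


def move_up(category: str) -> Optional[str]:
    """Parent category, or None at the root level."""
    return PARENT.get(category)


def path_to(category: str) -> List[str]:
    """Root-to-category path: the selections that lead to `category`."""
    path = [category]
    while (parent := move_up(path[-1])) is not None:
        path.append(parent)
    return list(reversed(path))


print(move_down("software"))          # ['applications', 'operating systems']
print(move_up("operating systems"))   # software
print(path_to("operating system 1"))  # ['root', 'software', 'operating systems', 'operating system 1']
```

Storing only the downward links and deriving the parent map keeps a single source of truth for the tree while still supporting the up-and-down movement the paragraph describes.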
Accordingly, what is needed is a method of efficiently migrating data from one categorization hierarchy to another hierarchy. A further need exists for a method which meets the above need and allows categorization information to be shared among a plurality of related hierarchies such that the categorization of an item in one hierarchy is leveraged to facilitate the categorization of that item and others in another hierarchy.
SUMMARY OF THE INVENTION
The present invention facilitates efficient migration of data from one categorization hierarchy to another hierarchy. It can determine the best category in a new hierarchy for cases previously classified in an old hierarchy and can automatically derive a classifier for the new hierarchy to classify new items. The present invention can be used as a “virtual” classifier by combining classifiers for a plurality of related hierarchies. Classifications made in one categorization hierarchy (e.g., adding, deleting, or moving a document to a different category) are updated across the plurality of related hierarchies and can be used to help classify other documents in the related hierarchies as well.
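A minimal sketch of the "virtual" classifier idea, assuming each pair of related hierarchies is linked by a category-to-category mapping so that adding, moving, or deleting a document in one hierarchy can be reflected in the others. The `VirtualClassifier` class, its methods, and the example mapping are hypothetical; the patent does not specify how the propagation is implemented.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (source hierarchy, target hierarchy) -> {source category: target category}
Mappings = Dict[Tuple[str, str], Dict[str, str]]


class VirtualClassifier:
    """Keeps each document's category in sync across related hierarchies."""

    def __init__(self, mappings: Mappings) -> None:
        self.mappings = mappings
        # hierarchy id -> {document id: category}
        self.assignments: Dict[str, Dict[str, str]] = defaultdict(dict)

    def classify(self, hierarchy: str, doc: str, category: str) -> None:
        """Add or move `doc` in one hierarchy and propagate to mapped ones."""
        self.assignments[hierarchy][doc] = category
        for (src, dst), mapping in self.mappings.items():
            if src != hierarchy:
                continue
            if category in mapping:
                self.assignments[dst][doc] = mapping[category]
            else:
                # No mapped destination: drop any stale entry so the document
                # can be classified in `dst` by some other means.
                self.assignments[dst].pop(doc, None)

    def delete(self, doc: str) -> None:
        """Remove `doc` from every hierarchy (assumed: it left the database)."""
        for docs in self.assignments.values():
            docs.pop(doc, None)


# Hypothetical mapping from hierarchy 100 (FIG. 1A) to hierarchy 101 (FIG. 1B).
vc = VirtualClassifier({
    ("100", "101"): {
        "operating system 1/hardware": "hardware",
        "operating system 2": "software/operating systems/operating system 2",
    },
})
vc.classify("100", "doc-42", "operating system 1/hardware")
print(vc.assignments["101"]["doc-42"])  # hardware
vc.classify("100", "doc-42", "operating system 2")  # a move within hierarchy 100
print(vc.assignments["101"]["doc-42"])  # software/operating systems/operating system 2
vc.delete("doc-42")
print("doc-42" in vc.assignments["101"])  # False
```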
Embodiments of the present invention are directed to a method of efficiently migrating data from one categorization hierarchy to a new hierarchy. Data, item, document, and/or case refer to any file, document, program, raw or processed data, or any information which may be contained in a data hierarchy. A mapping is created which describes where the cases in one hierarchy will be placed in a new hierarchy. The classifier of the first hierarchy is merged with this mapping to act as a classifier for the second hierarchy. Cases from the first hierarchy are classified in the new hierarchy using this merged mapping. In another embodiment, a training set of classified items is designated from a first hierarchy and mapped to a second hierarchy. Using machine learning, a classifier for the second hierarchy is created and used to classify subsequently migrated cases.
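The first embodiment, composing the old hierarchy's classifier with an old-to-new category mapping, can be illustrated as follows. This is a sketch under stated assumptions: `old_classifier` is a toy keyword rule standing in for whatever classifier hierarchy 100 already has, `MAPPING` is invented for the example, and cases whose old category has no mapped destination are set aside for human review. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from hierarchy 100 (FIG. 1A) to hierarchy 101 (FIG. 1B).
# "operating system 1/software" is deliberately left unmapped, as if it were
# being split across several new categories and needed a human decision.
MAPPING: Dict[str, str] = {
    "operating system 1/hardware": "hardware/printers",
    "operating system 2": "software/operating systems/operating system 2",
}


def old_classifier(text: str) -> str:
    """Hypothetical stand-in for the existing classifier of hierarchy 100."""
    if "printer" in text:
        return "operating system 1/hardware"
    if "os2" in text:
        return "operating system 2"
    return "operating system 1/software"


def merge_with_mapping(classify: Callable[[str], str],
                       mapping: Dict[str, str]) -> Callable[[str], Optional[str]]:
    """Compose an old-hierarchy classifier with the old->new category mapping."""
    def classify_new(text: str) -> Optional[str]:
        # None means the predicted old category has no mapped destination.
        return mapping.get(classify(text))
    return classify_new


def migrate(cases: Dict[str, str],
            classify_new: Callable[[str], Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Place each case in the new hierarchy; collect the ones needing review."""
    placed: Dict[str, str] = {}
    review: List[str] = []
    for case_id, text in cases.items():
        category = classify_new(text)
        if category is None:
            review.append(case_id)
        else:
            placed[case_id] = category
    return placed, review


classify_101 = merge_with_mapping(old_classifier, MAPPING)
placed, review = migrate({"doc-1": "printer jams on os1",
                          "doc-2": "os2 boot error",
                          "doc-3": "word processor crash"}, classify_101)
print(placed)  # {'doc-1': 'hardware/printers', 'doc-2': 'software/operating systems/operating system 2'}
print(review)  # ['doc-3']
```

The same `classify_101` function can then be applied to items arriving after the migration, which is the sense in which the merged mapping acts as a classifier for the second hierarchy.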
Migration of data using the present invention requires much less human effort and is likely to be more accurate than manual reclassification. The accuracy of classifiers induced via machine learning depends directly on the size of the available training set, and the present invention provides a way to transfer the old training set to the new hierarchy, reducing the cost and delay of obtaining a new training set large enough to induce an accurate classifier.
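The second embodiment, carrying an old training set into the new hierarchy and inducing a classifier from it, might look like the sketch below. The patent does not name a learning algorithm; multinomial naive Bayes with add-one smoothing is used here only as a simple stand-in, and `transfer_training_set`, `NaiveBayes`, the training examples, and the mapping are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def transfer_training_set(old_set: List[Tuple[str, str]],
                          mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Relabel (text, old category) pairs with new categories; drop unmapped ones."""
    return [(text, mapping[cat]) for text, cat in old_set if cat in mapping]


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> Optional[str]:
        """Most probable category; None only if the model saw no examples."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


OLD_TRAINING = [
    ("printer paper jam", "operating system 1/hardware"),
    ("printer driver offline", "operating system 1/hardware"),
    ("kernel panic at boot", "operating system 2"),
    ("boot loader error", "operating system 2"),
]
MAPPING = {
    "operating system 1/hardware": "hardware/printers",
    "operating system 2": "software/operating systems/operating system 2",
}
model = NaiveBayes().fit(transfer_training_set(OLD_TRAINING, MAPPING))
print(model.predict("printer jam"))  # hardware/printers
print(model.predict("boot error"))   # software/operating systems/operating system 2
```

Because the old labels are reused rather than re-collected, the new classifier starts with as many training examples as the mapping can carry over, which is the cost saving the paragraph points to.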
The present invention can act as a virtual classifier for multiple hierarchies in an organization, providing updated categorization information for multiple hierarchical databases. Cases classified in one hierarchy are used to help classify those cases in all of the o
Forman George Henry
Suermondt Henri Jacques
Alam Shahid
Ehichioya Fred
Hewlett--Packard Development Company, L.P.