Method and apparatus for multi-class, multi-label...

Data processing: artificial intelligence – Machine learning

Reexamination Certificate

Details

C706S020000, C706S021000

active

06453307

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to information categorization. More particularly, the present invention relates to multi-class, multi-label information categorization.
BACKGROUND OF THE INVENTION
Information categorization is the process of classifying information samples into categories or classes. By way of example, text categorization is the process of classifying a text document into a category, such as “politics,” “business” or “sports,” based on the document's content. When used in connection with a speech recognition device, information categorization can be used, for example, by a telephone network provider to automatically determine the purpose of a telephone call received from a customer. If the customer says, “I would like to charge this call to my credit card,” the system could automatically recognize that this is a calling-card request and process the call accordingly. Note that the information is categorized “automatically” in that human input is not required to make the decision. Although this example involves a speech-categorization problem, a text-based system can be used if the customer's spoken message is passed through a speech recognizer.
It is known that an information categorization algorithm can “learn,” using information samples, to perform text-categorization tasks, such as the ones described above. For example, a document might be classified as either “relevant” or “not relevant” with respect to a pre-determined topic. Many sources of textual data, such as Internet news feeds, electronic mail and digital libraries, include different topics, or classes, and therefore pose a “multi-class” categorization problem.
Moreover, in multi-class problems, a document may be relevant to several different classes. For example, a news article may be relevant to “politics” and “business.” Telephone call-types are also not mutually exclusive (i.e., a call can be both “collect” and “person-to-person”).
One approach to multi-class, multi-label information categorization is to break the task into disjoint binary categorization problems, one for each class. To classify a new information sample, such as a document, all the binary classifiers are applied and the predictions are combined into a single decision. The end result can be, for example, a list of which classes the document probably belongs to, or a ranking of possible classes. Such an approach, however, can ignore any correlation that might exist between different classes. As a result, the information categorization is less effective and/or efficient than may be desired.
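The one-classifier-per-class approach described above can be sketched as follows. This is only an illustration: the keyword lists, the scoring rule (fraction of keywords present) and the 0.3 threshold are hypothetical choices made here, not taken from the patent. Each classifier scores a document without looking at the other classes, which is the limitation the paragraph points out.

```python
from typing import Callable, Dict, List, Set

# Hypothetical keyword lists, invented for illustration only.
KEYWORDS: Dict[str, Set[str]] = {
    "politics": {"election", "senate", "vote"},
    "business": {"market", "shares", "profit"},
    "sports": {"match", "goal", "league"},
}


def make_binary_classifier(keywords: Set[str]) -> Callable[[str], float]:
    """Build an independent scorer for one class: fraction of its keywords present."""
    def score(document: str) -> float:
        words = set(document.lower().split())
        return len(words & keywords) / len(keywords)
    return score


# One binary classifier per class; none of them sees the other classes.
CLASSIFIERS = {label: make_binary_classifier(kw) for label, kw in KEYWORDS.items()}


def predict_labels(document: str, threshold: float = 0.3) -> List[str]:
    """Combine the binary decisions into a list of classes."""
    return sorted(label for label, clf in CLASSIFIERS.items()
                  if clf(document) >= threshold)


def rank_labels(document: str) -> List[str]:
    """Combine the binary scores into a ranking (ties broken alphabetically)."""
    return sorted(CLASSIFIERS, key=lambda label: (-CLASSIFIERS[label](document), label))


if __name__ == "__main__":
    doc = "senate vote moves market"
    print(predict_labels(doc))
    print(rank_labels(doc))
```

Because every scorer is built and applied separately, a correlation such as "politics" articles often also being "business" articles cannot influence the result.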
In view of the foregoing, it can be appreciated that a substantial need exists for an information categorization method and apparatus that is directed to the multi-class, multi-label problem and addresses the problems discussed above.
SUMMARY OF THE INVENTION
The disadvantages of the art are alleviated to a great extent by a method and apparatus for multi-class, multi-label information categorization. A weight is assigned to each information sample in a training set, the training set containing a plurality of information samples, such as text documents, and associated labels. A base hypothesis is determined to predict which labels are associated with a given information sample. The base hypothesis may predict whether or not each label is associated with the information sample, or may predict the likelihood that each label is associated with the information sample. In the case of a document, the base hypothesis may evaluate words in each document to determine one or more words that predict the associated labels.
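A minimal sketch of one such base hypothesis, under stated assumptions: each document is reduced to a set of words, the hypothesis tests for the presence of a single word, and the candidate word is chosen by lowest weighted error averaged over labels. The function names, the smoothing constant and the error measure are illustrative choices made here; the text above does not specify them. The likelihood table corresponds to predicting how likely each label is, and thresholding it at 0.5 corresponds to predicting whether or not each label applies.

```python
from typing import Dict, List, Set, Tuple

Sample = Tuple[Set[str], Set[str]]  # (words in the document, associated labels)


def uniform_weights(n: int) -> List[float]:
    """Assign the same starting weight to each of n training samples."""
    return [1.0 / n] * n


def label_likelihoods(samples: List[Sample], weights: List[float], labels: List[str],
                      word: str, smoothing: float = 1e-3) -> Dict[Tuple[bool, str], float]:
    """Weighted estimate of how likely each label is when `word` is present or absent."""
    table = {}
    for present in (True, False):
        group = [(y, w) for (words, y), w in zip(samples, weights)
                 if (word in words) == present]
        total = sum(w for _, w in group)
        for label in labels:
            positive = sum(w for y, w in group if label in y)
            table[(present, label)] = (positive + smoothing) / (total + 2 * smoothing)
    return table


def weighted_error(samples: List[Sample], weights: List[float], labels: List[str],
                   word: str, table: Dict[Tuple[bool, str], float]) -> float:
    """Weighted fraction of (sample, label) decisions that the yes/no prediction gets wrong."""
    error = 0.0
    for (words, y), w in zip(samples, weights):
        present = word in words
        wrong = sum((table[(present, label)] > 0.5) != (label in y) for label in labels)
        error += w * wrong / len(labels)
    return error


def best_word(samples: List[Sample], weights: List[float],
              labels: List[str]) -> Tuple[float, str]:
    """Return (error, word) for the single word that best predicts the labels."""
    vocabulary = sorted(set().union(*(words for words, _ in samples)))
    return min(
        (weighted_error(samples, weights, labels, word,
                        label_likelihoods(samples, weights, labels, word)), word)
        for word in vocabulary
    )


if __name__ == "__main__":
    training = [
        ({"senate", "vote"}, {"politics"}),
        ({"senate", "market"}, {"politics", "business"}),
        ({"market", "profit"}, {"business"}),
        ({"goal", "match"}, {"sports"}),
    ]
    print(best_word(training, uniform_weights(len(training)),
                    ["business", "politics", "sports"]))
```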
When a base hypothesis is determined, the weight assigned to each information sample in the training set is modified based on the base hypothesis predictions. For example, the relative weight assigned to an information sample may be decreased if the labels associated with that information sample are correctly predicted by the base hypothesis. These actions are repeated to generate a number of base hypotheses which are combined to create a combined hypothesis. An un-categorized information sample can then be categorized with one or more labels in accordance with the combined hypothesis. Such categorization may include predicting which labels are associated with each information sample or ranking possible labels associated with each information sample.
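The repeated cycle of fitting a base hypothesis, modifying the weights and combining the results can be sketched as below. The text above does not give the update rule, so this sketch assumes a simple exponential update in the style of AdaBoost with one weight per sample; the one-word base hypothesis, the `agreement` measure and all function names are likewise hypothetical. It should be read as one possible instance of the described procedure, not as the patented method itself.

```python
import math
from typing import Dict, List, Set, Tuple

Sample = Tuple[Set[str], Set[str]]               # (document words, associated labels)
Stump = Tuple[str, Dict[Tuple[bool, str], int]]  # (word, {(word present?, label): +1 or -1})


def target(label: str, true_labels: Set[str]) -> int:
    return 1 if label in true_labels else -1


def agreement(stump: Stump, words: Set[str], true_labels: Set[str],
              labels: List[str]) -> float:
    """Average over labels of prediction * truth: +1 if all correct, -1 if all wrong."""
    word, table = stump
    present = word in words
    return sum(table[(present, l)] * target(l, true_labels) for l in labels) / len(labels)


def fit_stump(samples: List[Sample], weights: List[float], labels: List[str]) -> Stump:
    """Base hypothesis: the one-word test with the highest weighted agreement."""
    best_r, best_stump = -2.0, None
    for word in sorted(set().union(*(words for words, _ in samples))):
        table = {}
        for present in (True, False):
            for l in labels:
                vote = sum(w * target(l, y) for (words, y), w in zip(samples, weights)
                           if (word in words) == present)
                table[(present, l)] = 1 if vote > 0 else -1
        r = sum(w * agreement((word, table), words, y, labels)
                for (words, y), w in zip(samples, weights))
        if r > best_r:
            best_r, best_stump = r, (word, table)
    return best_stump


def reweight(weights: List[float], agreements: List[float], alpha: float) -> List[float]:
    """Lower the relative weight of well-predicted samples, then renormalize."""
    raw = [w * math.exp(-alpha * a) for w, a in zip(weights, agreements)]
    total = sum(raw)
    return [w / total for w in raw]


def boost(samples: List[Sample], labels: List[str], rounds: int) -> List[Tuple[float, Stump]]:
    """Repeat: fit a base hypothesis, weigh it, update the sample weights."""
    weights = [1.0 / len(samples)] * len(samples)
    combined = []
    for _ in range(rounds):
        stump = fit_stump(samples, weights, labels)
        agreements = [agreement(stump, words, y, labels) for words, y in samples]
        r = sum(w * a for w, a in zip(weights, agreements))
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        combined.append((alpha, stump))
        weights = reweight(weights, agreements, alpha)
    return combined


def score(combined: List[Tuple[float, Stump]], words: Set[str], label: str) -> float:
    """Combined hypothesis: weighted vote of all base hypotheses for one label."""
    return sum(alpha * table[(word in words, label)] for alpha, (word, table) in combined)


def categorize(combined, words: Set[str], labels: List[str]) -> List[str]:
    """Predict which labels apply to an un-categorized sample."""
    return sorted(l for l in labels if score(combined, words, l) > 0)


def rank(combined, words: Set[str], labels: List[str]) -> List[str]:
    """Rank the possible labels by combined score (ties broken alphabetically)."""
    return sorted(labels, key=lambda l: (-score(combined, words, l), l))
```

In this sketch `categorize` corresponds to predicting which labels are associated with a sample, and `rank` to ranking the possible labels; both read the same combined score.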
With these and other advantages and features of the invention that will become hereinafter apparent, the nature of the invention may be more clearly understood by reference to the following detailed description of the invention, the appended claims and to the several drawings attached herein.


