Method for automatically finding frequently asked questions...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000, C707S793000, C707S793000

active

06804670

ABSTRACT:

FIELD OF THE INVENTION
The present invention generally relates to a system and method for classifying and analyzing data, and is particularly applicable to a method for automatically generating a list of “Frequently Asked Questions” (FAQs) by analyzing data sets describing calls and responses received at a helpdesk.
BACKGROUND OF THE INVENTION
As technology becomes ever more pervasive, it has become increasingly common for organizations to provide a helpdesk service to their customers. Typically, a customer will call the helpdesk to ask for information and to seek solutions to problems relating to the operation of products, the performance of services, necessary procedures and forms, etc.
Helpdesks are usually staffed by knowledgeable human operators, who often spend considerable time with each caller in order to answer the caller's questions. As a result, a helpdesk operation can be quite expensive to maintain.
Much of the helpdesk operator's time is spent solving identical or nearly identical problems over and over again. A need arises for a technique by which the solutions to frequently recurring problems may be automated in order to improve the efficiency of helpdesk operation. In particular, what is needed is a technique that can aid in identification of helpdesk inquiry and problem categories that are most amenable to automated fulfillment or solution.
SUMMARY OF THE INVENTION
The present invention is useful in identifying candidate helpdesk problem categories that are most amenable to automated solutions. In a preferred embodiment, the present invention uses clustering techniques to identify collections of problems from free-form text descriptions. It then lets a human user modify the collections as appropriate to improve the coherence and usefulness of the classification. Measures such as the level of detail, the depth of search, the confidence level, and the overlap level are used to help the user determine which sets of examples are the best candidates to become FAQs.
The present invention provides a method, a system, and a computer program product for interactive classification and analysis. To carry out the method, a dictionary is generated: each word in the text data set is identified, and the number of documents containing each word is counted. The most frequently occurring words in the corpus make up the dictionary. A count of occurrences of each dictionary word within each document in the document set is then generated. The count may be generated by building a matrix having rows and columns, each column corresponding to a word in the dictionary, each row corresponding to an example in the text corpus, and each entry representing the number of occurrences of the corresponding word in the corresponding example.
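The dictionary and count matrix described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names (`tokenize`, `build_dictionary`, `count_matrix`), the lowercase alphanumeric tokenization, the alphabetical tie-breaking, and the sample helpdesk texts are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: words are lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(examples, max_terms):
    # Count, for each word, the number of examples (documents) containing it.
    doc_freq = Counter()
    for text in examples:
        doc_freq.update(set(tokenize(text)))
    # Keep the most frequent words; ties are broken alphabetically (assumption).
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_terms]]


def count_matrix(examples, dictionary):
    # One row per example, one column per dictionary word;
    # each entry is the number of occurrences of that word in that example.
    column = {word: j for j, word in enumerate(dictionary)}
    matrix = []
    for text in examples:
        row = [0] * len(dictionary)
        for token in tokenize(text):
            j = column.get(token)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return matrix


# Hypothetical helpdesk problem descriptions.
examples = [
    "password reset fails",
    "cannot reset password password expired",
    "printer offline",
]
dictionary = build_dictionary(examples, max_terms=3)
matrix = count_matrix(examples, dictionary)
```

Words outside the dictionary are simply ignored when counting, so an example with no dictionary words yields an all-zero row.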
The set of examples may be partitioned into a plurality of clusters using a k-means partitioning procedure. The k-means partitioning procedure may include determining a distance between a centroid and an example vector using a distance function of:
$$d(X, Y) = -\frac{X \cdot Y}{\|X\| \, \|Y\|}$$
wherein X is the centroid, Y is the example vector, and d(X,Y) is the distance between the centroid and the example vector.
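A k-means partitioning that uses this negative-cosine distance might look like the sketch below. The seeding with the first k vectors, the fixed iteration cap, the handling of zero vectors, and the names `cosine_distance` and `kmeans` are assumptions chosen to keep the example deterministic; the source does not specify them.

```python
import math


def cosine_distance(x, y):
    # d(X, Y) = -(X . Y) / (||X|| * ||Y||); smaller means more similar.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: a zero vector is treated as unrelated to everything.
        return 0.0
    return -dot / (norm_x * norm_y)


def kmeans(vectors, k, iterations=20):
    # Assumption: the first k vectors seed the centroids.
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign each example vector to the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: cosine_distance(centroids[c], v))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no example changed cluster, so the partition is stable
        assignment = new_assignment
        # Recompute each centroid as the mean of its member vectors.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical two-term count vectors.
vectors = [[1, 0], [0, 1], [2, 0.1], [0.1, 3]]
assignment, centroids = kmeans(vectors, k=2)
```

Because the distance depends only on the angle between vectors, a long description and a short one that use the same words in similar proportions land in the same cluster.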
For each of the generated clusters, the present method sorts the dictionary terms in order of decreasing occurrence frequency within the cluster. It then determines a search space by selecting the top S (most frequent) dictionary terms, where S is a user-specified value giving the depth of search. Next, it chooses sets of L terms from the search space, where L is a user-specified value indicating the desired level of detail.
For each possible combination of L terms in the search space, the present method finds the set of examples containing all L terms. If this set is not empty, and if the overlap between this set and all the other sets is less than a user-specified overlap value, then this set of examples becomes a FAQ.
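The search over term combinations can be sketched as below. Several details are assumptions, since the text leaves them open: term frequency within the cluster is taken as the total occurrence count, overlap is measured as the fraction of a candidate's examples already covered by previously accepted FAQs, and the names `find_faqs`, `depth_s`, `detail_l`, and `max_overlap` are hypothetical.

```python
from itertools import combinations


def find_faqs(cluster_rows, dictionary, depth_s, detail_l, max_overlap):
    # cluster_rows: list of (example_id, count_row) pairs for one cluster.
    n_terms = len(dictionary)
    # Rank dictionary terms by decreasing total occurrences in the cluster.
    totals = [sum(row[j] for _, row in cluster_rows) for j in range(n_terms)]
    ranked = sorted(range(n_terms), key=lambda j: (-totals[j], j))
    search_space = ranked[:depth_s]  # top S terms

    faqs = []
    for combo in combinations(search_space, detail_l):
        # Examples that contain every one of the L terms.
        members = {
            example_id
            for example_id, row in cluster_rows
            if all(row[j] > 0 for j in combo)
        }
        if not members:
            continue
        # Assumption: overlap is the share of this set already covered
        # by the FAQs accepted so far.
        covered = set().union(*(m for _, m in faqs))
        overlap = len(members & covered) / len(members)
        if overlap < max_overlap:
            faqs.append(([dictionary[j] for j in combo], members))
    return faqs


# Hypothetical cluster of four examples over a four-word dictionary.
dictionary = ["password", "reset", "printer", "offline"]
cluster_rows = [
    (0, [1, 1, 0, 0]),
    (1, [2, 1, 0, 0]),
    (2, [0, 0, 1, 1]),
    (3, [1, 0, 0, 0]),
]
faqs = find_faqs(cluster_rows, dictionary, depth_s=3, detail_l=2, max_overlap=0.5)
```

The number of combinations grows as S choose L, which is presumably why both values are left under the user's control.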
For each generated FAQ, the present method chooses a name based on the relevant terms in the order in which they occur most often in the text.
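One possible reading of this naming step is sketched below: among the FAQ's examples, find the order in which the relevant terms most commonly first appear, and join the terms in that order. This interpretation, the function name `faq_name`, and the sample texts are assumptions; the source does not spell out the procedure.

```python
import re
from collections import Counter


def faq_name(terms, texts):
    # Tally, across examples, the order in which the terms first appear.
    orderings = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        if all(term in tokens for term in terms):
            order = tuple(sorted(terms, key=tokens.index))
            orderings[order] += 1
    if not orderings:
        # Fall back to the given order when no example contains all terms.
        return " ".join(terms)
    most_common_order = orderings.most_common(1)[0][0]
    return " ".join(most_common_order)


# Hypothetical examples belonging to one FAQ.
name = faq_name(
    ["password", "reset"],
    ["reset my password", "please reset password now", "password reset fails"],
)
```

Ordering the terms as they usually appear in the text tends to give names that read like the callers' own phrasing.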


REFERENCES:
patent: 5141439 (1992-08-01), Cousins
patent: 5423038 (1995-06-01), Davis
patent: 5485601 (1996-01-01), Ching
patent: 5842221 (1998-11-01), Schmonsees
patent: 5974412 (1999-10-01), Hazlehurst et al.
patent: 6018736 (2000-01-01), Gilai et al.
patent: 6024571 (2000-02-01), Renegar
patent: 6028601 (2000-02-01), Machiraju et al.
patent: 6137911 (2000-10-01), Zhilyaev
patent: 6253169 (2001-06-01), Apte et al.
patent: 6584464 (2003-06-01), Warthen
patent: 6618725 (2003-09-01), Fukuda et al.
patent: 6665640 (2003-12-01), Bennett et al.
patent: 2002/0023144 (2002-02-01), Linyard et al.
patent: 2003/0217052 (2003-11-01), Rubenczyk et al.
K. Hammond, R. Burke, C. Martin, and S. Lytinen (1995), FAQ Finder: A Case-Based Approach to Knowledge Navigation, pp. 80-86.*
Kevin Crowston and Marie Williams (1999), The Effects of Linking on Genres of Web Documents.
