Reexamination Certificate
1998-11-06
2002-05-28
Assouad, Patrick (Department: 2857)
Data processing: measuring, calibrating, or testing
Measurement system
Statistical measurement
C705S014270
active
06397166
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to a computer-implemented method for clustering retail sales data, and more particularly to a method which assumes a model of retail demand as a function, for example, of the price, base sales rate, and seasonal factors, and clusters together items that have, for example, the same seasonal and price effect factors based on the model fit.
2. Description of the Related Art
Conventional systems utilize clustering for the construction of a classification scheme over a set of objects such that objects within classes are similar in some respects but are different from those in other classes.
The basic data for cluster analysis is a set of N entities for each of which p attribute values have been observed (e.g., N retail items for each of which the last 52 weeks of sales has been observed). The major features of cluster analysis include:
Choice of variables—This feature deals with determining which attributes of the elements to be clustered will be considered.
Measurement of similarity or distance—Most clustering techniques begin with a calculation of a matrix of similarities or distances between the entities to determine their “closeness” for clustering. Additionally, a measure of similarity should be definable between groups. Some typical choices are Euclidean distance, city block distance, Minkowski distance, and similarity coefficients based on the Pearson or Spearman correlation coefficients, as discussed for example in Kaufman et al., “Finding Groups in Data-An Introduction to Cluster Analysis”, John Wiley & Sons, 1990.
Generation of clusters—All clustering techniques attempt to partition the data set into a set of clusters such that individuals in a cluster have high similarity to one another and differ from those in other clusters. Similarity is defined quantitatively as discussed above. A number of techniques exist for clustering and differ in the approaches used for initiating clusters, searching through the solution space for target clusters, and the termination criterion. Some known clustering techniques relevant to the present invention include:
Hierarchical clustering: Given n objects, hierarchical clustering consists of a series of clusterings, from the initial situation in which each object is considered a singleton cluster to the other extreme in which all objects belong to one cluster. Hierarchical techniques may be subdivided into agglomerative methods, which proceed by a series of successive fusions of the n objects into groups, and divisive methods, which partition the set of n entities successively into finer partitions.
Optimization techniques: Optimization techniques attempt to form an optimal k-partition over the given set of objects (i.e., divide the set of entities into k mutually exclusive clusters) to optimize a pre-defined objective function, where k is usually input by the user. The pre-defined objective function is usually a measure for maximizing similarity within the cluster and the distance between clusters. The techniques employed differ in the methods by which an initial partition of the data is obtained, and the method for iteratively searching for the optimal partition.
Other techniques include density search, fuzzy clustering, neural networks, and conceptual clustering, as described, for example, in B. Everitt, “Cluster Analysis”, Third Edition, Edward Arnold, 1993.
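The distance measures and the agglomerative approach described above can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the choice of single linkage as the between-group distance, and the sample sales figures are all assumptions made for illustration.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 is city block, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def pearson_similarity(a, b):
    """Pearson correlation coefficient, used here as a similarity in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def agglomerate(points, target_clusters, p=2):
    """Agglomerative clustering: start from singleton clusters and repeatedly
    fuse the two closest clusters until target_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (an assumption): the distance between two
                # groups is the distance between their closest members.
                d = min(minkowski(points[a], points[b], p)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical weekly sales for four items (four weeks each).
sales = [
    [10, 12, 11, 13],
    [11, 12, 10, 13],
    [40, 42, 45, 41],
    [39, 43, 44, 40],
]
groups = agglomerate(sales, target_clusters=2)
print(groups)
```

A divisive method would run in the opposite direction, starting from one cluster holding all items and splitting it successively.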
The term model-based clustering has also been used in another context, as described in Banfield et al., “Model-Based Gaussian and Non-Gaussian Clustering”, Biometrics, 49, 803-822, 1993. This approach assumes a probability model for the population of interest and a density function for the observations.
In practical applications, among the popular methods for clustering are hierarchical- and optimization-based techniques, as mentioned above, which can be used to cluster retail sales data based on differences in the time series. Other applications of clustering can be found in a range of areas from finance (e.g., clustering stock price movement data) to the social sciences (e.g., clustering data on people's responses and preferences).
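As a rough illustration of an optimization-based technique applied to sales time series, the following sketch uses k-means, which seeks a k-partition minimizing the within-cluster sum of squared distances. K-means is one common choice and is not named in the text; the function names, the seed selection, and the sales figures are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, seeds, max_iterations=100):
    """Alternate between assigning each point to its nearest center and
    recomputing each center as the mean of its members, until the partition
    stops changing. The number of clusters k is len(seeds)."""
    centers = [list(s) for s in seeds]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # termination criterion: no item changed cluster
        assignment = new_assignment
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical four-week sales: two items peak early, two peak late.
series = [
    [30, 28, 10, 9],
    [31, 27, 11, 8],
    [9, 10, 29, 31],
    [8, 11, 30, 28],
]
assignment, centers = k_means(series, seeds=[series[0], series[2]])
print(assignment)
```

Note that this groups items purely on the observed sales series, which is the limitation the following paragraphs discuss.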
However, currently available methods for clustering do not assume a model relating the independent and dependent variables. In a retail environment, for example, they are therefore restricted to grouping only on the basis of observed sales data, and cannot separate items on the basis of the effects of price or other factors on demand.
For example, consider the sales of two items (e.g., sales 1 and sales 2) shown in FIG. 5A. Looking only at the sales data, the two items appear similar in sales pattern over time (e.g., weeks, months, etc.), and as such would be assumed to exhibit similar seasonal behavior. However, when other factors are also considered (e.g., price, as in FIG. 5B), and a model relating the sales to the price is assumed, differing seasonal patterns and differing price sensitivities may be shown (e.g., see FIG. 5C). The conventional techniques do not provide for consideration of such other variables. Instead, they factor in only one variable.
Thus, conventional clustering techniques use only one stream of data (e.g., the sales data over time) and have no capability for factoring in other data streams or variables. They may therefore erroneously classify (e.g., cluster) items 1 and 2 as similar when in fact the items are not similar.
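The effect described for items 1 and 2 can be reproduced with a small synthetic example. The sketch below assumes a log-linear demand model, ln(sales) = a + b ln(price) + residual, fitted by ordinary least squares, and treats the exponentiated residual as a seasonal factor. This model form, the function name, and all numbers are assumptions for illustration; the excerpt does not specify the model actually used.

```python
import math

def fit_log_linear_demand(sales, prices):
    """Least squares fit of ln(sales) = a + b * ln(price) + residual.
    Returns (price elasticity b, per-period seasonal factors exp(residual))."""
    x = [math.log(p) for p in prices]
    y = [math.log(s) for s in sales]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    seasonal = [math.exp(yi - (intercept + slope * xi)) for xi, yi in zip(x, y)]
    return slope, seasonal

# Synthetic data: both items have the same sales series over eight weeks,
# but their price histories differ.
sales = [8, 10, 16, 20, 8, 10, 16, 20]
prices_1 = [10, 8, 10, 8, 10, 8, 10, 8]
prices_2 = [10.0, 8.94, 7.07, 6.32, 10.0, 8.94, 7.07, 6.32]

elasticity_1, seasonal_1 = fit_log_linear_demand(sales, prices_1)
elasticity_2, seasonal_2 = fit_log_linear_demand(sales, prices_2)
print(round(elasticity_1, 2), [round(s, 2) for s in seasonal_1])
print(round(elasticity_2, 2), [round(s, 2) for s in seasonal_2])
```

Under these assumptions the first item shows a price elasticity near -1 with a pronounced seasonal swing, while the second shows an elasticity near -2 with almost no seasonality, even though a sales-only comparison would treat the two items as identical.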
SUMMARY OF THE INVENTION
In view of the foregoing and other problems of the conventional methods and techniques, an object of the present invention is to provide a method for grouping data sets, based on a model relating the independent and dependent variables, in cases where the data set includes an observed or dependent value and one or more controllable or independent values. The method is not restricted to retail sales data, which is used below simply as an example.
In a first aspect, a method of grouping multiple data points, each data point being a set (e.g., a vector, a “tuple”, etc.) comprising a measured dependent value and at least one related independent variable value, includes fitting the data into a model relating the independent and dependent variables of the data, and calculating a similarity and a distance between the data points and groups of the data points, thereby to group the multiple data points.
In a second aspect, a system for grouping multiple data points, each data point being a set (e.g., a vector, a “tuple”, etc.) comprising a measured dependent value and at least one related independent variable value, includes means for fitting the data into a model relating the independent and dependent variables of the data, and means for calculating similarity and distance between the data points and groups of the data points, thereby to group the multiple data points.
In a third aspect, a signal-bearing medium is provided tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for computer-implemented model-based clustering for grouping multiple data points, each data point being a set (e.g., a vector, a “tuple”, etc.) comprising a measured dependent value and at least one related independent variable value, the program including fitting the data into a model relating the independent and dependent variables of the data, and calculating similarity and distance between the data points and groups of the data points, thereby to group the multiple data points.
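The idea common to the three aspects above, grouping on fitted model parameters rather than on raw observations, can be sketched as follows. The grouping rule (join the nearest group whose centroid lies within a threshold, otherwise open a new group), the parameter vectors, the threshold value, and the use of a single unscaled Euclidean distance over both parameters are all hypothetical choices, not details from the patent.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of parameter vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def group_by_parameters(fitted, max_distance):
    """Assign each item to the nearest existing group whose centroid lies
    within max_distance of the item's fitted parameters; otherwise start
    a new group containing only that item."""
    groups = []  # each group is a list of item names
    for name, params in fitted.items():
        best_group, best_dist = None, max_distance
        for group in groups:
            d = math.dist(params, centroid([fitted[m] for m in group]))
            if d <= best_dist:
                best_group, best_dist = group, d
        if best_group is None:
            groups.append([name])
        else:
            best_group.append(name)
    return groups

# Hypothetical fitted parameters per item: (price elasticity, peak seasonal factor).
fitted = {
    "item_1": (-1.0, 1.41),
    "item_2": (-2.0, 1.00),
    "item_3": (-1.1, 1.38),
    "item_4": (-1.9, 1.05),
}
groups = group_by_parameters(fitted, max_distance=0.3)
print(groups)
```

In practice the parameters would likely need to be scaled or weighted before being combined into one distance, since elasticities and seasonal factors are not measured in the same units.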
In a fourth aspect, a method of model-based clustering includes initializing clustering parameters for a plurality of items; reading-in an actual data set used for clustering, and reading cluster center seeds, and calculating a target number of clusters; incrementing an iteration counter; scoring each item in the data set against all the available cluster centers using a similarity measure process, wherein if a similarity measure value of the item being examined is greater than a minimum first parameter, no further search is performed for the
Leung Ying Tat
Levanoni Menachem
Ramaswamy Sanjay E.
Assouad Patrick
Kaufman, Esq. Stephen C.
McGinn & Gibb PLLC