Item name normalization

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06556991

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to query processing, and more specifically, to an item name normalization approach for processing queries.
BACKGROUND OF THE INVENTION
Information is typically retrieved from an information system by submitting a search query to the information system, where the search query specifies a set of search criteria. The information system processes the search query against a set of searchable items and provides search results to a user. For example, in the context of online shopping over the Internet, a user may submit a word-based search query that specifies the type of item and the brand name of the item that the user wishes to purchase. As used herein, the term “item name” refers to information used to identify an item. Thus, “item name” may, for example, refer to the brand name of an item, the model name of the item, or a short description of the item, which may include the brand name of the item. For example, a user who is shopping for a winter-camping sleeping bag may submit a word-based search query that specifies “Lands' End sub-zero sleeping bag”. Thus, “Lands' End sub-zero sleeping bag” is an item name that describes the type of item (i.e., “sleeping bag”), a subclass of that item (i.e., “sub-zero”), and the brand name of the item (i.e., “Lands' End”) that the user is interested in purchasing.
As used herein, the term “search results” refers to data that indicates the item names that satisfy a search query. One problem with using word-based search queries to retrieve information is that word-based search queries sometimes do not accurately reflect the intent of the user, and thus the user is often dissatisfied with the search results. For example, assume that “Lands'End sub-zero sleeping bag” is a valid item name. Further assume that a user who is interested in purchasing a sub-zero sleeping bag made by Lands'End submits a search query that does not exactly match the item name “Lands'End sub-zero sleeping bag”, such as “Landsend Company sub-zero sleeping bag”. The search results for such a query may be a null set because no item names match the search query “Landsend Company sub-zero sleeping bag”.
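The null-set outcome described above can be illustrated with a minimal sketch. It assumes the information system compares the whole query string against each item name for exact equality, which is a simplification chosen for illustration. The names `ITEM_NAMES` and `exact_match_search` are hypothetical and do not come from the source.

```python
# Hypothetical catalog holding the single valid item name from the example.
ITEM_NAMES = {"Lands'End sub-zero sleeping bag"}


def exact_match_search(query, item_names):
    """Return the set of item names that equal the query string exactly."""
    return {name for name in item_names if name == query}


# The exact item name finds the item.
hit = exact_match_search("Lands'End sub-zero sleeping bag", ITEM_NAMES)

# A variant spelling of the brand name finds nothing, i.e. a null set.
miss = exact_match_search("Landsend Company sub-zero sleeping bag", ITEM_NAMES)

print(hit)
print(miss)
```

Under this assumption, any deviation in the brand name is enough to lose the item, even though the user's intent is unchanged.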
Another problem may be that the various sources from which item names are extracted may themselves provide inconsistent information on item names. Also, such sources may provide different information on prices and other product information associated with the item names. The following example illustrates the problem of inconsistent item names as well as the problem of different information associated with the item names in the context of online catalog shopping.
FIG. 1A is a table 100 that shows brand names 101, 103, 105, 107 and 109. Brand names 101, 103, 105, 107 and 109 are really variations of the brand name, “Lands'End”. Similarly, FIG. 1B is a table 110 that shows item names 112, 114, 116, 118 and item name sources 112a, 114a, 116a, 118a. Item names 112, 114, 116 and 118 are variations of the same item name. Variations of an item name will henceforth be referred to as “item name variants”. Assume that each item name variant in table 110 is extracted from a different shopping catalog. For example, item name variant 112 is extracted from item name source 112a, namely, “Catalog A”. Similarly, item name variants 114, 116, 118 are extracted from item name sources 114a, 116a, 118a respectively. Further assume that each item name source provides different information on the item name variants. For example, assume that item name source 112a indicates that item name variant 112 is priced at $10 and available in red, blue, green and yellow; item name source 114a indicates that item name variant 114 is priced at $11 and available in green and yellow only; item name source 116a indicates that item name variant 116 is priced at $9 and available in yellow only; and item name source 118a indicates that item name variant 118 is priced at $15 and available in 36 colors.
If, for example, a user submits a search query, “Landsend Company sweater for girls”, only item name variant 114 would satisfy the search query. Thus, the user may believe that only green and yellow sweaters are available and that they are priced at $11. The user may in fact be cost-conscious and thus may prefer the $9 sweater described by item name source 116a. Alternatively, the user may be more concerned with having a range of colors from which to select and thus would probably prefer the information from item name source 118a, which indicates that the sweater is available in 36 colors.
Given the current demand for query processing in the context of online shopping and the limitations in the prior approaches, an approach for processing queries that does not suffer from limitations associated with conventional query processing approaches is highly desirable. In particular, an approach for processing queries that addresses the problem of multiple variants of an item name and the inconsistent information associated with an item name is highly desirable.
SUMMARY OF THE INVENTION
According to one aspect of the invention, a method is provided for normalizing item names. One or more clusters of item name variants are determined, wherein the item name variants are extracted from an initial set of documents and wherein each cluster of item name variants is a cluster of similar item name variants. A normalized item name that is logically associated with each cluster is determined. The item name variants in each cluster are mapped to the normalized item name associated with that cluster to create an initial set of mapping information. A dictionary is created using the mapping information.
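The steps in this summary (cluster, pick a normalized name, map, build a dictionary) can be sketched as follows. This is only an illustration under stated assumptions, not the method claimed by the patent. Similarity is approximated by a crude key (lowercase, punctuation and spaces removed, filler words dropped), and the normalized name is taken to be the most frequent variant in each cluster. The source specifies neither choice. The filler-word list, the sample variants, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical filler words; the source does not list any.
NOISE_WORDS = {"company", "co", "inc"}


def cluster_key(variant):
    """Crude similarity key: lowercase, strip punctuation/spaces, drop filler words."""
    words = re.findall(r"[a-z0-9']+", variant.lower())
    cleaned = [word.replace("'", "") for word in words]
    return "".join(word for word in cleaned if word not in NOISE_WORDS)


def build_dictionary(variants):
    """Map every item name variant to a normalized name for its cluster."""
    # Step 1: group variants that share the same similarity key.
    clusters = defaultdict(list)
    for variant in variants:
        clusters[cluster_key(variant)].append(variant)

    dictionary = {}
    for members in clusters.values():
        # Step 2 (assumption): the most frequent variant is the normalized name.
        normalized = Counter(members).most_common(1)[0][0]
        # Steps 3 and 4: record the mapping information in the dictionary.
        for variant in members:
            dictionary[variant] = normalized
    return dictionary


# Hypothetical variants, as if extracted from an initial set of documents.
VARIANTS = ["Lands'End", "Lands' End", "Landsend Company", "Lands'End", "Patagonia"]
DICTIONARY = build_dictionary(VARIANTS)

print(DICTIONARY["Landsend Company"])
```

A query processor could then look up each brand name in `DICTIONARY` before matching, so that variant spellings resolve to the same normalized item name.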

