Method and apparatus for parallel profile matching in a...

Data processing: database and file management or data structures – Database design – Data structure types

Utility Patent

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000, C705S001100, C705S400000, C709S201000

Utility Patent

active

06169989

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the efficient management of a large collection of user profiles and matching the profiles against a large volume of data. More particularly, the invention concerns a parallel profile matching method that efficiently manages the profiles, matches them to desired data using existing profile matching methods, and efficiently pushes the data to a user.
2. Description of the Related Art
Within the last twelve months, a new breed of Internet information system called webcasting or “push” systems has attracted enormous attention in the Internet industry. Instead of using a browser to pull information from the Internet, a push system automatically delivers information contents such as news and weather forecasts to a user based on a user's interest profile. The user's profile is a collection of predicates that identify the type, quantity or quality of information that should be pushed to the user. Because of its tremendous potential for completely changing the shape of the Internet, the battle for leadership has increasingly intensified.
Webcasting technology not only thrives on the Internet, but it can also be effectively applied in corporate intranets to vastly improve business operations and productivity. For example, a sales representative can be constantly alerted with new product information and price updates to better serve his or her customers, or on-line classified advertisements on apartments for rent can be automatically pushed to corporate employees who are looking for a place to live. Another application of webcasting is that newly issued U.S. patents within designated technology fields can be pushed to the users who have interests in them.
Currently, information delivered by a webcasting system is typically organized into a hierarchy of content channels. Because of the sheer volume of information in cyberspace, the key to allowing the webcasting technology to realize its full potential is giving users the ability to personalize their information selection and filter out “noise.” For example, a user subscribing to the IBM-Almaden-Seminar channel may use a filter to select only those seminars related to database technology. Non-related seminars or “noise” are thereby filtered out. This kind of personalized information content selection is recorded in user profiles and matched against all the information delivered by webcasting. The common practice in Internet information systems is to use traditional database indexing schemes on information contents, where user profiles are applied to an information index one at a time.
A main drawback in webcasting today is the lack of personalization. For example, users cannot customize their profile to only receive information about their favorite team when subscribing to a sports channel. Information covering all teams is usually pushed to the user because it is not possible to personalize the information.
At the heart of content personalization is the profile matching problem. Each user has a “personal” profile that specifies his or her interests and is usually represented as a Boolean expression over a basic predicate. A profile matching method will match a document with all profiles in a profile-database and return only those profiles whose Boolean expression is satisfied by the content of the document. This is a computationally expensive problem because of the large subscription volume and the diversity of information content. Regardless, webcasting systems must quickly perform profile matching “on-line” because they must deliver information in a timely fashion. Further, they have to maintain a dynamically changing profile database.
Cyberspace continues to expand as more and more people gain access to the Internet, resulting in an exponential increase in user profiles. What is needed is an invention that can perform large scale and high performance profile matching with a parallel profile matching method that can be applied to any Boolean based query language. Preferably, the method would utilize existing sequential profile matching methods currently used in webcasting systems.
DEFINITIONS AND NOTATIONS
Throughout the following discussion, the following definitions and notations apply:
A={A
1
, . . . , A
n
} is a set of Boolean profile expressions,
BP={p
1
, . . . , p
m
} is a set of basic predicates,
W={w
1
, . . . , w
m
} is a set of weights, that are associated with each basic predicate, and
G
1
, . . . , G
k
is a set of directed acyclic graphs (DAGs).
Each profile expression A
i
contains a set of basic predicates, which are connected by the Boolean operators AND, OR and NOT (see FIG.
6
). A profile expression A
i
can be represented by a DAG where the basic predicates are mapped to leaf nodes and the Boolean operators are mapped to inner nodes of the DAG. Using this representation for a set of profile expressions A yields a set of DAGs. All DAGs' root nodes are connected by a single root node to form a single DAG.
SUMMARY OF THE INVENTION
Broadly, the present invention solves a new technical challenge posed by the rapid growth of webcasting systems: how to efficiently match a large collection of user profiles against a large volume of data using parallel profile matching.
The invention may be implemented on a webcasting system that maintains a large collection of user profiles. The user profiles express each users interest and are based on any Boolean query language using basic predicates. These predicates are used to assert certain properties to information items. Basic predicates can be connected together with Boolean operators AND, OR and NOT to form more complex queries. As an example, the profile “Channel(Almaden-Calendar) AND Keyword(database)” expresses that the user is only interested in events at a location Almaden related to database topics. Channel(Almaden-Calendar) and Keyword(database) are examples for basic predicates connected by the AND operator. An example of one such webcasting system may be found in U.S. patent application Ser. No. 08/978,737, by M. Eichstaedt et al, entitled “METHOD AND APPARATUS FOR EFFICIENT PROFILE MATCHING IN A LARGE SCALE WEBCASTING SYSTEM,” filed Nov. 26, 1997, assigned to the assignee of the present invention and incorporated herein by reference. This information delivery system matches information items from a stream of documents against a profile collection and sends a personalized stream of information to each user based upon the user's profile.
In one embodiment, the invention may be implemented to provide a method to partition a profile database and distribute each partition to a separate processor. The invention partitions a large collection of user profiles into smaller profile subsets and assigns those subsets to separate processing units based upon a subset's cost. A subset's cost is substantially the cumulative total of the individual costs for profile data contained in the subset. Profiles are partitioned in such a way that each subset can be processed individually without extensive communication with other subsets being processed.
The invention includes in one embodiment four steps to parallelize the profiles. First, an initial profile set is partitioned into several subsets (also referred to as sub-partitions) using various heuristic methods. Second, each sub-partition is mapped onto one or more independent processing units. Each processing unit is not required to have equal processing performance. However, for best performance results, subset data should be mapped in one embodiment where the subset with a highest cost is mapped to a fastest processor, and the next highest cost subset mapped to the next fastest processor. Where appropriate, the invention evaluates the relative subset processing speed of each processor and adjusts future subset mapping based upon these evaluations. For each information item I that needs to be matched with a profile predicate, a third and a fourth step are executed. The th

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for parallel profile matching in a... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for parallel profile matching in a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for parallel profile matching in a... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2540499

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.