System and method for personalized search, information...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06687696

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to data processing systems and more specifically to a personalized search engine that utilizes latent class models called aspect models or Probabilistic Latent Semantic Analysis or Indexing.
BACKGROUND OF THE INVENTION
The rapid growth of digital data storage and the overwhelming supply of on-line information provided by today's communication networks create a risk of constant information overload. One of the key problems of the modern information society is the increasing difficulty of accessing relevant information while suppressing the overwhelming mass of irrelevant data. Most importantly, the notion of relevance is highly individual and thus difficult to formalize in general terms. Information Filtering refers to the general problem of separating useful and important information from nuisance data. Individual users have different preferences, opinions, judgments, tastes, and cultural backgrounds. In order to support different individuals in their quest for information, an automated filtering system has to take into account the diversity of preferences and the inherent relativity of information value.
One commonly distinguishes between (at least) two major approaches to information filtering. The first approach is content-based filtering, in which information organization is based on properties of the object or the carrier of information. The second approach is collaborative filtering (or social filtering), in which the preference behavior and qualities of other persons are exploited in speculating about the preferences of a particular individual. Information Filtering technology has had a huge impact on the development of the Internet and the e-commerce boom.
Search engines are classical content-based systems based on pattern matching technology to quickly retrieve information based on query keywords. Search engines are fast and have proven to be scalable to data sets of the size of the Internet, but they have inherent limitations. Due to the simple nature of direct pattern matching they provide no understanding of the sense of a word and the context of its use, often resulting in an unexpected variety of search results. These results are typically explained by word matching, but are meaningless on the intentional level. Search engines typically are unable to effectively process over-specific queries where keywords are closely related to actual document content, yet do not appear in the actual text. Search engines are also frequently non-personalized, i.e., they provide the same results, independent of the user history, and they have no effective mechanism to learn from user satisfaction.
At the other end of the spectrum, e-commerce sites use recommender systems to suggest products to their customers. The products can be recommended based on the top overall sellers on a site, on the demographics of the customer, or on an analysis of the past buying behavior of the customer as a prediction for future buying behavior. Recommender systems aim at personalization on the Web, enabling individual treatment of each customer. However, recommender systems and their underlying collaborative filtering techniques have several shortcomings. At best, only poor predictions can be made for new objects and new users (“Early rater problem”). In many domains, only a small percentage of possible items receive ratings for a given user (“Scarcity problem”). There are often users whose interests and tastes differ from those of every other user in the database (“Grey sheep problem”). More fundamentally, while many users may share some common interest, it may be extremely difficult to find a sufficient number of users that share all interests.
Therefore, in light of the foregoing deficiencies in the prior art, the applicant's invention is herein presented.
SUMMARY OF THE INVENTION
The disclosed system provides a method for the personalized filtering of information and the automated generation of user-specific recommendations. The system goes through three phases: 1) Information Gathering, 2) System Learning, and 3) Information Retrieval. The disclosed system is concerned primarily with the final two phases (System Learning and Information Retrieval).
In the Information Gathering phase, information about the data to be retrieved (DOCUMENT DATA) and about the individual users (USER DATA) is collected. The USER DATA can be gathered explicitly through questionnaires, etc., or can be implied through observing user behavior such as Internet history logs, demographic information, or any other relevant sources of information. The DOCUMENT DATA can be gathered through a variety of methods including Internet crawling, topic taxonomies, or any other relevant source of information.
Once the Information Gathering phase is completed, the System Learning phase is initiated. The system employs a statistical algorithm that uses available USER DATA and DOCUMENT DATA to create a statistical latent class model (MODEL), also known as Probabilistic Latent Semantic Analysis (PLSA). The system learns one or more MODELS based on the USER DATA, DOCUMENT DATA, and the available database containing data obtained from other users. Within the MODEL, probabilities for words extracted from training data are calculated and stored in at least one matrix. An extended inverted index may also be generated and stored along with the MODEL in order to facilitate more efficient information gathering. The MODEL may be used in other applications such as the unsupervised learning of topic hierarchies and for other data mining functions such as identifying user communities. Various parts of the Information Gathering phase and the System Learning phase are repeated from time to time in order to further refine or update the MODEL. This refined or updated MODEL will result in even higher levels of accuracy in processing the user's query.
The final phase is the Information Retrieval phase. The user may enter a query. Once the query is entered into the system, the MODEL is utilized in calculating probabilities for every word in a document based upon at least 1) the user query, or 2) words associated with the user's query in the MODEL, or 3) document information. All of the probabilities for a given document are added together, yielding a total relevance “score”, after which related documents are compared using this relevance score. The results are returned in descending order of relevance, organized into at least one result list. Through the use of the MODEL, the system provides two benefits to the user: 1) the search results are personalized, as each MODEL may be created in part using USER DATA, and 2) results for new users are somewhat personalized from the initial use through collaborative filtering based upon USER DATA for other system users.
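The System Learning and Information Retrieval phases described above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the standard asymmetric PLSA parameterization, P(w|d) = sum over z of P(w|z) P(z|d), fitted by expectation-maximization on word counts only, and it assumes that a document's relevance score is the sum of P(w|d) over the query words. The function names `fit_plsa` and `score_documents`, the toy documents, the number of latent classes, and the smoothing constant are all hypothetical, and USER DATA is left out entirely.

```python
import random
from collections import Counter

def fit_plsa(docs, n_topics=2, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) by EM on tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]
    eps = 1e-12  # assumed smoothing constant; keeps every total strictly positive

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive initialization; each distribution sums to 1.
    p_w_z = [dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in docs]

    for _ in range(n_iter):
        new_w_z = [dict.fromkeys(vocab, eps) for _ in range(n_topics)]
        new_z_d = [[eps] * n_topics for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n in doc_counts.items():
                # E-step: posterior P(z | d, w) for this (document, word) pair.
                post = normalized([p_z_d[d][z] * p_w_z[z][w]
                                   for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into probabilities.
        for z in range(n_topics):
            total = sum(new_w_z[z].values())
            p_w_z[z] = {w: v / total for w, v in new_w_z[z].items()}
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d

def score_documents(query, p_w_z, p_z_d):
    """Return (score, doc_index) pairs sorted by descending score.

    The score is the sum over query words of P(w|d) = sum_z P(w|z) P(z|d).
    Query words outside the training vocabulary contribute zero.
    """
    scores = []
    for d, topic_mix in enumerate(p_z_d):
        score = sum(topic_mix[z] * p_w_z[z].get(w, 0.0)
                    for w in query
                    for z in range(len(topic_mix)))
        scores.append((score, d))
    return sorted(scores, reverse=True)

# Toy corpus (hypothetical): two documents about driving, two about baking.
docs = [
    "car engine wheel road".split(),
    "automobile engine road driver".split(),
    "recipe flour oven bread".split(),
    "bread oven butter recipe".split(),
]
p_w_z, p_z_d = fit_plsa(docs)
ranking = score_documents(["car"], p_w_z, p_z_d)
for score, d in ranking:
    print(f"doc {d}: {score:.4f}")
```

Because the score passes through the latent classes rather than through literal keyword matches, a document that never contains a query word may still receive a non-negligible score when it shares a latent class with documents that do. On a corpus this small the outcome depends on the random initialization, so the printed ranking should be read as illustrative only.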


REFERENCES:
patent: 5278980 (1994-01-01), Pedersen et al.
patent: 5704017 (1997-12-01), Heckerman et al.
patent: 5724567 (1998-03-01), Rose et al.
patent: 5790426 (1998-08-01), Robinson
patent: 5790935 (1998-08-01), Payton
patent: 5867799 (1999-02-01), Lang et al.
patent: 5884282 (1999-03-01), Robinson
patent: 5918014 (1999-06-01), Robinson
patent: 5983214 (1999-11-01), Lang et al.
patent: 6006218 (1999-12-01), Breese et al.
patent: 6029141 (2000-02-01), Bezos et al.
patent: 6029195 (2000-02-01), Herz
patent: 6041311 (2000-03-01), Chislenko et al.
patent: 6049777 (2000-04-01), Sheena et al.
patent: 6064980 (2000-05-01), Jacobi et al.
patent: 6072942 (2000-06-01), Stockwell et al.
patent: 6078740 (2000-06-01), DeTreville
patent: 6138116 (2000-10-01), Kitagawa et al.
patent: 6493702 (2002-12-01), Adar et al.
patent: 6510406 (2003-01-01), Marchisio
T. Hofmann and J. Puzicha, Statistical Models for Co-occurrence Data, Technical Report 1625, MIT, 1998.
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, 1990.
T. Hofmann, Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization, Advances in Neural Inform