Combining matrix factorization and LDA topic modeling for rating prediction and learning user interest profiles (slides)
Matrix Factorization through Latent Dirichlet Allocation (fLDA) is a generative model for concurrent rating prediction and topic/persona extraction. It learns topic structure of URLs and topic affinity vectors for users, and predicts ratings as well. The fLDA model achieves several goals for StumbleUpon in a single framework: it allows for unsupervised inference of latent topics in the URLs served to users and for users to be represented as mixtures over the same topics learned from the URLs (in the form of affinity vectors generated by the model).
In this talk, I will present an ongoing effort inspired by the fLDA framework devoted to extend to original approach to an industrial environment. The current implementation uses a (much faster) expectation maximization method for parameter estimation, instead of Gibbs sampling as in the original work and implements a modified version of in which topic distributions are learned independently using LDA prior to training the main model. This is an ongoing effort but we have very interesting results.
Debora Donato is Sr. Director of Personalization and Principal Data Scientist at StumbleUpon. Before moving to StumbleUpon, Debora was Senior Scientist at Yahoo! Labs. Her research interests include User Behavior Analysis, Recommendation Systems, Web Information Retrieval, Link Analysis, Algorithms for the Characterization of the Web, Complex Networks and Social Networks. Debora obtained a Ph.D. in Computer Engineering in 2005 from the University of Rome "La Sapienza". She has published more than 50 scientific papers and she has been serving on the program committee of top tier conferences in the area of Data Mining and Information Retrieval. She is coordinating R&D projects in the areas of User modeling, Content Understanding, Recommendation Algorithm. The main ongoing efforts are devoted to improve recommendation performances by leverage implicit user feedback, co-modeling users and content on the same (tag-based) dimensional space.