Title: Boosting Item-based Collaborative Filtering via Nearly Uncoupled Random Walks
Item-based models are among the most popular collaborative filtering approaches for building recommender systems. Random walks can provide a powerful tool for harvesting the rich network of interactions captured within these models. They can exploit indirect relations between the items, mitigate the effects of sparsity, ensure wider itemspace coverage, and increase the diversity of recommendation lists. Their potential, however, can be hindered by the tendency of the walks to rapidly concentrate towards the central nodes of the graph, thereby significantly restricting the range of K-step distributions that can be exploited for personalized recommendations. In this work, we introduce RecWalk, a novel random walk-based method that leverages the spectral properties of nearly uncoupled Markov chains to provably lift this limitation and prolong the influence of users’ past preferences on the successive steps of the walk, thereby allowing the walker to explore the underlying network more fruitfully. A comprehensive set of experiments on real-world datasets verify the theoretically predicted properties of the proposed approach and indicate that they are directly linked to significant improvements in top-n recommendation accuracy. They also highlight RecWalk’s potential in providing a framework for boosting the performance of item-based models. RecWalk achieves state-of-the-art top-n recommendation quality, outperforming several competing approaches, including recently proposed methods that rely on deep neural networks.
Journal Name:
ACM Transactions on Knowledge Discovery from Data
Page Range or eLocation-ID:
1 to 26
Sponsoring Org:
National Science Foundation
More Like this
  1. Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large lumpable networks based on observations from an associated random walk. We formulate this as a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm (GHA). The algorithm is able to process random walk data in a streaming fashion and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the “principal components” of the Markov chain. We also establish a finite-sample error bound that matches the nonimprovable state-of-the-art result for online factorization. Once the low-dimensional state representations are learned, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability given sufficient data. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of Manhattan that closely matches the traffic dynamics.
  2. Identification of influential nodes is an important step in understanding and controlling the dynamics of information, traffic, and spreading processes in networks. As a result, a number of centrality measures have been proposed and used across different application domains. At the heart of many of these measures lies an assumption describing the manner in which traffic (of information, social actors, particles, etc.) flows through the network. For example, some measures only count shortest paths while others consider random walks. This paper considers a spreading process in which a resource necessary for transit is partially consumed along the way while being refilled at special nodes on the network. Examples include fuel consumption of vehicles together with refueling stations, information loss during dissemination with error-correcting nodes, and consumption of ammunition by military troops while moving. We propose generalizations of the well-known measures of betweenness, random-walk betweenness, and Katz centralities to take such a spreading process with consumable resources into account. In order to validate the results, experiments on real-world networks are carried out by developing simulations based on well-known models such as Susceptible-Infected-Recovered and congestion with respect to particle hopping from vehicular flow theory. The simulation-based models are shown to be highly correlated with the proposed centrality measures. Reproducibility: Our code and experiments are available at
  3. Despite advances in deep learning methods for song recommendation, most existing methods do not take advantage of the sequential nature of song content. In addition, there is a lack of methods that can explain their predictions using the content of recommended songs and only a few approaches can handle the item cold start problem. In this work, we propose a hybrid deep learning model that uses collaborative filtering (CF) and deep learning sequence models on the Musical Instrument Digital Interface (MIDI) content of songs to provide accurate recommendations, while also being able to generate a relevant, personalized explanation for each recommended song. Compared to state-of-the-art methods, our validation experiments showed that in addition to generating explainable recommendations, our model stood out among the top performers in terms of recommendation accuracy and the ability to handle the item cold start problem. Moreover, validation shows that our personalized explanations capture properties that are in accordance with the user’s preferences.
  4. Existing learning-to-rank models for information retrieval are trained on explicit or implicit query-document relevance information. In this paper, we study the task of learning a retrieval model based on user-item interactions. Our model has potential applications to systems with rich user-item interaction data, such as browsing and recommendation, in which an accurate search engine is desired. These include media streaming services and e-commerce websites, among others. Inspired by the neural approaches to collaborative filtering and the language modeling approaches to information retrieval, our model is jointly optimized to predict user-item interactions and reconstruct the item textual descriptions. In more detail, our model learns user and item representations such that they can accurately predict future user-item interactions, while generating an effective unigram language model for each item. Our experiments on four diverse datasets in the context of movie and product search and recommendation demonstrate that our model substantially outperforms competitive retrieval baselines, in addition to providing comparable performance to state-of-the-art hybrid recommendation models.
  5. For a graph G on n vertices, naively sampling the position of a random walk at time t requires work Ω(t). We desire local access algorithms supporting position_G(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/poly(n) close to that of a uniformly random walk in ℓ1 distance. We first give an algorithm for local access to random walks on a given undirected d-regular graph with Õ(√n / (1 − λ)) runtime per query, where λ is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random d-regular graphs G(n, d) are expanders with high probability, this gives an Õ(√n) algorithm for a graph drawn from G(n, d) whp, which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input d-regular graph can have runtime better than Ω(√n / log(n)) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n^(1/4)) runtime per query lower bound even with an oblivious adversary (i.e., when the query sequence is fixed in advance). We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime polylog(n) per query. This also allows for efficient local access to walks on polylog degree expanders. We show that our techniques apply to graphs with high degree by extending our results to graphs constructed using the tensor product (giving fast local access to walks on degree n^ϵ graphs for any ϵ ∈ (0, 1]) and Cartesian product.