skip to main content


Title: Boosting Item-based Collaborative Filtering via Nearly Uncoupled Random Walks
Item-based models are among the most popular collaborative filtering approaches for building recommender systems. Random walks can provide a powerful tool for harvesting the rich network of interactions captured within these models. They can exploit indirect relations between the items, mitigate the effects of sparsity, ensure wider itemspace coverage, as well as increase the diversity of recommendation lists. Their potential however, can be hindered by the tendency of the walks to rapidly concentrate towards the central nodes of the graph, thereby significantly restricting the range of K -step distributions that can be exploited for personalized recommendations. In this work, we introduce RecWalk ; a novel random walk-based method that leverages the spectral properties of nearly uncoupled Markov chains to provably lift this limitation and prolong the influence of users’ past preferences on the successive steps of the walk—thereby allowing the walker to explore the underlying network more fruitfully. A comprehensive set of experiments on real-world datasets verify the theoretically predicted properties of the proposed approach and indicate that they are directly linked to significant improvements in top- n recommendation accuracy. They also highlight RecWalk’s potential in providing a framework for boosting the performance of item-based models. RecWalk achieves state-of-the-art top- n recommendation quality outperforming several competing approaches, including recently proposed methods that rely on deep neural networks.  more » « less
Award ID(s):
1704074
NSF-PAR ID:
10291119
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ACM Transactions on Knowledge Discovery from Data
Volume:
14
Issue:
6
ISSN:
1556-4681
Page Range / eLocation ID:
1 to 26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large lumpable networks based on observations from an associated random walk. We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm (GHA). The algorithm is able to process random walk data in a streaming fashion and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the “principal components” of the Markov chain. We also establish a finite-sample error bound that matches the nonimprovable state-of-art result for online factorization. Once learned the low-dimensional state representations, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability given sufficient data. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of the Manhattan city that closely matches the traffic dynamics. 
    more » « less
  2. Abstract We investigate the statistical learning of nodal attribute functionals in homophily networks using random walks. Attributes can be discrete or continuous. A generalization of various existing canonical models, based on preferential attachment is studied (model class $$\mathscr {P}$$ P ), where new nodes form connections dependent on both their attribute values and popularity as measured by degree. An associated model class $$\mathscr {U}$$ U is described, which is amenable to theoretical analysis and gives access to asymptotics of a host of functionals of interest. Settings where asymptotics for model class $$\mathscr {U}$$ U transfer over to model class $$\mathscr {P}$$ P through the phenomenon of resolvability are analyzed. For the statistical learning, we consider several canonical attribute agnostic sampling schemes such as Metropolis-Hasting random walk, versions of node2vec (Grover and Leskovec, 2016) that incorporate both classical random walk and non-backtracking propensities and propose new variants which use attribute information in addition to topological information to explore the network. Estimators for learning the attribute distribution, degree distribution for an attribute type and homophily measures are proposed. The performance of such statistical learning framework is studied on both synthetic networks (model class $$\mathscr {P}$$ P ) and real world systems, and its dependence on the network topology, degree of homophily or absence thereof, (un)balanced attributes, is assessed. 
    more » « less
  3. Abstract Identification of influential nodes is an important step in understanding and controlling the dynamics of information, traffic, and spreading processes in networks. As a result, a number of centrality measures have been proposed and used across different application domains. At the heart of many of these measures lies an assumption describing the manner in which traffic (of information, social actors, particles, etc.) flows through the network. For example, some measures only count shortest paths while others consider random walks. This paper considers a spreading process in which a resource necessary for transit is partially consumed along the way while being refilled at special nodes on the network. Examples include fuel consumption of vehicles together with refueling stations, information loss during dissemination with error-correcting nodes, and consumption of ammunition of military troops while moving. We propose generalizations of the well-known measures of betweenness, random-walk betweenness, and Katz centralities to take such a spreading process with consumable resources into account. In order to validate the results, experiments on real-world networks are carried out by developing simulations based on well-known models such as Susceptible-Infected-Recovered and congestion with respect to particle hopping from vehicular flow theory. The simulation-based models are shown to be highly correlated with the proposed centrality measures. Reproducibility: Our code and experiments are available at https://github.com/hmwesigwa/soc_centrality 
    more » « less
  4. Cherifi, H. ; Mantegna, R.N. ; Rocha, L.M. ; Cherifi, C. ; Micciche, S. (Ed.)
    We investigate the statistical learning of nodal attribute distributions in homophily networks using random walks. Attributes can be discrete or continuous. A generalization of various existing canonical models, based on preferential attachment is studied, where new nodes form connections dependent on both their attribute values and popularity as measured by degree. We consider several canonical attribute agnostic sampling schemes such as Metropolis-Hasting random walk, versions of node2vec (Grover and Leskovec 2016) that incorporate both classical random walk and non-backtracking propensities and propose new variants which use attribute information in addition to topological information to explore the network. The performance of such algorithms is studied on both synthetic networks and real world systems, and its dependence on the degree of homophily, or absence thereof, is assessed. 
    more » « less
  5. The problem of rank aggregation from pairwise and multiway comparisons has a wide range of implications, ranging from recommendation systems to sports rankings to social choice. Some of the most popular algorithms for this problem come from the class of spectral ranking algorithms; these include the rank centrality (RC) algorithm for pairwise comparisons, which returns consistent estimates under the Bradley-Terry-Luce (BTL) model for pairwise comparisons (Negahban et al., 2017), and its generalization, the Luce spectral ranking (LSR) algorithm, which returns consistent estimates under the more general multinomial logit (MNL) model for multiway comparisons (Maystre & Grossglauser, 2015). In this paper, we design a provably faster spectral ranking algorithm, which we call accelerated spectral ranking (ASR), that is also consistent under the MNL/BTL models. Our accelerated algorithm is achieved by designing a random walk that has a faster mixing time than the random walks associated with previous algorithms. In addition to a faster algorithm, our results yield improved sample complexity bounds for recovery of the MNL/BTL parameters: to the best of our knowledge, we give the first general sample complexity bounds for recovering the parameters of the MNL model from multiway comparisons under any (connected) comparison graph (and improve significantly over previous bounds for the BTL model for pairwise comparisons). We also give a message-passing interpretation of our algorithm, which suggests a decentralized distributed implementation. Our experiments on several real-world and synthetic datasets confirm that our new ASR algorithm is indeed orders of magnitude faster than existing algorithms. 
    more » « less