skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Augmented degree correction for bipartite networks with applications to recommender systems
Abstract In recommender systems, users rate items, and are subsequently served other product recommendations based on these ratings. Even though users usually rate a tiny percentage of the available items, the system tries to estimate unobserved preferences by finding similarities across users and across items. In this work, we treat the observed ratings data as partially observed, dense, weighted, bipartite networks. For a class of systems without outside information, we adapt an approach developed for dense, weighted networks to account for unobserved edges and the bipartite nature of the problem. The approach begins with clustering both users and items into communities, and locally estimates the patterns of ratings within each subnetwork induced by restricting attention to one community of users and one community of items community. The local fitting procedure relies on estimating local sociability parameters for every user and item, and selecting the function that determines the degree correction contours which best models the underlying data. We compare the performance of our proposed approach to existing methods on a simulated data set, as well as on a data set of joke ratings, examining model performance in both cases at differing levels of sparsity. On the joke ratings data set, our proposed model performs better than existing alternatives in relatively sparse settings, though other approaches achieve better results when more data is available. Collectively, the results indicate that despite struggling to pick up subtler signals, the proposed approach’s recovery of large scale, coarse patterns may still be useful in practical settings where high sparsity is typical.  more » « less
Award ID(s):
2113662 2134107
PAR ID:
10512205
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Applied Network Science
Volume:
9
Issue:
1
ISSN:
2364-8228
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Matrix factorization (MF) approximates unobserved ratings in a rating matrix, whose rows correspond to users and columns correspond to items to be rated, and has been serving as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all the three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches. 
    more » « less
  2. Bipartite graphs are a powerful tool for modeling the interactions between two distinct groups. These bipartite relationships often feature small, recurring structural patterns called motifs which are building blocks for community structure. One promising structure is the induced 6-cycle which consists of three nodes on each node set forming a cycle where each node has exactly two edges. In this paper, we study the problem of counting and utilizing induced 6-cycles in large bipartite networks. We first consider two adaptations inspired by previous works for cycle counting in bipartite networks. Then, we introduce a new approach for node triplets which offer a systematic way to count the induced 6-cycles, used in BATCHTRIPLETJOIN. Our experimental evaluation shows that BATCHTRIPLETJOIN is significantly faster than the other algorithms while being scalable to large graph sizes and number of cores. On a network with 112M edges, BATCHTRIPLETJOIN is able to finish the computation in 78 mins by using 52 threads. In addition, we provide a new way to identify anomalous node triplets by comparing and contrasting the butterfly and induced 6-cycle counts of the nodes. We showcase several case studies on real-world networks from Amazon Kindle ratings, Steam game reviews, and Yelp ratings. 
    more » « less
  3. null (Ed.)
    Collaborative filtering has been widely used in recommender systems. Existing work has primarily focused on improving the prediction accuracy mainly via either building refined models or incorporating additional side information, yet has largely ignored the inherent distribution of the input rating data. In this paper, we propose a data debugging framework to identify overly personalized ratings whose existence degrades the performance of a given collaborative filtering model. The key idea of the proposed approach is to search for a small set of ratings whose editing (e.g., modification or deletion) would near-optimally improve the recommendation accuracy of a validation set. Experimental results demonstrate that the proposed approach can significantly improve the recommendation accuracy. Furthermore, we observe that the identified ratings significantly deviate from the average ratings of the corresponding items, and the proposed approach tends to modify them towards the average. This result sheds light on the design of future recommender systems in terms of balancing between the overall accuracy and personalization. 
    more » « less
  4. We propose a new time-dependent predictive model of user-item ratings centered around local coherence -- that is, while both users and items are constantly in flux, within a short-term sequence, the neighborhood of a particular user or item is likely to be coherent. Three unique characteristics of the framework are: (i) it incorporates both implicit and explicit feedbacks by extracting the local coherence hidden in the feedback sequences; (ii) it uses parallel recurrent neural networks to capture the evolution of users and items, resulting in a dual factor recommendation model; and (iii) it combines both coherence-enhanced consistent latent factors and dynamic latent factors to balance short-term changes with long-term trends for improved recommendation. Through experiments on Goodreads and Amazon, we find that the proposed model can outperform state-of-the-art models in predicting users' preferences. 
    more » « less
  5. null (Ed.)
    Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of these patterns reflect important real-world phenomena driving interactions between the various users and items; other patterns may be irrelevant or reflect undesired discrimination, such as discrimination in publishing or purchasing against authors who are women or ethnic minorities. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to one dimension of social concern, namely content creator gender. Using publicly available book ratings data, we measure the distribution of the genders of the authors of books in user rating profiles and recommendation lists produced from this data. We find that common collaborative filtering algorithms tend to propagate at least some of each user’s tendency to rate or read male or female authors into their resulting recommendations, although they differ in both the strength of this propagation and the variance in the gender balance of the recommendation lists they produce. The data, experimental design, and statistical methods are designed to be reusable for studying potentially discriminatory social dimensions of recommendations in other domains and settings as well. 
    more » « less