Monte Carlo Estimates of Evaluation Metric Error and Bias
Traditional offline evaluations of recommender systems apply metrics from machine learning and information retrieval in settings where their underlying assumptions no longer hold. This results in significant error and bias in measures of top-N recommendation performance, such as precision, recall, and nDCG. Several of the specific causes of these errors, including popularity bias and misclassified decoy items, are well-explored in the existing literature. In this paper we survey a range of work on identifying and addressing these problems, and report on our work in progress to simulate the recommender data generation and evaluation processes to quantify the extent of evaluation metric errors and assess their sensitivity to various assumptions.
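The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' simulator: the function name `simulate_precision_gap`, the item-appeal curve, the uniform observation probability, and the fixed most-popular recommender are all assumptions made for illustration. It generates complete "true" preferences, hides most of them to mimic incomplete observed data, and compares precision@N computed against the observed data with precision@N computed against the truth.

```python
import random


def simulate_precision_gap(n_users=200, n_items=50, top_n=5,
                           observe_prob=0.3, seed=0):
    """Return (true precision@N, observed precision@N), averaged over users.

    All modeling choices here are illustrative assumptions, not taken
    from the paper.
    """
    rng = random.Random(seed)
    # Hypothetical item appeal: lower-numbered items are liked by more users.
    appeal = [0.6 / (1 + i) ** 0.5 for i in range(n_items)]
    # Assumed recommender: always recommend the top_n highest-appeal items.
    recs = set(range(top_n))

    true_total = 0.0
    obs_total = 0.0
    for _ in range(n_users):
        # Ground truth: every item this user would actually like.
        truth = {i for i in range(n_items) if rng.random() < appeal[i]}
        # Observed data: only a random fraction of the true likes is recorded.
        observed = {i for i in truth if rng.random() < observe_prob}
        true_total += len(truth & recs) / top_n
        obs_total += len(observed & recs) / top_n
    return true_total / n_users, obs_total / n_users


if __name__ == "__main__":
    true_p, obs_p = simulate_precision_gap()
    print(f"true P@N={true_p:.3f}  observed P@N={obs_p:.3f}  "
          f"error={obs_p - true_p:.3f}")
```

Because unobserved likes are scored as misses, the observed precision in this sketch can never exceed the true precision. Repeating the run over many seeds, or making `observe_prob` depend on item popularity, would give a Monte Carlo estimate of how the error varies with those assumptions.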
- Award ID(s): 1751278
- Publication Date:
- NSF-PAR ID: 10074452
- Journal Name: REVEAL 2018 Workshop on Offline Evaluation in Recommender Systems
- Sponsoring Org: National Science Foundation
More Like this
- Meila, Marina ; Zhang, Tong (Ed.) Incorporating graph side information into recommender systems has been widely used to better predict ratings, but relatively few works have focused on theoretical guarantees. Ahn et al. (2018) firstly characterized the optimal sample complexity in the presence of graph side information, but the results are limited due to strict, unrealistic assumptions made on the unknown latent preference matrix and the structure of user clusters. In this work, we propose a new model in which 1) the unknown latent preference matrix can have any discrete values, and 2) users can be clustered into multiple clusters, thereby relaxing the assumptions made in …
- Abstract. The evaluation of aerosol radiative effect on broadband hemispherical solar flux is often performed using simplified spectral and directional scattering characteristics of atmospheric aerosol and underlying surface reflectance. In this study we present a rigorous yet fast computational tool that accurately accounts for detailed variability of both spectral and angular scattering properties of aerosol and surface reflectance in calculation of direct aerosol radiative effect. The tool is developed as part of the GRASP (Generalized Retrieval of Aerosol and Surface Properties) project. We use the tool to evaluate instantaneous and daily average radiative efficiencies (radiative effect per unit aerosol optical …
- Abstract. This paper investigates the ability of the Weather Research and Forecasting (WRF) Model in simulating multiple small-scale precipitation bands (multibands) within the extratropical cyclone comma head using four winter storm cases from 2014 to 2017. Using the model output, some physical processes are explored to investigate band prediction. A 40-member WRF ensemble was constructed down to 2-km grid spacing over the Northeast United States using different physics, stochastic physics perturbations, different initial/boundary conditions from the first five perturbed members of the Global Forecast System (GFS) Ensemble Reforecast (GEFSR), and a stochastic kinetic energy backscatter scheme (SKEBS). It was found …
- Recent work in recommender systems has emphasized the importance of fairness, with a particular interest in bias and transparency, in addition to predictive accuracy. In this paper, we focus on the state of the art pairwise ranking model, Bayesian Personalized Ranking (BPR), which has previously been found to outperform pointwise models in predictive accuracy, while also being able to handle implicit feedback. Specifically, we address two limitations of BPR: (1) BPR is a black box model that does not explain its outputs, thus limiting the user's trust in the recommendations, and the analyst's ability to scrutinize a model's outputs; and …
- Summary. Instrumental variable methods can identify causal effects even when the treatment and outcome are confounded. We study the problem of imperfect measurements of the binary instrumental variable, treatment and outcome. We first consider nondifferential measurement errors, that is, the mismeasured variable does not depend on other variables given its true value. We show that the measurement error of the instrumental variable does not bias the estimate, that the measurement error of the treatment biases the estimate away from zero, and that the measurement error of the outcome biases the estimate toward zero. Moreover, we derive sharp bounds on the …