Title: Learning With Heterogeneous Misspecified Models: Characterization and Robustness
This paper develops a general framework to study how misinterpreting information impacts learning. Our main result is a simple criterion to characterize long-run beliefs based on the underlying form of misspecification. We present this characterization in the context of social learning, then highlight how it applies to other learning environments, including individual learning. A key contribution is that our characterization applies to settings with model heterogeneity and provides conditions for entrenched disagreement. Our characterization can be used to determine whether a representative agent approach is valid in the face of heterogeneity, study how differing levels of bias or unawareness of others' biases impact learning, and explore whether the impact of a bias is sensitive to parametric specification or the source of information. This unified framework synthesizes insights gleaned from previously studied forms of misspecification and provides novel insights in specific applications, as we demonstrate in settings with partisan bias, overreaction, naive learning, and level-k reasoning.
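To make the mechanism concrete, the toy simulation below shows the flavor of the characterization: an agent who updates with a misperceived signal likelihood has long-run beliefs governed by the expected per-signal log-likelihood drift under the true signal distribution, so a sufficiently biased agent can converge to the wrong state while a correctly specified observer of the same signals learns the truth. This is a minimal sketch of entrenched disagreement, not the paper's formal criterion; all likelihood values are illustrative assumptions.

```python
# Minimal sketch (not the paper's formal criterion): two agents observe the
# same i.i.d. binary signals but update with different perceived likelihoods.
# All likelihood values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
P_TRUE = 0.75                       # true P(signal = 1); the true state is 1

# perceived P(signal = 1 | state) for states 0 and 1
CORRECT = {0: 0.25, 1: 0.75}        # correctly specified agent
PARTISAN = {0: 0.80, 1: 0.55}       # misspecified agent biased toward state 0

def posterior_state1(model, n=5000):
    """Posterior P(state = 1) after n signals, under a possibly wrong model."""
    log_odds = 0.0                  # uniform prior over the two states
    for s in rng.random(n) < P_TRUE:
        p1 = model[1] if s else 1.0 - model[1]
        p0 = model[0] if s else 1.0 - model[0]
        log_odds += np.log(p1 / p0)
    return 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700, 700)))

print("correct agent :", posterior_state1(CORRECT))   # -> essentially 1
print("partisan agent:", posterior_state1(PARTISAN))  # -> essentially 0
```

Which state the belief settles on is determined by the sign of the expected log-likelihood ratio under the true signal distribution, which echoes the KL-divergence flavor of the paper's characterization.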
Award ID(s): 1851629
PAR ID: 10332415
Author(s) / Creator(s):
Date Published:
Journal Name: Econometrica
Volume: 89
Issue: 6
ISSN: 0012-9682
Page Range / eLocation ID: 3025-3077
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Bapna, Ravi; Burtch, Gordon (Ed.)
    Machine learning is commonly used to estimate heterogeneous treatment effects (HTEs) in randomized experiments. Using large-scale randomized experiments on the Facebook and Criteo platforms, we observe substantial discrepancies between machine learning-based treatment effect estimates and difference-in-means estimates computed directly from the randomized experiment. This paper provides a two-step framework for practitioners and researchers to diagnose and rectify this discrepancy. We first introduce a diagnostic tool to assess whether bias exists in the model-based estimates from machine learning. If bias exists, we then offer a model-agnostic method to calibrate any HTE estimates to known, unbiased, subgroup difference-in-means estimates, ensuring that the sign and magnitude of the subgroup estimates approximate the model-free benchmarks. This calibration method requires no additional data and scales to large data sets. To highlight potential sources of bias, we show theoretically that the bias can result from regularization, and we use synthetic simulations to show that it can also result from model misspecification and high-dimensional features. We demonstrate the efficacy of our calibration method using extensive synthetic simulations and two real-world randomized experiments. We further demonstrate its practical value in three typical policy-making settings: a prescriptive, budget-constrained optimization framework; a setting seeking to maximize multiple performance indicators; and a multitreatment uplift modeling setting.
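A hedged sketch of the calibration step: within each subgroup, shift the model-based scores so their subgroup average matches the unbiased difference-in-means estimate. The additive-shift rule, the column names, and the synthetic data are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: shift model-based HTE scores so each subgroup's average
# matches its unbiased difference-in-means (DiM) estimate. Column names and
# the additive-shift rule are illustrative assumptions.
import numpy as np
import pandas as pd

def subgroup_dim(df):
    """Difference-in-means effect per subgroup: E[Y|T=1] - E[Y|T=0]."""
    g = df.groupby(["subgroup", "treated"])["y"].mean().unstack("treated")
    return g[1] - g[0]

def calibrate(df, hte_col="hte_hat"):
    shift = subgroup_dim(df) - df.groupby("subgroup")[hte_col].mean()
    return df[hte_col] + df["subgroup"].map(shift)

# usage on synthetic data with a known +0.3 bias in the model estimates
rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({"subgroup": rng.integers(0, 4, n),
                   "treated": rng.integers(0, 2, n)})
true_te = 0.5 + 0.25 * df["subgroup"]                   # effect varies by subgroup
df["y"] = true_te * df["treated"] + rng.normal(size=n)
df["hte_hat"] = true_te + 0.3 + rng.normal(0, 0.1, n)   # biased ML scores
df["hte_cal"] = calibrate(df)
print(df.groupby("subgroup")[["hte_hat", "hte_cal"]].mean().round(2))
```

After calibration, the subgroup averages of the scores match the model-free benchmarks while the within-subgroup ordering of the model's estimates is preserved.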
  2. This study provides a template for multisite causal mediation analysis using a comprehensive weighting-based analytic procedure that enhances external and internal validity. The template incorporates a sample weight to adjust for complex sample and survey designs, applies an inverse-probability-of-treatment weight (IPTW) to adjust for differential treatment assignment probabilities, employs an estimated nonresponse weight to account for non-random nonresponse, and utilizes a propensity score-based weighting strategy to flexibly decompose not only the population average but also the between-site heterogeneity of the total program impact. Because the identification assumptions are not always warranted, a weighting-based balance-checking procedure assesses the remaining overt bias, while a weighting-based sensitivity analysis further evaluates the potential bias from omitted confounding or from propensity score model misspecification. We derive the asymptotic variance of the causal effect estimators, accounting for the sampling uncertainty in the estimated weights. The method is applied to a re-analysis of data from the National Job Corps Study.
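A minimal sketch of the weighting pipeline the abstract describes: the analysis weight is the product of the survey/sample weight, an IPTW term from an estimated propensity score, and a nonresponse term from an estimated response probability. The column names, the logistic nuisance models, and the simple product form are assumptions for illustration; the paper's decomposition of between-site heterogeneity is not shown.

```python
# Hedged sketch: analysis weight = sample weight x inverse-probability-of-
# treatment weight x nonresponse weight. Column names and the logistic
# nuisance models are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def analysis_weights(df, covariates):
    X = df[covariates].to_numpy()
    # IPTW component from an estimated propensity score P(treated | X)
    ps = LogisticRegression(max_iter=1000).fit(X, df["treated"]).predict_proba(X)[:, 1]
    iptw = np.where(df["treated"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    # nonresponse component from an estimated response probability P(responded | X)
    rp = LogisticRegression(max_iter=1000).fit(X, df["responded"]).predict_proba(X)[:, 1]
    return df["sample_weight"] * iptw / rp

def weighted_impact(df, w):
    """Weighted difference in mean outcomes between treated and control units."""
    t = df["treated"] == 1
    return (np.average(df.loc[t, "y"], weights=w[t])
            - np.average(df.loc[~t, "y"], weights=w[~t]))
```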
  3. Motivation: Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity, and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach fits a confounder-adjusted regression model to each genomic feature, followed by multiplicity correction. Results: This study shows that the traditional approach is suboptimal and proposes a new two-dimensional false discovery rate control framework (2DFDR+) that provides a significant power improvement over the conventional method and applies to a wide range of settings. 2DFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and FDR control is then performed on the conditional independence test statistics of the remaining features. 2DFDR+ provides (asymptotically) valid inference in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. Promising finite-sample performance is demonstrated via extensive simulations and real data applications. Availability and implementation: R codes and vignettes are available at https://github.com/asmita112358/tdfdr.np.
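A hedged sketch of the two-step idea: screen features on marginal-association p-values, then apply Benjamini-Hochberg only to the confounder-adjusted (conditional) p-values of the survivors. The fixed screening threshold below is an illustrative assumption; the actual 2DFDR+ procedure calibrates this step to guarantee FDR control.

```python
# Hedged sketch of the two-step idea behind 2DFDR+ (illustrative only):
# step 1 screens on marginal p-values, step 2 runs Benjamini-Hochberg on the
# confounder-adjusted p-values among the surviving features.
import numpy as np
from statsmodels.stats.multitest import multipletests

def two_step_fdr(p_marginal, p_conditional, screen=0.2, alpha=0.05):
    keep = np.asarray(p_marginal) <= screen        # step 1: marginal screen
    reject = np.zeros(len(keep), dtype=bool)
    if keep.any():                                 # step 2: BH on survivors only
        rej, *_ = multipletests(np.asarray(p_conditional)[keep],
                                alpha=alpha, method="fdr_bh")
        reject[keep] = rej
    return reject                                  # boolean discovery mask
```

Because the BH correction is applied to far fewer hypotheses after screening, the surviving features face a lighter multiplicity penalty, which is the source of the power gain.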
  4. Inference in semi-supervised (SS) settings has gained substantial attention in recent years due to its increased relevance in modern big-data problems. In a typical SS setting, there is a much larger unlabeled dataset containing only observations of the predictors, and a moderately sized labeled dataset containing observations of both an outcome and the predictors. Such data arise naturally when the outcome, unlike the predictors, is costly or difficult to obtain. One of the primary statistical objectives in SS settings is to explore whether parameter estimation can be improved by exploiting the unlabeled data. We propose a novel Bayesian method for estimating the population mean in SS settings. The approach yields estimators that are both efficient and optimal for estimation and inference, and the method has several interesting features. The central idea is to model certain summary statistics of the data in a targeted manner, rather than the entire raw data, combined with a novel Bayesian notion of debiasing. Specifying appropriate summary statistics relies crucially on a debiased representation of the population mean that incorporates the unlabeled data through a flexible nuisance function while also learning its estimation bias. Combined with careful use of sample splitting, this debiasing approach removes from the posterior of the final parameter of interest the bias due to slow rates or misspecification of the nuisance parameter, ensuring robustness and efficiency. Concrete theoretical results, via Bernstein-von Mises theorems, validate all claims and are further supported by extensive numerical studies. To our knowledge, this is the first work on Bayesian inference in SS settings, and its central ideas apply more broadly to other Bayesian semi-parametric inference problems.
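A hedged frequentist sketch of the debiased representation the abstract alludes to (the paper's procedure is Bayesian; this is only the analogous point estimate): write the population mean as E[m(X)] + E[Y - m(X)], average the first term over the large unlabeled sample, estimate the second from held-out labeled residuals, and fit the nuisance m by sample splitting so its estimation error does not bias the result. The linear nuisance model is an illustrative choice.

```python
# Hedged frequentist sketch (not the paper's Bayesian method):
# theta_hat = mean of m_hat(X) over unlabeled data + mean labeled residual,
# with the nuisance m_hat fit by cross-fitting on the labeled sample.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def ss_mean(X_lab, y_lab, X_unlab, n_splits=2):
    """Inputs are numpy arrays; X_* are 2-D, y_lab is 1-D."""
    resid = np.zeros(len(y_lab))
    m_unlab = np.zeros(len(X_unlab))
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X_lab):
        m = LinearRegression().fit(X_lab[train], y_lab[train])  # nuisance fit
        resid[test] = y_lab[test] - m.predict(X_lab[test])      # held-out residuals
        m_unlab += m.predict(X_unlab) / n_splits                # fold-averaged m(X)
    return m_unlab.mean() + resid.mean()
```

The residual term corrects whatever systematic error the nuisance fit makes, which is why the estimator remains consistent even when the nuisance model is misspecified or converges slowly.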
  5. This article is an exploratory analysis of the impact of the California Consumer Privacy Act (CCPA) on data breaches that expose sensitive private data of consumers. The CCPA applies to large for-profit businesses that collect and disseminate the personal information of Californian consumers; it provides for consumer rights and imposes notification and security requirements on businesses that collect private information. We analyzed how the CCPA affected the data breach notifications required by the state's Office of the Auditor General over the period 2012 to 2023. The analysis provides interesting insights into the impact of the CCPA on the pattern of data breaches. Our principal finding is that privacy breaches declined somewhat after the CCPA took effect. Importantly, the CCPA has helped improve the overall reporting of privacy breaches; we surmise that it brought more data breaches to light.