Title: Learning With Heterogeneous Misspecified Models: Characterization and Robustness
This paper develops a general framework to study how misinterpreting information impacts learning. Our main result is a simple criterion to characterize long‐run beliefs based on the underlying form of misspecification. We present this characterization in the context of social learning, then highlight how it applies to other learning environments, including individual learning. A key contribution is that our characterization applies to settings with model heterogeneity and provides conditions for entrenched disagreement. Our characterization can be used to determine whether a representative agent approach is valid in the face of heterogeneity, study how differing levels of bias or unawareness of others' biases impact learning, and explore whether the impact of a bias is sensitive to parametric specification or the source of information. This unified framework synthesizes insights gleaned from previously studied forms of misspecification and provides novel insights in specific applications, as we demonstrate in settings with partisan bias, overreaction, naive learning, and level‐k reasoning.
Award ID(s): 1851629
PAR ID: 10332415
Author(s) / Creator(s):
Date Published:
Journal Name: Econometrica
Volume: 89
Issue: 6
ISSN: 0012-9682
Page Range / eLocation ID: 3025 to 3077
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Bapna, Ravi; Burtch, Gordon (Ed.)
    Machine learning is commonly used to estimate the heterogeneous treatment effects (HTEs) in randomized experiments. Using large-scale randomized experiments on the Facebook and Criteo platforms, we observe substantial discrepancies between machine learning-based treatment effect estimates and difference-in-means estimates directly from the randomized experiment. This paper provides a two-step framework for practitioners and researchers to diagnose and rectify this discrepancy. We first introduce a diagnostic tool to assess whether bias exists in the model-based estimates from machine learning. If bias exists, we then offer a model-agnostic method to calibrate any HTE estimates to known, unbiased, subgroup difference-in-means estimates, ensuring that the sign and magnitude of the subgroup estimates approximate the model-free benchmarks. This calibration method requires no additional data and can be scaled for large data sets. To highlight potential sources of bias, we theoretically show that this bias can result from regularization and further use synthetic simulation to show biases result from misspecification and high-dimensional features. We demonstrate the efficacy of our calibration method using extensive synthetic simulations and two real-world randomized experiments. We further demonstrate the practical value of this calibration in three typical policy-making settings: a prescriptive, budget-constrained optimization framework; a setting seeking to maximize multiple performance indicators; and a multitreatment uplift modeling setting. 
  2. This study provides a template for multisite causal mediation analysis using a comprehensive weighting-based analytic procedure that enhances external and internal validity. The template incorporates a sample weight to adjust for complex sample and survey designs, adopts an IPTW weight to adjust for differential treatment assignment probabilities, employs an estimated nonresponse weight to account for non-random nonresponse, and utilizes a propensity score-based weighting strategy to flexibly decompose not only the population average but also the between-site heterogeneity of the total program impact. Because the identification assumptions are not always warranted, a weighting-based balance checking procedure assesses the remaining overt bias, while a weighting-based sensitivity analysis further evaluates the potential bias related to omitted confounding or to propensity score model misspecification. We derive the asymptotic variance of the estimators for the causal effects that account for the sampling uncertainty in the estimated weights. The method is applied to a re-analysis of the data from the National Job Corps Study. 
  3. Motivation: Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity, and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach involves fitting a confounder-adjusted regression model to each genomic feature, followed by multiplicity correction. Results: This study shows that the traditional approach is suboptimal and proposes a new two-dimensional false discovery rate control framework (2DFDR+) that provides significant power improvement over the conventional method and applies to a wide range of settings. 2DFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and FDR control is performed based on conditional independence test statistics in the remaining features. 2DFDR+ provides (asymptotically) valid inference from samples in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. Promising finite sample performance is demonstrated via extensive simulations and real data applications. Availability and implementation: R codes and vignettes are available at https://github.com/asmita112358/tdfdr.np.
  4. This article is an exploratory analysis of the impact of the California Consumer Privacy Act (CCPA) on data breaches that expose sensitive private data of consumers. The CCPA applies to large for-profit businesses that collect and disseminate personal information of Californian consumers. It provides for consumer rights and imposes notification and security requirements on businesses that collect private information. We analyzed how the CCPA affects data breach notifications that are required by the state's Office of Auditor General, for the period 2012 to 2023. The analysis provides insights into the impact of the CCPA on the pattern of data breaches. Our principal finding is that privacy breaches declined to some extent after the CCPA. Importantly, the CCPA has helped improve the overall reporting of privacy breaches. We surmise that the CCPA brought more data breaches to light.
  5. Understanding and integrating the user’s decision-making process into product design and distribution strategies is likely to lead to higher adoption rates and ultimately increased impacts, particularly for those products that require a change in habit or behavior such as clean energy technologies. This study applies the Theory of Planned Behavior (TPB) in design for global development, where understanding the tendency to adopt beneficial technologies based on parsimonious approaches is critical to programmatic impact. To investigate robustness and applicability of behavioral models in a data-scarce setting, this study applies TPB to the adoption of biomass cookstoves in a sample drawn from two remote communities in Honduras and Uganda before and after a trial period. Using multiple ordinal logistic regressions, the intention to adopt the technology was modeled. Results quantify the influence of these factors on households’ intentions to cook their main meals with improved cookstoves. For example, the intention to cook more main meals with clean cookstoves was 3.3 times higher than average among participants with slightly stronger beliefs regarding the importance of reducing smoke emissions. The quantitative method of this study enables technology designers to design and develop clean technologies that better suit user behavior, needs, and priorities. In addition, the data-driven approach of this study provides insights for policy makers to design policies such as subsidies, information campaigns, and supply chains that reflect behavioral attributes for culturally tailored clean technology adoption initiatives. Furthermore, this work discusses potential sources of bias and statistical challenges in data-scarce regions, and outlines methods to address them.