Multinomial choice models are fundamental for empirical modeling of economic choices among discrete alternatives. We analyze identification of binary and multinomial choice models when the choice utilities are nonseparable in observed attributes and multidimensional unobserved heterogeneity with cross-section and panel data. We show that derivatives of choice probabilities with respect to continuous attributes are weighted averages of utility derivatives in cross-section models with exogenous heterogeneity. In the special case of random coefficient models with an independent additive effect, we further characterize that the probability derivative at zero is proportional to the population mean of the coefficients. We extend the identification results to models with endogenous heterogeneity using either a control function or panel data. In time stationary panel models with two periods, we find that differences over time of derivatives of choice probabilities identify utility derivatives “on the diagonal,” i.e. when the observed attributes take the same values in the two periods. We also show that time stationarity does not identify structural derivatives “off the diagonal” both in continuous and multinomial choice panel models.
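As a hedged illustration of the cross-section result in the binary case, the display below writes out the weighted-average form under assumed notation that does not come from the abstract: a latent utility $u(X,\varepsilon)$, exogenous heterogeneity ($X$ independent of $\varepsilon$), and enough smoothness for the derivative to exist. The second line specializes to a random coefficient utility $x'\beta + \eta$ with $\eta$ independent of $\beta$. It is a sketch of the standard argument, not the paper's statement.

```latex
% Assumed notation (not from the abstract): Y = 1\{u(X,\varepsilon) \ge 0\},
% X independent of \varepsilon, f_{u(x,\varepsilon)} the density of u(x,\varepsilon) at fixed x.
\frac{\partial}{\partial x} \Pr(Y = 1 \mid X = x)
  = f_{u(x,\varepsilon)}(0)\,
    \mathbb{E}\!\left[ \partial_x u(x,\varepsilon) \,\middle|\, u(x,\varepsilon) = 0 \right]

% Random coefficients with an independent additive effect: u(x,\varepsilon) = x'\beta + \eta.
% At x = 0 the conditioning event is \{\eta = 0\}, which is independent of \beta, so
\left. \frac{\partial}{\partial x} \Pr(Y = 1 \mid X = x) \right|_{x = 0}
  = f_{\eta}(0)\, \mathbb{E}[\beta]
```

The first line shows the probability derivative as an average of utility derivatives over marginal individuals, scaled by a density; the second shows why the derivative at zero is proportional to the mean coefficient.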
Discretizing Unobserved Heterogeneity
We study discrete panel data methods where unobserved heterogeneity is revealed in a first step, in environments where population heterogeneity is not discrete. We focus on two-step grouped fixed-effects (GFE) estimators, where individuals are first classified into groups using k-means clustering, and the model is then estimated allowing for group-specific heterogeneity. Our framework relies on two key properties: heterogeneity is a function (possibly nonlinear and time-varying) of a low-dimensional continuous latent type, and informative moments are available for classification. We illustrate the method in a model of wages and labor market participation, and in a probit model with time-varying heterogeneity. We derive asymptotic expansions of two-step GFE estimators as the number of groups grows with the two dimensions of the panel. We propose a data-driven rule for the number of groups, and discuss bias reduction and inference.
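The two-step procedure described above can be sketched on simulated data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear model, the choice of individual time averages as classification moments, the plain Lloyd's-algorithm `kmeans` helper, and the fixed number of groups `k=30` are all hypothetical choices made here (the paper proposes a data-driven rule for the number of groups, which is not reproduced).

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns a group label for each row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for g in range(k):
            if np.any(labels == g):
                centers[g] = points[labels == g].mean(axis=0)
    return labels

def two_step_gfe(y, x, k):
    """y, x: (N, T) arrays. Returns the slope estimate and the group labels."""
    # Step 1: classify individuals on individual-level moments (time averages).
    moments = np.column_stack([y.mean(axis=1), x.mean(axis=1)])
    moments = (moments - moments.mean(axis=0)) / moments.std(axis=0)
    labels = kmeans(moments, k)
    # Step 2: OLS of y on x with group-specific intercepts,
    # implemented by demeaning both variables within each group.
    y_dm, x_dm = y.astype(float).copy(), x.astype(float).copy()
    for g in np.unique(labels):
        m = labels == g
        y_dm[m] -= y[m].mean()
        x_dm[m] -= x[m].mean()
    beta = (x_dm * y_dm).sum() / (x_dm ** 2).sum()
    return beta, labels

def simulate(n=2000, t=10, beta=1.0, seed=1):
    """Assumed design: continuous latent type that also shifts the covariate."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                    # continuous latent type
    x = alpha[:, None] + rng.normal(size=(n, t))  # covariate correlated with the type
    y = beta * x + alpha[:, None] + rng.normal(size=(n, t))
    return y, x

y, x = simulate()
# Pooled OLS ignores the latent type and is biased in this design.
beta_pooled = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta_gfe, labels = two_step_gfe(y, x, k=30)
print(f"pooled OLS: {beta_pooled:.3f}, two-step GFE: {beta_gfe:.3f}, true slope: 1.0")
```

In this assumed design the latent type is continuous, so the groups only approximate it; the sketch is meant to show the classify-then-estimate structure, and the remaining approximation error is the kind of bias the abstract's asymptotic expansions address.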
- Award ID(s): 1817476
- PAR ID: 10374171
- Date Published:
- Journal Name: Econometrica
- Volume: 90
- Issue: 2
- ISSN: 0012-9682
- Page Range / eLocation ID: 625 to 643
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Statistical analysis of longitudinal data often involves modeling treatment effects on clinically relevant longitudinal biomarkers since an initial event (the time origin). In some studies, including preventive HIV vaccine efficacy trials, some participants have biomarkers measured starting at the time origin, whereas others have biomarkers measured starting later with the time origin unknown. The semiparametric additive time-varying coefficient model is investigated where the effects of some covariates vary nonparametrically with time while the effects of others remain constant. Weighted profile least squares estimators coupled with kernel smoothing are developed. The method uses the expectation maximization approach to deal with the censored time origin. The Kaplan–Meier estimator and other failure time regression models such as the Cox model can be utilized to estimate the distribution and the conditional distribution of the left-censored event time related to the censored time origin. Asymptotic properties of the parametric and nonparametric estimators and consistent asymptotic variance estimators are derived. A two-stage estimation procedure for choosing weights is proposed to improve estimation efficiency. Numerical simulations are conducted to examine finite sample properties of the proposed estimators. The simulation results show that the theory and methods work well. The efficiency gain of the two-stage estimation procedure depends on the distribution of the longitudinal error processes. The method is applied to analyze data from the Merck 023/HVTN 502 Step HIV vaccine study.
-
In modern machine learning, users often have to collaborate to learn distributions that generate the data. Communication can be a significant bottleneck. Prior work has studied homogeneous users (i.e., users whose data follow the same discrete distribution) and has provided optimal communication-efficient methods. However, these methods rely heavily on homogeneity, and are less applicable in the common case when users’ discrete distributions are heterogeneous. Here we consider a natural and tractable model of heterogeneity, where users’ discrete distributions only vary sparsely, on a small number of entries. We propose a novel two-stage method named SHIFT: First, the users collaborate by communicating with the server to learn a central distribution, relying on methods from robust statistics. Then, the learned central distribution is fine-tuned to estimate the individual distributions of users. We show that our method is minimax optimal in our model of heterogeneity and under communication constraints. Further, we provide experimental results using both synthetic data and n-gram frequency estimation in the text domain, which corroborate its efficiency.
-
Federated Learning (FL) under distributed concept drift is a largely unexplored area. Although concept drift is itself a well-studied phenomenon, it poses particular challenges for FL, because drifts arise staggered in time and space (across clients). Our work is the first to explicitly study data heterogeneity in both dimensions. We first demonstrate that prior solutions to drift adaptation, with their single global model, are ill-suited to staggered drifts, necessitating multiple-model solutions. We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering. Empirical evaluation shows that our solutions achieve significantly higher accuracy than existing baselines, and are comparable to an idealized algorithm with oracle knowledge of the ground-truth clustering of clients to concepts at each time step.

