NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Survey on algorithms for multi-index models

Bruna, Joan; Hsu, Daniel (September 2025, Statistical Science)

Free, publicly accessible full text available September 18, 2026
The Piranha Problem: Large Effects Swimming in a Small Pond

https://doi.org/10.1090/noti3044

Tosh, Christopher; Greengard, Philip; Goodrich, Ben; Gelman, Andrew; Vehtari, Aki; Hsu, Daniel (January 2025, Notices of the American Mathematical Society)

Full Text Available
On the sample complexity of parameter estimation in logistic regression with normal design

Hsu, Daniel; Mazumdar, Arya (June 2024, PMLR)

Full Text Available
Distribution-Specific Auditing for Subgroup Fairness

https://doi.org/10.4230/LIPIcs.FORC.2024.5

Hsu, Daniel; Huang, Jizhou; Juba, Brendan (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Rothblum, Guy N (Ed.)
We study the problem of auditing classifiers for statistical subgroup fairness. Kearns et al. [Kearns et al., 2018] showed that the problem of auditing combinatorial subgroup fairness is as hard as agnostic learning. Essentially all work on remedying statistical measures of discrimination against subgroups assumes access to an oracle for this problem, despite the fact that no efficient algorithms are known for it. If we assume the data distribution is Gaussian, or even merely log-concave, then a recent line of work has discovered efficient agnostic learning algorithms for halfspaces. Unfortunately, the reduction of Kearns et al. was formulated in terms of weak, "distribution-free" learning, and thus did not establish a connection for families such as log-concave distributions. In this work, we give positive and negative results on auditing for Gaussian distributions. On the positive side, we present an alternative approach to leverage these advances in agnostic learning and thereby obtain the first polynomial-time approximation scheme (PTAS) for auditing nontrivial combinatorial subgroup fairness: we show how to audit statistical notions of fairness over homogeneous halfspace subgroups when the features are Gaussian. On the negative side, we find that under cryptographic assumptions, no polynomial-time algorithm can guarantee any nontrivial auditing, even under Gaussian feature distributions, for general halfspace subgroups.
Full Text Available
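To make the audited quantity concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the statistical-parity formalization used by Kearns et al. (a subgroup g is penalized by P(g=1) * |P(h=1) - P(h=1 | g=1)|, which equals |P(g=1)P(h=1) - P(h=1, g=1)|), homogeneous halfspace subgroups g_w(x) = 1[<w, x> >= 0], and empirical frequencies on a finite sample. The functions `parity_violation` and `naive_audit` are hypothetical; `naive_audit` is a brute-force random search over unit directions w that only illustrates what "auditing" asks for and carries none of the PTAS guarantees described in the abstract.

```python
import math
import random


def parity_violation(X, preds, w):
    """Empirical statistical-parity violation of the homogeneous halfspace
    subgroup {x : <w, x> >= 0}, computed as |P(g=1)P(h=1) - P(h=1, g=1)|.
    X: list of feature vectors; preds: list of 0/1 classifier outputs."""
    n = len(X)
    in_group = [sum(wi * xi for wi, xi in zip(w, x)) >= 0 for x in X]
    p_g = sum(in_group) / n                      # P(g = 1)
    p_h = sum(preds) / n                         # P(h = 1)
    p_hg = sum(1 for g, h in zip(in_group, preds) if g and h) / n  # P(h=1, g=1)
    return abs(p_g * p_h - p_hg)


def random_unit_vector(d, rng):
    """Uniformly random direction on the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def naive_audit(X, preds, num_directions=500, seed=0):
    """Hypothetical brute-force auditor (assumption, not the paper's PTAS):
    try random directions and return the most-violating one found."""
    rng = random.Random(seed)
    d = len(X[0])
    best_w, best_v = None, -1.0
    for _ in range(num_directions):
        w = random_unit_vector(d, rng)
        v = parity_violation(X, preds, w)
        if v > best_v:
            best_w, best_v = w, v
    return best_w, best_v
```

For standard Gaussian features and the classifier h(x) = 1[x_1 >= 0], the subgroup whose normal vector points along the first coordinate has a violation close to 1/4, while a subgroup defined by an orthogonal direction has a violation close to 0, so a reasonable auditor should report a value near 1/4.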
Simple and near-optimal algorithms for hidden stratification and multi-group learning

Tosh, Christopher; Hsu, Daniel (January 2022, Thirty-Ninth International Conference on Machine Learning)

Full Text Available
Unbiased estimators for random design regression

Derezinski, Michal; Warmuth, Manfred; Hsu, Daniel (January 2022, Journal of Machine Learning Research)

Full Text Available
Learning tensor representations for meta-learning

Deng, Samuel; Guo, Yilin; Hsu, Daniel; Mandal, Debmalya (January 2022, Twenty-Fifth International Conference on Artificial Intelligence and Statistics)

Full Text Available
Masked Prediction: A Parameter Identifiability View

Liu, Bingbin; Hsu, Daniel J.; Ravikumar, Pradeep; Risteski, Andrej (January 2022, Advances in Neural Information Processing Systems)

The vast majority of work in self-supervised learning has focused on assessing recovered features by a chosen set of downstream tasks. While there are several commonly used benchmark datasets, this lens of feature learning requires assumptions on the downstream tasks which are not inherent to the data distribution itself. In this paper, we present an alternative lens, one of parameter identifiability: assuming data comes from a parametric probabilistic model, we train a self-supervised learning predictor with a suitable parametric form, and ask whether the parameters of the optimal predictor can be used to extract the parameters of the ground truth generative model. Specifically, we focus on latent-variable models capturing sequential structures, namely Hidden Markov Models with both discrete and conditionally Gaussian observations. We take masked prediction as the self-supervised learning task and study the optimal masked predictor. We show that parameter identifiability is governed by the task difficulty, which is determined by the choice of data model and the number of tokens to predict. On the technical side, we uncover close connections with the uniqueness of tensor rank decompositions, a widely used tool in studying identifiability through the lens of the method of moments.
Full Text Available
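The masked-prediction setup can be illustrated in the discrete-observation case. This is a minimal sketch assuming known HMM parameters, a single masked position, and a log-loss (cross-entropy) objective, under which the optimal predictor outputs the conditional distribution of the masked token given the visible ones; for an HMM that distribution follows from the forward-backward recursions. The function name `masked_posterior` and its argument layout are hypothetical, and the code does not address identifiability itself, which is the paper's actual question: whether the HMM parameters can be recovered from this predictor.

```python
def masked_posterior(pi, A, B, obs, t):
    """Return [P(x_t = o | all other observations) for each symbol o]
    for a discrete HMM (hypothetical helper, single masked position t).

    pi[s]     : initial state probabilities
    A[s][s2]  : transition probabilities P(s_{u+1} = s2 | s_u = s)
    B[s][o]   : emission probabilities P(x_u = o | s_u = s)
    obs       : list of observation indices; obs[t] is ignored (masked)
    """
    S = len(pi)
    T = len(obs)

    # Forward pass, stopping before emitting at t:
    # alpha[s] = P(x_0, ..., x_{t-1}, s_t = s)
    alpha = list(pi)
    for u in range(t):
        alpha = [sum(alpha[s] * B[s][obs[u]] * A[s][s2] for s in range(S))
                 for s2 in range(S)]

    # Backward pass from the end down to t:
    # beta[s] = P(x_{t+1}, ..., x_{T-1} | s_t = s)
    beta = [1.0] * S
    for u in range(T - 1, t, -1):
        beta = [sum(A[s][s2] * B[s2][obs[u]] * beta[s2] for s2 in range(S))
                for s in range(S)]

    # Joint score of each candidate value o at the masked position,
    # then normalize to get the conditional distribution.
    num_symbols = len(B[0])
    scores = [sum(alpha[s] * beta[s] * B[s][o] for s in range(S))
              for o in range(num_symbols)]
    z = sum(scores)
    return [sc / z for sc in scores]
```

Because the forward pass stops before position t and the backward pass starts after it, the observed value at the masked position never enters the computation; summing the scores over o gives P(x_{-t}), so the normalization is exact.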
Near-optimal statistical query lower bounds for agnostically learning intersections of halfspaces with Gaussian marginals

Hsu, Daniel; Sanford, Clayton; Servedio, Rocco; Vlatakis-Gkaragkounis, Emmanouil-Vasileios (January 2022, Thirty-Fifth Annual Conference on Learning Theory)

Full Text Available
