Search for: All records

Creators/Authors contains: "Valiant, Paul"



Finite-Sample Maximum Likelihood Estimation of Location

Gupta, Shivam ; Lee, Jasper ; Price, Eric ; Valiant, Paul ( November 2022 , Advances in Neural Information Processing Systems)
S. Koyejo ; S. Mohamed ; A. Agarwal ; D. Belgrave ; K. Cho ; A. Oh (Ed.)
Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$

https://doi.org/10.1109/FOCS52979.2021.00071

Lee, Jasper C.H. ; Valiant, Paul ( February 2022 , IEEE)
Optimal Sub-Gaussian Mean Estimation in Very High Dimensions

Lee, Jasper C.H. ; Valiant, Paul ( January 2022 , 13th Innovations in Theoretical Computer Science Conference (ITCS 2022))
Braverman, Mark (Ed.)
Uncertainty about Uncertainty: Optimal Adaptive Algorithms for Estimating Mixtures of Unknown Coins

Lee, Jasper ; Valiant, Paul ( January 2021 , ACM-SIAM Symposium on Discrete Algorithms (SODA21))
Full Text Available
Optimal adaptive algorithms for estimating mixtures of unknown coins.

Lee, Jasper CH ; Valiant, Paul ( January 2021 , Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 414–433.)
Given a mixture between two populations of coins, “positive” coins that each have (unknown and potentially different) bias ≥ 1/2 + ∆ and “negative” coins with bias ≤ 1/2 − ∆, we consider the task of estimating the fraction ρ of positive coins to within additive error ε. We achieve an upper and lower bound of Θ((ρ/(ε²∆²)) log(1/δ)) samples for a 1 − δ probability of success, where crucially, our lower bound applies to all fully-adaptive algorithms. Thus, our sample complexity bounds have tight dependence for every relevant problem parameter. A crucial component of our lower bound proof is a decomposition lemma (Lemma 5.2) showing how to assemble partially-adaptive bounds into a fully-adaptive bound, which may be of independent interest: though we invoke it for the special case of Bernoulli random variables (coins), it applies to general distributions. We present simulation results to demonstrate the practical efficacy of our approach for realistic problem parameters for crowdsourcing applications, focusing on the “rare events” regime where ρ is small. The fine-grained adaptive flavor of both our algorithm and lower bound contrasts with much previous work in distributional testing and learning.
Full Text Available
Worst-Case Analysis for Randomly Collected Data

Chen, Justin ; Valiant, Gregory ; Valiant, Paul ( January 2021 , 34th Conference on Neural Information Processing Systems (NeurIPS 2020))
Full Text Available
Worst-Case Analysis for Randomly Collected Data

Chen, Justin ; Valiant, Gregory ; Valiant, Paul ( January 2021 , NeurIPS 2021)
We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements [n] = {1, ..., n} with corresponding data values x_1, ..., x_n. We observe the values for a "sample" set A ⊂ [n] and wish to estimate some statistic of the values for a "target" set B ⊂ [n], where B could be the entire set. Crucially, we assume that the sets A and B are drawn according to some known distribution P over pairs of subsets of [n]. A given estimation algorithm is evaluated based on its "worst-case, expected error", where the expectation is with respect to the distribution P from which the sample set A and target set B are drawn, and the worst case is with respect to the data values x_1, ..., x_n. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values (where the weights are functions of the distribution P and the sample and target sets A, B) and show that the worst-case expected error achieved by this algorithm is at most a multiplicative π/2 factor worse than the optimal of such algorithms. The algorithm and proof leverage a surprising connection to the Grothendieck problem. We also extend these results to the linear regression setting, where each datapoint is not a scalar but a labeled vector (x_i, y_i). This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data collection process via the distribution P, is a significant departure from the typical statistical estimation framework and introduces a uniform analysis for the many natural settings where membership in a sample may be correlated with data values, such as when individuals are recruited into a sample through their social networks as in "snowball/chain" sampling or when samples have chronological structure as in "selective prediction".
Full Text Available
Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

Blanc, Guy ; Gupta, Neha ; Valiant, Gregory ; Valiant, Paul ( January 2020 , 33rd Annual Conference on Learning Theory (COLT))

We consider networks, trained via stochastic gradient descent to minimize L2 loss, with the training labels perturbed by independent noise at each iteration. We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum, over the datapoints, of the squared L2 norm of the gradient of the model with respect to the parameter vector, evaluated at each data point. This holds for networks of any connectivity, width, depth, and choice of activation function. We interpret this implicit regularization term for three simple settings: matrix sensing, two layer ReLU networks trained on one-dimensional data, and two layer networks with sigmoid activations trained on a single datapoint. For these settings, we show why this new and general implicit regularization effect drives the networks towards "simple" models.
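
As a rough illustration of the regularization term described in this abstract (the sum over datapoints of the squared L2 norm of the model's gradient with respect to its parameters), the following minimal Python sketch evaluates that quantity for a two layer ReLU network on one-dimensional inputs. The parametrization f(x) = sum_j a[j] * relu(w[j] * x + b[j]), the function names, and the omission of any scaling constant are assumptions made here for illustration; they are not taken from the paper.

```python
def relu(z):
    """ReLU activation."""
    return z if z > 0.0 else 0.0


def grad_sq_norm(a, w, b, x):
    """Squared L2 norm of the gradient of
    f(x) = sum_j a[j] * relu(w[j] * x + b[j])
    with respect to all parameters (a, w, b), at one scalar input x.
    The parametrization is an illustrative assumption."""
    total = 0.0
    for aj, wj, bj in zip(a, w, b):
        pre = wj * x + bj
        active = 1.0 if pre > 0.0 else 0.0
        d_a = relu(pre)        # df/da_j
        d_w = aj * active * x  # df/dw_j
        d_b = aj * active      # df/db_j
        total += d_a ** 2 + d_w ** 2 + d_b ** 2
    return total


def implicit_regularizer(a, w, b, xs):
    """Sum of grad_sq_norm over all datapoints xs (no scaling constant)."""
    return sum(grad_sq_norm(a, w, b, x) for x in xs)


if __name__ == "__main__":
    # Two hidden units, two one-dimensional datapoints (hypothetical values).
    print(implicit_regularizer([1.0, 2.0], [1.0, -1.0], [0.0, 0.0], [1.0, -2.0]))
```

Parameter settings that fit the data with smaller values of this quantity are the ones such a term would favor; the sketch only evaluates the term and does not simulate the training dynamics.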
Full Text Available