Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

We show that any memoryconstrained, firstorder algorithm which minimizes ddimensional, 1Lipschitz convex functions over the unit ball to 1/ poly(d) accuracy using at most d^(1.25delta) bits of memory must make at least d^(1+ 4 delta / 3) firstorder queries (for any constant delta in (0,1/4). Consequently, the performance of such memoryconstrained algorithms are a polynomial factor worse than the optimal O(d polylog d) query bound for this problem obtained by cutting plane methods that use >d^2 memory. This resolves one of the open problems in the COLT 2019 open problem publication of Woodworth and Srebro.Free, publiclyaccessible full text available January 1, 2023

We provide new gradientbased methods for efficiently solving a broad class of illconditioned optimization problems. We consider the problem of minimizing a function f : R d > R which is implicitly decomposable as the sum of m unknown noninteracting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the squareroot of the condition numbers of the components. This complexity bound (which we prove is nearly optimal) can improve almost exponentially on that of accelerated gradient methods, which grow as the square root of the condition number of f. Additionally, we provide efficient methods for solving stochastic, quadratic variants of this multiscale optimization problem. Rather than learn the decomposition of f (which would be prohibitively expensive), our methods apply a clean recursive “BigStepLittleStep” interleaving of standard methods. The resulting algorithms use O˜(dm) space, are numerically stable, and open the door to a more finegrained understanding of the complexity of convex optimization beyond condition number.Free, publiclyaccessible full text available January 1, 2023

Understanding cellular stress response pathways is challenging because of the complexity of regulatory mechanisms and response dynamics, which can vary with both time and the type of stress. We developed a reverse genetic method called ReporterSeq to comprehensively identify genes regulating a stressinduced transcription factor under multiple conditions in a timeresolved manner. ReporterSeq links RNAencoded barcode levels to pathwayspecific output under genetic perturbations, allowing pooled pathway activity measurements via DNA sequencing alone and without cell enrichment or singlecell isolation. We used ReporterSeq to identify regulators of the heat shock response (HSR), a conserved, poorly understood transcriptional program that protects cells from proteotoxicity and is misregulated in disease. Genomewide HSR regulation in budding yeast was assessed across 15 stress conditions, uncovering novel stressspecific, timespecific, and constitutive regulators. ReporterSeq can assess the genetic regulators of any transcriptional pathway with the scale of pooled genetic screens and the precision of pathwayspecific readouts.

Selftraining is a standard approach to semisupervised learning where the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR10, CIFAR100, and SVHN datasets in comparison with FixMatch, a stateoftheart selftraining algorithm. Our code is publicly available from github.

The famous “laurel/yanny” phenomenon references an audio clip that elicits dramatically different responses from different listeners. For the original clip, roughly half the population hears the word “laurel,” while the other half hears “yanny.” How common are such ``polyperceivable'' audio clips? In this paper we apply ML techniques to study the prevalence of polyperceivability in spoken language. We devise a metric that correlates with polyperceivability of audio clips, use it to efficiently find new “laurel/yanny”type examples, and validate these results with human experiments. Our results suggest that polyperceivable examples are surprisingly prevalent, existing for >2% of English words.

We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model. We show that for a broad class of loss functions and parametric families of distributions, the regret of playing a ``proper'' predictorone from the putative model classrelative to the best predictor in the same model class has lower bound scaling at least as sqrt(c n), where c is a measure of the model misspecification to the true distribution in terms of total variation distance. In contrast, using an aggregationbased (improper) learner, one can obtain regret d log n for any underlying generating distribution, where d is the dimension of the parameter; we exhibit instances in which this is unimprovable even over the family of all learners that may play distributions in the convex hull of the parametric family. These results suggest that simple strategies for aggregating multiple learners together should be more robust, and several experiments conform to this hypothesis.

We consider an online binary prediction setting where a forecaster observes a sequence of T bits one by one. Before each bit is revealed, the forecaster predicts the probability that the bit is 1. The forecaster is called wellcalibrated if for each p in [0,1], among the n_p bits for which the forecaster predicts probability p, the actual number of ones, m_p, is indeed equal to p*n_p. The calibration error, defined as \sum_p m_p  p n_p, quantifies the extent to which the forecaster deviates from being wellcalibrated. It has long been known that an O(T^(2/3)) calibration error is achievable even when the bits are chosen adversarially, and possibly based on the previous predictions. However, little is known on the lower bound side, except an sqrt(T) bound that follows from the trivial example of independent fair coin flips. In this paper, we prove an T^(0.528) bound on the calibration error, which is the first bound above the trivial sqrt(T) lowerbound for this setting. The technical contributions of our work include two lower bound techniques, early stopping and sidestepping, which circumvent the obstacles that have previously hindered strong calibration lower bounds. We also propose an abstraction of the prediction setting, termedmore »

We introduce a model for ant trail formation, building upon previous work on biologically feasible local algorithms that plausibly describe how ants maintain trail networks. The model is a variant of a reinforced random walk on a directed graph, where ants lay pheromone on edges as they traverse them and the next edge to traverse is chosen based on the level of pheromone; this pheromone decays with time. There is a bidirectional flow of ants in the network: the forward flow proceeds along forward edges from source (e.g. the nest) to sink (e.g. a food source), and the backward flow in the opposite direction. Some fraction of ants are lost as they pass through each node (modeling the loss of ants due to exploration observed in the field). We initiate a theoretical study of this model. We note that ant navigation has inspired the field of ant colony optimization, heuristics that have been applied to several combinatorial optimization problems; however the algorithms developed there are considerably more complex and not constrained to being biologically feasible. We first consider the linear decision rule, where the flow divides itself among the next set of edges in proportion to their pheromone level. Here,more »

We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements [n]={1,...,n} with corresponding data values x1,...,xn. We observe the values for a "sample" set A \subset [n] and wish to estimate some statistic of the values for a "target" set B \subset [n] where B could be the entire set. Crucially, we assume that the sets A and B are drawn according to some known distribution P over pairs of subsets of [n]. A given estimation algorithm is evaluated based on its "worstcase, expected error" where the expectation is with respect to the distribution P from which the sample A and target sets B are drawn, and the worstcase is with respect to the data values x1,...,xn. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values–where the weights are functions of the distribution P and the sample and target sets A, Band show that the worstcase expected error achieved by this algorithm is at most a multiplicative pi/2 factor worse than the optimal of such algorithms.more »