Let L be a language that can be decided in linear space, and let ϵ > 0 be any constant. Let A be the exponential hardness assumption that for every n, membership in L for inputs of length n cannot be decided by circuits of size smaller than 2^(ϵn). We prove that for every function f : {0,1}* → {0,1} computable by a randomized logspace algorithm R, there exists a deterministic logspace algorithm D (attempting to compute f) such that on every input x of length n, the algorithm D outputs one of the following: (1) the correct value f(x); or (2) the string "I am unable to compute f(x) because the hardness assumption A is false", followed by a (provably correct) circuit of size smaller than 2^(ϵn′) for membership in L for inputs of length n′, for some n′ = Θ(log n) — that is, a circuit that refutes A. Moreover, D is explicitly constructed, given R. We note that previous works on the hardness-versus-randomness paradigm give derandomized algorithms that rely blindly on the hardness assumption. If the hardness assumption is false, the algorithms may output incorrect values, and thus a user cannot trust that an output given by the algorithm is correct. Instead, our algorithm D verifies the computation, so it never outputs an incorrect value. Thus, if D outputs a value for f(x), that value is certified to be correct; and if D does not output a value for f(x), it alerts the user that the hardness assumption was found to be false, and refutes the assumption. Our next result is a universal derandomizer for BPL (the class of problems solvable by bounded-error randomized logspace algorithms): we give a deterministic algorithm U that takes as input a randomized logspace algorithm R and an input x, and simulates the computation of R on x deterministically. Under the widely believed assumption BPL = L, the space ...
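As a schematic of D's output contract only — every helper below is a hypothetical stub, not the paper's construction — the wrapper either returns a certified value or a refutation of the hardness assumption:

```python
# Minimal sketch of the certified-derandomization contract; the three helpers
# are hypothetical placeholders, stubbed out so that the sketch runs.

def derandomized_run(R, x):
    return R(x), "transcript"            # placeholder: deterministic simulation + transcript

def verify_transcript(R, x, value, transcript):
    return True                          # placeholder: D's self-verification step

def extract_refuting_circuit(transcript):
    return "small circuit deciding L"    # placeholder: the circuit refuting assumption A

def certified_compute(R, x):
    value, transcript = derandomized_run(R, x)
    if verify_transcript(R, x, value, transcript):
        return value                     # certified: equals f(x) whenever returned
    # Verification can only fail if the hardness assumption A is false,
    # in which case D outputs a (provably correct) refuting circuit instead.
    return ("unable to compute f(x): hardness assumption A is false",
            extract_refuting_circuit(transcript))
```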
Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group
In the analysis of data sets consisting of (X, Y)-pairs, a tacit assumption is that each pair corresponds to the same observational unit. If, however, such pairs are obtained via record linkage of two files, this assumption can be violated as a result of mismatch error rooted, for example, in the lack of reliable identifiers in the two files. Recently, there has been a surge of interest in this setting under the term "Shuffled Data", in which the underlying correct pairing of (X, Y)-pairs is represented via an unknown permutation. Explicit modeling of the permutation tends to be associated with overfitting, prompting the need for suitable methods of regularization. In this paper, we propose an exponential family prior on the permutation group for this purpose that can be used to integrate various structures such as sparse and local shuffling. This prior turns out to be conjugate for canonical shuffled data problems in which the likelihood conditional on a fixed permutation can be expressed as a product over the corresponding (X, Y)-pairs. Inference can be based on the EM algorithm, in which the E-step is approximated by sampling, e.g., via the Fisher-Yates algorithm. The M-step is shown to admit a reduction from n^2 to n terms if the likelihood of (X, Y)-pairs has exponential family form. Comparisons on synthetic and real data show that the proposed approach compares favorably to competing methods.
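As a rough sketch of the inference scheme described above — assuming a Gaussian linear model, uniform Fisher-Yates proposals with importance weighting, and a displacement-penalizing prior, none of which is the paper's exact formulation — a Monte Carlo E-step and a weighted M-step could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def e_step_sample(X, y, beta, sigma, prior_logp, n_samples=500):
    """Monte Carlo E-step: weight uniformly drawn permutations by likelihood x prior."""
    n = len(y)
    perms, logw = [], []
    for _ in range(n_samples):
        pi = rng.permutation(n)                  # Fisher-Yates draw (uniform proposal)
        resid = y[pi] - X @ beta                 # Gaussian likelihood given a fixed pi
        logw.append(-0.5 * np.sum(resid**2) / sigma**2 + prior_logp(pi))
        perms.append(pi)
    w = np.exp(np.array(logw) - np.max(logw))    # stabilized importance weights
    return perms, w / w.sum()

def m_step(X, y, perms, w):
    """Weighted M-step: for a Gaussian model this reduces to least squares on E[y_pi]."""
    y_bar = sum(wi * y[pi] for pi, wi in zip(perms, w))
    return np.linalg.lstsq(X, y_bar, rcond=None)[0]

# Example exponential family prior on the permutation group: penalize the number
# of displaced indices, which favors sparse/local shuffling (theta > 0).
theta = 2.0
prior_logp = lambda pi: -theta * np.count_nonzero(pi != np.arange(len(pi)))
```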
- Award ID(s):
- 2120318
- PAR ID:
- 10446725
- Editor(s):
- Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research
- Volume:
- 206
- ISSN:
- 2640-3498
- Page Range / eLocation ID:
- 2939-2959
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.
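For intuition about "shared divergence events" only, here is a toy generative sketch: population pairs are partitioned into events by a Chinese-restaurant-style prior (a stand-in for the model-averaging prior; ecoevolity's actual priors and likelihood are richer), and pairs assigned to the same event share one divergence time:

```python
import random

def sample_divergence_model(n_pairs, concentration=1.0, rate=1.0, seed=None):
    """Toy prior over co-divergence: partition pairs into events, one time per event."""
    rnd = random.Random(seed)
    event_of_pair, event_sizes, event_times = [], [], []
    for _ in range(n_pairs):
        # Chinese-restaurant step: join an existing event with probability
        # proportional to its size, or open a new event with probability
        # proportional to the concentration parameter.
        weights = event_sizes + [concentration]
        j = rnd.choices(range(len(weights)), weights=weights)[0]
        if j == len(event_sizes):
            event_sizes.append(1)
            event_times.append(rnd.expovariate(rate))   # fresh shared divergence time
        else:
            event_sizes[j] += 1
        event_of_pair.append(j)
    return event_of_pair, [event_times[j] for j in event_of_pair]

# Four pairs that diverged independently would typically land in distinct events:
print(sample_divergence_model(4, seed=1))
```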
-
Aggarwal, Divesh (Ed.)We study the problem of function inversion with preprocessing where, given a function f : [N] → [N] and a point y in its image, the goal is to find an x such that f(x) = y using at most T oracle queries to f and S bits of preprocessed advice that depend on f. The seminal work of Corrigan-Gibbs and Kogan [TCC 2019] initiated a line of research that shows many exciting connections between the non-adaptive setting of this problem and other areas of theoretical computer science. Specifically, they introduced a very weak class of algorithms (strongly non-adaptive) where the points queried by the oracle depend only on the inversion point y, and are independent of the answers to the previous queries and the S bits of advice. They showed that proving even mild lower bounds on strongly non-adaptive algorithms for function inversion would imply a breakthrough result in circuit complexity. We prove that every strongly non-adaptive algorithm for function inversion (and even for its special case of permutation inversion) must have ST = Ω(N log N log T). This gives the first improvement to the long-standing lower bound of ST = Ω(N log N) due to Yao [STOC 90]. As a corollary, we conclude the first separation between strongly non-adaptive and adaptive algorithms for permutation inversion, where the adaptive algorithm by Hellman [TOIT 80] achieves the trade-off ST = O(N log N). Additionally, we show equivalence between lower bounds for strongly non-adaptive data structures and the one-way communication complexity of certain partial functions. As an example, we recover our lower bound on function inversion in the communication complexity framework.
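As a toy illustration of the S/T trade-off in this model (the sampling scheme and all names are illustrative assumptions, not Hellman's table construction): the advice stores inverses for s random points, and the query phase probes t points that are a function of y alone, exactly as the strongly non-adaptive model requires:

```python
import random

def preprocess(f, N, s):
    # Advice (a toy stand-in for the S bits): the inverse map of f restricted
    # to a random sample of s domain points.
    return {f[x]: x for x in random.sample(range(N), s)}

def invert(f, N, y, advice, t):
    # Strongly non-adaptive: the t probed points depend only on the image point y,
    # never on the advice or on answers to earlier queries.
    probes = random.Random(y).choices(range(N), k=t)
    answers = [f[x] for x in probes]        # the only oracle access to f
    if y in advice:                         # the final answer may use the advice...
        return advice[y]
    for x, fx in zip(probes, answers):      # ...and the collected query answers
        if fx == y:
            return x
    return None

# Example with f given as a table (oracle access = list indexing):
N = 1024
f = [pow(x, 3, N) for x in range(N)]
advice = preprocess(f, N, s=128)
print(invert(f, N, f[7], advice, t=64))     # some preimage of f[7], or None
```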
-
ABSTRACT Our Universe is homogeneous and isotropic, and its perturbations obey translation and rotation symmetry. In this work, we develop translation and rotation equivariant normalizing flow (TRENF), a generative normalizing flow (NF) model which explicitly incorporates these symmetries, defining the data likelihood via a sequence of Fourier space-based convolutions and pixel-wise non-linear transforms. TRENF gives direct access to the high dimensional data likelihood p(x|y) as a function of the labels y, such as cosmological parameters. In contrast to traditional analyses based on summary statistics, the NF approach has no loss of information since it preserves the full dimensionality of the data. On Gaussian random fields, the TRENF likelihood agrees well with the analytical expression and saturates the Fisher information content in the labels y. On non-linear cosmological overdensity fields from N-body simulations, TRENF leads to significant improvements in constraining power over the standard power spectrum summary statistic. TRENF is also a generative model of the data, and we show that TRENF samples agree well with the N-body simulations it was trained on, and that the inverse mapping of the data agrees well with Gaussian white noise both visually and on various summary statistics: when this is perfectly achieved, the resulting p(x|y) likelihood analysis becomes optimal. Finally, we develop a generalization of this model that can handle effects that break the symmetry of the data, such as the survey mask, which enables likelihood analysis on data without periodic boundaries.
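A minimal NumPy sketch of one layer in this spirit — the filter shape, the pointwise map, and the name trenf_layer are assumptions for illustration, not the paper's implementation — applies an isotropic filter in Fourier space followed by an invertible pointwise transform:

```python
import numpy as np

def trenf_layer(x, filt, alpha=0.9):
    """Schematic TRENF-style layer: Fourier-space convolution + pointwise map."""
    freqs = np.meshgrid(*[np.fft.fftfreq(s) for s in x.shape], indexing="ij")
    kmag = np.sqrt(sum(f**2 for f in freqs))
    # Multiplying by a filter that depends only on |k| is a convolution, hence
    # translation equivariant, and (approximately) rotation equivariant.
    y = np.fft.ifftn(np.fft.fftn(x) * filt(kmag)).real
    # Monotone pointwise transform: derivative alpha + (1 - alpha) / cosh(y)^2 > 0,
    # so the layer remains invertible (provided filt never vanishes).
    return alpha * y + (1.0 - alpha) * np.tanh(y)

# Example: a low-pass filter acting on a random 2D field.
field = np.random.default_rng(0).normal(size=(64, 64))
out = trenf_layer(field, filt=lambda km: 1.0 / (1.0 + (km / 0.1) ** 2))
```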
-
Bramas, Quentin; Casteigts, Arnaud; Meeks, Kitty (Ed.)Selective families of sets, or selectors, are combinatorial tools used to "isolate" individual members of sets from some set family. Given a set X and an element x ∈ X, to isolate x from X, at least one of the sets in the selector must intersect X on exactly x. We study (k, N)-permutation selectors, which have the property that they can isolate each element of each k-element subset of {0, 1, ..., N − 1} in each possible order. These selectors can be used in protocols for ad-hoc radio networks to more efficiently disseminate information along multiple hops. In 2004, Gasieniec, Radzik and Xin gave a construction of a (k, N)-permutation selector of size O(k^2 log^3 N). This paper improves on that construction by providing a probabilistic construction of a (k, N)-permutation selector of size O(k^2 log N). Remarkably, this matches the asymptotic bound for standard strong (k, N)-selectors, which isolate each element of each set of size k, but with no restriction on the order. We then show that the use of our (k, N)-permutation selector improves the best running time for gossiping in ad-hoc radio networks by a poly-logarithmic factor.
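To convey the flavor of the probabilistic argument at the matching O(k^2 log N) size, the sketch below randomly builds a standard strong (k, N)-selector (the constant C and the checking helper are assumptions; the paper's permutation-selector construction is more delicate):

```python
import math, random

def random_selector(N, k, C=3):
    # Each set includes each element independently with probability 1/k, so a fixed
    # x in a fixed k-set X is isolated by one random set with probability about
    # 1/(e*k); a union bound over all ~k * N^k pairs (X, x) then succeeds once the
    # family contains m = O(k^2 log N) sets.
    m = int(C * k * k * math.log(N + 1)) + 1
    return [{v for v in range(N) if random.random() < 1.0 / k} for _ in range(m)]

def isolates(family, X, x):
    # True if some set in the family intersects X exactly in {x}.
    return any(S & X == {x} for S in family)

family = random_selector(N=64, k=4)
print(isolates(family, {3, 17, 42, 60}, 17))   # True with high probability
```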