Title: Almost Uniform Sampling From Neural Networks
Given a sample of n points from R^d and a neural network of fixed architecture with W weights, k neurons, linear threshold activation functions, and a binary output at each neuron, we study the problem of sampling uniformly from all labelings of the sample that correspond to different choices of the weights. We provide an algorithm that runs in time polynomial in both n and W such that any achievable labeling appears with probability at least (W/(2ekn))^W for W < n. For a single neuron, we also provide a random-walk-based algorithm that samples exactly uniformly.
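To make the objects in the abstract concrete, here is a minimal, illustrative Python sketch (not the paper's algorithm) of how a single linear threshold neuron induces a binary labeling on a sample of n points, together with the naive strategy of drawing random weights. The naive strategy does not visit labelings anywhere near uniformly, which is the difficulty the polynomial-time sampler above addresses; all names and constants below are illustrative.

```python
import numpy as np

def labeling(X, w, b):
    """Binary labeling that the linear threshold neuron sign(<w, x> + b) induces on the sample X."""
    return tuple((X @ w + b > 0).astype(int))

rng = np.random.default_rng(0)
n, d = 8, 2
X = rng.standard_normal((n, d))          # a sample of n points in R^d

# Naive baseline: draw random weights and record which labelings appear.
# This is *not* close to uniform over labelings.
counts = {}
for _ in range(20_000):
    w, b = rng.standard_normal(d), rng.standard_normal()
    lab = labeling(X, w, b)
    counts[lab] = counts.get(lab, 0) + 1

print(len(counts), "distinct labelings observed")
print("most/least frequent counts:", max(counts.values()), min(counts.values()))
```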
Award ID(s):
1619452
PAR ID:
10571642
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
978-1-7281-4085-8
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Location:
Princeton, NJ, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a weighted approach to compute a maximum cardinality matching in an arbitrary bipartite graph. Our main result is a new algorithm that takes as input a weighted bipartite graph $$G(A \cup B, E)$$ with edge weights of 0 or 1. Let $$w \le n$$ be an upper bound on the weight of any matching in $$G$$. Consider the subgraph induced by all the edges of $$G$$ with weight 0, and suppose every connected component in this subgraph has $$O(r)$$ vertices and $$O(mr/n)$$ edges. We present an algorithm to compute a maximum cardinality matching in $$G$$ in $$\tilde{O}(m(\sqrt{w} + \sqrt{r} + wr/n))$$ time. When all the edge weights are 1 (symmetrically, when all weights are 0), our algorithm is identical to the well-known Hopcroft-Karp (HK) algorithm, which runs in $$O(m\sqrt{n})$$ time. However, if we can carefully assign weights of 0 and 1 to the edges so that both $$w$$ and $$r$$ are sublinear in $$n$$ and $$wr = O(n^{\gamma})$$ for $$\gamma < 3/2$$, then we can compute a maximum cardinality matching in $$G$$ in $$o(m\sqrt{n})$$ time. Using our algorithm, we obtain a new $$\tilde{O}(n^{4/3}/\epsilon^4)$$ time algorithm to compute an $$\epsilon$$-approximate bottleneck matching of $$A, B \subset \mathbb{R}^2$$, and a $$(1/\epsilon^{O(d)})\, n^{1+(d-1)/(2d-1)} \mathrm{poly}\log n$$ time algorithm for computing an $$\epsilon$$-approximate bottleneck matching in $$d$$ dimensions. All previous algorithms take $$\Omega(n^{3/2})$$ time. Given any graph $$G(A \cup B, E)$$ that has an easily computable balanced vertex separator of size $$|V'|^{\delta}$$ for every subgraph $$G'(V', E')$$, for $$\delta \in [1/2, 1)$$, we can apply our algorithm to compute a maximum matching in $$\tilde{O}(mn^{\delta/(1+\delta)})$$ time, improving upon the $$O(m\sqrt{n})$$ time taken by the HK algorithm.
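As a point of reference for item 1, here is a minimal Python sketch of the classical Hopcroft-Karp routine that the weighted algorithm specializes to; this is the textbook $$O(m\sqrt{n})$$ procedure, not the paper's weighted variant, and the adjacency-list format is an assumption of the sketch.

```python
from collections import deque

def hopcroft_karp(adj, n_left, n_right):
    """Maximum-cardinality bipartite matching in O(m * sqrt(n)).
    adj[u] lists the right-side neighbours of left vertex u."""
    INF = float("inf")
    match_l = [-1] * n_left    # match_l[u] = right vertex matched to u, or -1
    match_r = [-1] * n_right   # match_r[v] = left vertex matched to v, or -1

    def bfs():
        # Layer the free left vertices and grow alternating BFS levels.
        dist = [INF] * n_left
        queue = deque(u for u in range(n_left) if match_l[u] == -1)
        for u in queue:
            dist[u] = 0
        reachable_free = False
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                w = match_r[v]
                if w == -1:
                    reachable_free = True      # an augmenting path exists
                elif dist[w] == INF:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist, reachable_free

    def dfs(u, dist):
        # Try to extend an augmenting path from u along the BFS layers.
        for v in adj[u]:
            w = match_r[v]
            if w == -1 or (dist[w] == dist[u] + 1 and dfs(w, dist)):
                match_l[u], match_r[v] = v, u
                return True
        dist[u] = INF                          # dead end; prune this vertex
        return False

    matching = 0
    while True:
        dist, found = bfs()
        if not found:
            return matching
        for u in range(n_left):
            if match_l[u] == -1 and dfs(u, dist):
                matching += 1

# Example: a 3 x 3 bipartite graph with a perfect matching.
print(hopcroft_karp([[0, 1], [0], [1, 2]], 3, 3))   # -> 3
```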
  2. Morin, P; Suri, S (Ed.)
    We define simple variants of zip trees, called zip-zip trees, which provide several advantages over zip trees, including overcoming a bias that favors smaller keys over larger ones. We analyze zip-zip trees theoretically and empirically, showing, e.g., that the expected depth of a node in an n-node zip-zip tree is at most 1.3863 log n - 1 + o(1), which matches the expected depth of treaps and binary search trees built by uniformly random insertions. Unlike these other data structures, however, zip-zip trees achieve their bounds using only O(log log n) bits of metadata per node, w.h.p., as compared to the O(log n) bits per node required by treaps. In fact, we even describe a "just-in-time" zip-zip tree variant, which needs just an expected O(1) number of bits of metadata per node. Moreover, we can define zip-zip trees to be strongly history independent, whereas treaps are generally only weakly history independent. We also introduce biased zip-zip trees, which have an explicit bias based on key weights, so the expected depth of a key k with weight w is O(log(W/w)), where W is the total weight of all keys in the weighted zip-zip tree. Finally, we show that one can easily make zip-zip trees partially persistent with only O(n) space overhead w.h.p.
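For item 2, a small illustrative sketch of the rank idea behind zip-zip trees: each node gets a pair of ranks, a geometric primary rank plus a small uniform secondary tiebreak rank, compared lexicographically. The exact secondary range used below is an assumption of this sketch, not necessarily the paper's constant.

```python
import math
import random

def zipzip_rank(n, rng=random):
    """Rank pair for a node in an n-node zip-zip tree (illustrative).
    The primary rank is geometric(1/2), so its value is O(log n) and fits in
    O(log log n) bits w.h.p.; the secondary rank breaks ties among equal
    primary ranks, countering the small-key bias of plain zip trees."""
    r1 = 0
    while rng.random() < 0.5:
        r1 += 1
    r2 = rng.randrange(max(1, math.ceil(math.log2(max(n, 2))) ** 3))
    return (r1, r2)   # compared lexicographically during zip/unzip

print([zipzip_rank(1024) for _ in range(5)])
```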
  3. For graphs of average degree $$d$$, positive integer weights bounded by $$W$$, and accuracy parameter $$\epsilon>0$$, [Chazelle, Rubinfeld, Trevisan; SICOMP'05] have shown that the weight of the minimum spanning tree can be $$(1+\epsilon)$$-approximated in $$\tilde{O}(Wd/\epsilon^2)$$ expected time. This algorithm is frequently taught in courses on sublinear time algorithms. However, the $$\tilde{O}(Wd/\epsilon^2)$$-time variant requires an involved analysis, leading to simpler but much slower variations being taught instead. Here we present an alternative that is not only simpler to analyze, but also improves the number of queries, getting closer to the nearly matching information-theoretic lower bound. In addition to estimating the weight of the MST, our algorithm is also a perfect sampler for sampling uniformly at random an edge of the MST. At the core of our result is the insight that halting Prim's algorithm after an expected $$\tilde{O}(d)$$ number of steps, then returning the highest weighted edge of the tree, results in sampling an edge of the MST uniformly at random. Via repeated trials and averaging the results, this immediately implies an algorithm for estimating the weight of the MST. Since our algorithm is based on Prim's, it naturally works for non-integer weighted graphs as well.
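For item 3, a brute-force Python sketch that only illustrates the averaging identity used above: if one can draw a uniformly random MST edge, then (n-1) times the mean sampled weight estimates the MST weight. The sketch computes the full MST with Prim's algorithm rather than halting early, so it is a stand-in for the sublinear sampler, not an implementation of it.

```python
import heapq
import random

def prim_mst_edge_weights(adj, start=0):
    """Full Prim's algorithm. adj[u] is a list of (v, weight) pairs.
    Returns the weights of the n-1 MST edges (graph assumed connected)."""
    n = len(adj)
    in_tree = [False] * n
    in_tree[start] = True
    heap = [(w, v) for v, w in adj[start]]
    heapq.heapify(heap)
    weights = []
    while heap and len(weights) < n - 1:
        w, v = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        weights.append(w)
        for x, wx in adj[v]:
            if not in_tree[x]:
                heapq.heappush(heap, (wx, x))
    return weights

def estimate_mst_weight(adj, trials=1000, rng=random):
    """Estimate the MST weight as (n-1) * average weight of a uniform MST edge."""
    edge_weights = prim_mst_edge_weights(adj)
    samples = [rng.choice(edge_weights) for _ in range(trials)]
    return len(edge_weights) * sum(samples) / trials

# Tiny example: a 4-vertex graph with 5 weighted edges.
adj = [
    [(1, 1.0), (3, 2.0)],
    [(0, 1.0), (2, 1.5), (3, 3.0)],
    [(1, 1.5), (3, 1.0)],
    [(0, 2.0), (2, 1.0), (1, 3.0)],
]
print(sum(prim_mst_edge_weights(adj)), estimate_mst_weight(adj))
```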
  4. In general, the sparsity of the generator matrix is a critical factor in determining the encoding complexity of a linear code. Further, certain applications, e.g., distributed crowdsourcing schemes utilizing linear codes, require most or even all the columns of the generator matrix to have some degree of sparsity. In this paper, we leverage polar codes and the well-established channel polarization to design capacity-achieving codes with a certain constraint on the weights of all the columns in the generator matrix (GM) while having a low-complexity decoding algorithm. We first show that, given a binary-input memoryless symmetric (BMS) channel $$W$$ and a constant $$s \in (0, 1]$$, there exists a polarization kernel such that the corresponding polar code is capacity-achieving with rate of polarization $$s/2$$ and GM column weights bounded from above by $$N^{s}$$. To improve the sparsity versus error rate trade-off, we devise a column-splitting algorithm and two coding schemes, first for the BEC and then for general BMS channels. The polar-based codes generated by the two schemes inherit several fundamental properties of polar codes with the original $$2 \times 2$$ kernel, including the decay in error probability, decoding complexity, and the capacity-achieving property. Furthermore, they have the additional property that their GM column weights are bounded from above sublinearly in $$N$$, while the original polar codes have some column weights that are linear in $$N$$. In particular, for any BEC and $$\beta < 0.5$$, we show the existence of a sequence of capacity-achieving polar-based codes in which all the GM column weights are bounded from above by $$N^{\lambda}$$ with $$\lambda \approx 0.585$$, the error probability is bounded by $${\mathcal {O}}(2^{-N^{\beta }})$$, and the decoder has complexity $${\mathcal {O}}(N\log N)$$. The existence of similar capacity-achieving polar-based codes with the same decoding complexity is shown for any BMS channel and $$\beta < 0.5$$ with $$\lambda \approx 0.631$$.
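For item 4, a short numpy sketch of the fact the abstract contrasts against: with the original $$2 \times 2$$ kernel $$F = [[1,0],[1,1]]$$, the generator matrix $$F^{\otimes n}$$ always has at least one column of weight linear in $$N$$, since column weights multiply across Kronecker factors. This is the behavior the column-splitting schemes above avoid.

```python
import numpy as np

def polar_generator(n_levels):
    """n-fold Kronecker power of the standard 2x2 polar kernel F = [[1, 0], [1, 1]]."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n_levels):
        G = np.kron(G, F)
    return G

G = polar_generator(4)                        # block length N = 16
col_weights = G.sum(axis=0)                   # Hamming weight of each GM column
print(sorted(col_weights.tolist()))           # ranges from 1 up to N = 16
print(int(col_weights.max()) == G.shape[0])   # True: some column has weight N
```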
  5. We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees are missing even for simple latent variable models such as Gaussian mixtures. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2),$$ i.e., a sample is observed only if it lies inside a set $$S$$. The goal is to learn the weights $$w_i$$ and the means $$\mu_i$$. We propose an algorithm that takes only $$\frac{1}{\varepsilon^{O(k)}}$$ samples to estimate the weights $$w_i$$ and the means $$\mu_i$$ within $$\varepsilon$$ error.
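For item 5, a small Python sketch of the data model only (not the learning algorithm): samples are drawn from a univariate Gaussian mixture and observed only when they fall inside the censoring set $$S$$, taken here to be an interval for illustration.

```python
import numpy as np

def censored_mixture_samples(n, weights, means, sigma, S=(-1.0, 1.0), seed=0):
    """Draw n observations from sum_i w_i N(mu_i, sigma^2), keeping a draw
    only if it lands inside the censoring set S (an interval here)."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        i = rng.choice(len(weights), p=weights)   # pick a mixture component
        x = rng.normal(means[i], sigma)
        if S[0] <= x <= S[1]:                     # observed only inside S
            out.append(x)
    return np.array(out)

# Example: a two-component mixture censored to the interval [-1, 1].
data = censored_mixture_samples(500, weights=[0.3, 0.7], means=[-0.5, 2.0], sigma=1.0)
print(data.mean(), data.std())
```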