skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On the rank of a random binary matrix
We study the rank of a random n × m matrix An, m; k with entries from GF(2), and exactly k unit entries in each column, the other entries being zero. The columns are chosen independently and uniformly at random from the set of all (nk) such columns. We obtain an asymptotically correct estimate for the rank as a function of the number of columns m in terms of c, n, k, and where m = cn/k. The matrix An, m; k forms the vertex-edge incidence matrix of a k-uniform random hypergraph H. The rank of An, m; k can be expressed as follows. Let |C2| be the number of vertices of the 2-core of H, and | E (C2)| the number of edges. Let m* be the value of m for which |C2| = |E(C2)|. Then w.h.p. for m < m* the rank of An, m; k is asymptotic to m, and for m ≥ m* the rank is asymptotic to m – |E(C2)| + |C2|. In addition, assign i.i.d. U[0, 1] weights Xi, i ∊ 1, 2, … m to the columns, and define the weight of a set of columns S as X(S) = ∑j∊S Xj. Define a basis as a set of n – 1 (k even) linearly independent columns. We obtain an asymptotically correct estimate for the minimum weight basis. This generalises the well-known result of Frieze [On the value of a random minimum spanning tree problem, Discrete Applied Mathematics, (1985)] that, for k = 2, the expected length of a minimum weight spanning tree tends to ζ(3) ∼ 1.202.  more » « less
Award ID(s):
1700365
PAR ID:
10106979
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the annual ACM-SIAM Symposium on Discrete Algorithms
ISSN:
1071-9040
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abernethy, Jacob; Agarwal, Agarwal (Ed.)
    Motivated by problems in controlled experiments, we study the discrepancy of random matrices with continuous entries where the number of columns $$n$$ is much larger than the number of rows $$m$$. Our first result shows that if $$\omega(1) = m = o(n)$$, a matrix with i.i.d. standard Gaussian entries has discrepancy $$\Theta(\sqrt{n} \, 2^{-n/m})$$ with high probability. This provides sharp guarantees for Gaussian discrepancy in a regime that had not been considered before in the existing literature. Our results also apply to a more general family of random matrices with continuous i.i.d. entries, assuming that $$m = O(n/\log{n})$$. The proof is non-constructive and is an application of the second moment method. Our second result is algorithmic and applies to random matrices whose entries are i.i.d. and have a Lipschitz density. We present a randomized polynomial-time algorithm that achieves discrepancy $$e^{-\Omega(\log^2(n)/m)}$$ with high probability, provided that $$m = O(\sqrt{\log{n}})$$. In the one-dimensional case, this matches the best known algorithmic guarantees due to Karmarkar–Karp. For higher dimensions $$2 \leq m = O(\sqrt{\log{n}})$$, this establishes the first efficient algorithm achieving discrepancy smaller than $$O( \sqrt{m} )$$. 
    more » « less
  2. This paper expands the analysis of randomized low-rank approximation beyond the Gaussian distribution to four classes of random matrices: (1) independent sub-Gaussian entries, (2) independent sub-Gaussian columns, (3) independent bounded columns, and (4) independent columns with bounded second moment. Using a novel interpretation of the low-rank approximation error involving sample covariance matrices, we provide insight into the requirements of a good random matrix for randomized low-rank approximations. Although our bounds involve unspecified absolute constants (a consequence of underlying nonasymptotic theory of random matrices), they allow for qualitative comparisons across distributions. The analysis offers some details on the minimal number of samples (the number of columns of the random matrix ) and the error in the resulting low-rank approximation. We illustrate our analysis in the context of the randomized subspace iteration method as a representative algorithm for low-rank approximation; however, all the results are broadly applicable to other low-rank approximation techniques. We conclude our discussion with numerical examples using both synthetic and real-world test matrices. 
    more » « less
  3. The classical Beardwood-Halton-Hammersly theorem (1959) asserts the existence of an asymptotic formula of the form $$\beta \sqrt n$$ for the minimum length of a Traveling Salesperson Tour throuh $$n$$ random points in the unit square, and in the decades since it was proved, the existence of such formulas has been shown for other such \emph{Euclidean functionals} on random points in the unit square as well. Despite more than 50 years of attention, however, it remained unknown whether the minimum length TSP through $$n$$ random points in $[0,1]^2$ was asymptotically distinct from its natural lower bounds, such as the minimum length spanning tree, the minimum length 2-factor, or, as raised by Goemans and Bertsimas, from its linear programming relaxation. We prove that the TSP on random points in Euclidean space is indeed asymptotically distinct from these and other natural lower bounds, and show that this separation implies that branch-and-bound algorithms based on these natural lower bounds must take nearly exponential ($$e^{\tilde \Omega(n)}$$) time to solve the TSP to optimality, even in average case. This is the first average-case superpolynomial lower bound for these branch-and-bound algorithms (a lower bound as strong as $$e^{\tilde \Omega (n)}$$ was not even been known in worst-case analysis). 
    more » « less
  4. Let $$N=\binom{n}{2}$$ and $$s\geq 2$$. Let $$e_{i,j},\,i=1,2,\ldots,N,\,j=1,2,\ldots,s$$ be $$s$$ independent permutations of the edges $$E(K_n)$$ of the complete graph $$K_n$$. A MultiTree is a set $$I\subseteq [N]$$ such that the edge sets $$E_{I,j}$$ induce spanning trees for $$j=1,2,\ldots,s$$. In this paper we study the following question: what is the smallest $m=m(n)$ such that w.h.p. $[m]$ contains a MultiTree. We prove a hitting time result for $s=2$ and an $$O(n\log n)$$ bound for $$s\geq 3$$. 
    more » « less
  5. Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)
    In this paper we study a few proximity problems related to a set of pairwise-disjoint segments in {ℝ}². Let S be a set of n pairwise-disjoint segments in {ℝ}², and let r > 0 be a parameter. We define the segment proximity graph of S to be G_r(S) := (S,E), where E = {(e₁,e₂) ∣ dist(e₁,e₂) ≤ r} and dist (e₁,e₂) = min_{(p,q) ∈ e₁× e₂} ‖p-q‖ is the Euclidean distance between e₁ and e₂. We define the weight of an edge (e₁,e₂) ∈ E to be dist(e₁,e₂). We first present a simple grid-based O(nlog² n)-time algorithm for computing a BFS tree of G_r(S). We apply it to obtain an O^*(n^{6/5}) + O(nlog²nlogΔ)-time algorithm for the so-called reverse shortest path problem, in which we want to find the smallest value r^* for which G_{r^*}(S) contains a path of some specified length between two designated start and target segments (where the O^*(⋅) notation hides polylogarithmic factors). Here Δ = max_{e ≠ e' ∈ S} dist(e,e')/min_{e ≠ e' ∈ S} dist(e,e') is the spread of S. Next, we present a dynamic data structure that can maintain a set S of pairwise-disjoint segments in the plane under insertions/deletions, so that, for a query segment e from an unknown set Q of pairwise-disjoint segments, such that e does not intersect any segment in (the current version of) S, the segment of S closest to e can be computed in O(log⁵ n) amortized time. The amortized update time is also O(log⁵ n). We note that if the segments in S∪Q are allowed to intersect then the known lower bounds on halfplane range searching suggest that a sequence of n updates and queries may take at least close to Ω(n^{4/3}) time. One thus has to strongly rely on the non-intersecting property of S and Q to perform updates and queries in O(polylog(n)) (amortized) time each. Using these results on nearest-neighbor (NN) searching for disjoint segments, we show that a DFS tree (or forest) of G_r(S) can be computed in O^*(n) time. We also obtain an O^*(n)-time algorithm for constructing a minimum spanning tree of G_r(S). Finally, we present an O^*(n^{4/3})-time algorithm for computing a single-source shortest-path tree in G_r(S). This is the only result that does not exploit the disjointness of the input segments. 
    more » « less