This content will become publicly available on January 1, 2025

Title: Improved estimates for the number of non-negative integer matrices with given row and column sums
The number of non-negative integer matrices with given row and column sums features in a variety of problems in mathematics and statistics but no closed-form expression for it is known, so we rely on approximations. In this paper, we describe a new such approximation, motivated by consideration of the statistics of matrices with non-integer numbers of columns. This estimate can be evaluated in time linear in the size of the matrix and returns results of accuracy as good as or better than existing linear-time approximations across a wide range of settings. We show that the estimate is asymptotically exact in the regime of sparse tables, while empirically performing at least as well as other linear-time estimates in the regime of dense tables. We also use the new estimate as the starting point for an improved numerical method for either counting or sampling matrices with given margins using sequential importance sampling. Code implementing our methods is available.
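To make the counted quantity concrete, the following is a minimal sketch of an exact brute-force count for very small margins. It is not the estimate described in the abstract and is not taken from the authors' code; the function names `count_tables` and `compositions` are hypothetical, chosen only for illustration. The running time grows exponentially with the margins, which is why linear-time approximations are of interest.

```python
from functools import lru_cache


def compositions(total, caps):
    """Yield tuples x with sum(x) == total and 0 <= x[i] <= caps[i]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def count_tables(row_sums, col_sums):
    """Exactly count non-negative integer matrices with the given margins.

    Hypothetical helper for illustration only: it fills the matrix one
    column at a time and memoizes on the row sums still to be placed.
    """
    if sum(row_sums) != sum(col_sums):
        return 0  # margins are inconsistent, so no matrix exists

    cols = tuple(col_sums)

    @lru_cache(maxsize=None)
    def fill(j, remaining):
        # remaining[i] is what row i still needs from columns j onward
        if j == len(cols):
            return 1 if all(r == 0 for r in remaining) else 0
        total = 0
        for column in compositions(cols[j], remaining):
            nxt = tuple(r - x for r, x in zip(remaining, column))
            total += fill(j + 1, nxt)
        return total

    return fill(0, tuple(row_sums))
```

For example, `count_tables((1, 1), (1, 1))` counts the two 2 × 2 permutation matrices, and `count_tables((2, 2), (2, 2))` counts the three matrices of the form [[a, 2 − a], [2 − a, a]] with a = 0, 1, 2.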
Award ID(s):
2005899
PAR ID:
10537883
Author(s) / Creator(s):
; ;
Publisher / Repository:
The Royal Society
Date Published:
Journal Name:
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume:
480
Issue:
2282
ISSN:
1364-5021
Page Range / eLocation ID:
20230470
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. While mixed integer linear programming (MILP) solvers are routinely used to solve a wide range of important science and engineering problems, it remains a challenging task for end users to write correct and efficient MILP constraints, especially for problems specified using the inherently non-linear Boolean logic operations. To overcome this challenge, we propose a syntax guided synthesis (SyGuS) method capable of generating high-quality MILP constraints from the specifications expressed using arbitrary combinations of Boolean logic operations. At the center of our method is an extensible domain specification language (DSL) whose expressiveness may be improved by adding new integer variables as decision variables, together with an iterative procedure for synthesizing linear constraints from non-linear Boolean logic operations using these integer variables. To make the synthesis method efficient, we also propose an over-approximation technique for soundly proving the correctness of the synthesized linear constraints, and an under-approximation technique for safely pruning away the incorrect constraints. We have implemented and evaluated the method on a wide range of benchmark specifications from statistics, machine learning, and data science applications. The experimental results show that the method is efficient in handling these benchmarks, and the quality of the synthesized MILP constraints is close to, or higher than, that of manually-written constraints in terms of both compactness and solving time.

     
  2. We present configuration-space estimators for the auto- and cross-covariance of two- and three-point correlation functions (2PCF and 3PCF) in general survey geometries. These are derived in the Gaussian limit (setting higher order correlation functions to zero), but for arbitrary non-linear 2PCFs (which may be estimated from the survey itself), with a shot-noise rescaling parameter included to capture non-Gaussianity. We generalize previous approaches to include Legendre moments via a geometry-correction function calibrated from measured pair and triple counts. Making use of importance sampling and random particle catalogues, we can estimate model covariances in fractions of the time required to do so with mocks, obtaining estimates with negligible sampling noise in ∼10 (∼100) CPU-hours for the 2PCF (3PCF) autocovariance. We compare results to sample covariances from a suite of BOSS DR12 mocks and find the matrices to be in good agreement, assuming a shot-noise rescaling parameter of 1.03 (1.20) for the 2PCF (3PCF). To obtain strongest constraints on cosmological parameters, we must use multiple statistics in concert; having robust methods to measure their covariances at low computational cost is thus of great relevance to upcoming surveys.
  3. Let I = ⟨f_1, ..., f_m⟩ ⊂ ℚ[x_1, ..., x_n] be a zero-dimensional radical ideal defined by polynomials given with exact rational coefficients. Assume that we are given approximations {z_1, ..., z_k} ⊂ ℂ^n for the common roots {ξ_1, ..., ξ_k} = V(I) ⊆ ℂ^n. In this paper we show how to construct and certify the rational entries of Hermite matrices for I from the approximate roots {z_1, ..., z_k}. When I is non-radical, we give methods to construct and certify Hermite matrices for √I from the approximate roots. Furthermore, we use signatures of these Hermite matrices to give rational certificates of non-negativity of a given polynomial over a (possibly positive dimensional) real variety, as well as certificates that there is a real root within an ε distance from a given point z ∈ ℚ^n.
  4. Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency – given n input points, most kernel-based algorithms need to materialize the full n × n kernel matrix before performing any subsequent computation, thus incurring Ω(n^2) runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts. We break the quadratic barrier and obtain subquadratic time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recently developed Kernel Density Estimation framework, which (after preprocessing in time subquadratic in n) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from weighted vertex and weighted edge sampling on kernel graphs, simulating random walks on kernel graphs, and importance sampling on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in sublinear (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a 9x decrease in the number of kernel evaluations over baselines for LRA and a 41x reduction in the graph size for spectral sparsification.
  5. Bienstock, D. (Ed.)
    Motivated by problems in optimization we study the sparsity of the solutions to systems of linear Diophantine equations and linear integer programs, i.e., the number of non-zero entries of a solution, which is often referred to as the ℓ0-norm. Our main results are improved bounds on the ℓ0-norm of sparse solutions to systems Ax = b, where A ∈ ℤ^{m×n}, b ∈ ℤ^m and x is either a general integer vector (lattice case) or a non-negative integer vector (semigroup case). In the lattice case and certain scenarios of the semigroup case, we give polynomial time algorithms for computing solutions with ℓ0-norm satisfying the obtained bounds.