skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Estimating a Directed Tree for Extremes
The Extremal River Problem has emerged as a flagship problem for causal discovery in extreme values of a network. The task is to recover a river network from only extreme flow measured at a set V of stations, without any information on the stations' locations. We present QTree, a new simple and efficient algorithm to solve the Extremal River Problem that performs very well compared to existing methods on hydrology data and in simulations. QTree returns a root-directed tree and achieves almost perfect recovery on the Upper Danube network data, the existing benchmark data set, as well as on new data from the Lower Colorado River network in Texas. It can handle missing data, has an automated parameter tuning procedure, and runs in time O(n |V|^2), where n is the number of observations and |V| the number of nodes in the graph. Furthermore, we prove that the QTree estimator is consistent under a Bayesian network model for extreme values with noise. We also assess the small sample behaviour of QTree through simulations and detail the strengths and possible limitations of QTree.  more » « less
Award ID(s):
2113468
PAR ID:
10466653
Author(s) / Creator(s):
Publisher / Repository:
arxiv
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B Methodological
ISSN:
0035-9246
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop a set of highly efficient and effective computational algorithms and simulation tools for fluid simulations on a network. The mathematical models are a set of hyperbolic conservation laws on edges of a network, as well as coupling conditions on junctions of a network. For example, the shallow water system, together with flux balance and continuity conditions at river intersections, model water flows on a river network. The computationally accurate and robust discontinuous Galerkin methods, coupled with explicit strong stability preserving Runge-Kutta methods, are implemented for simulations on network edges. Meanwhile, linear and nonlinear scalable Riemann solvers are being developed and implemented at network vertices. These network simulations result in tools that are added to the existing PETSc and DMNetwork software libraries for the scientific community in general. Simulation results of a shallow water system on a Mississippi river network with over one billion network variables are performed on an extreme-scale computer using up to 8,192 processor with an optimal parallel efficiency. Further potential applications include traffic flow simulations on a highway network and blood flow simulations on a arterial network, among many others. 
    more » « less
  2. Private information retrieval (PIR) allows a user to retrieve a desired message out of K possible messages from N databases without revealing the identity of the desired message. There has been significant recent progress on understanding fundamental information-theoretic limits of PIR, and in particular the download cost of PIR for several variations. Majority of existing works however, assume the presence of replicated databases, each storing all the K messages. In this work, we consider the problem of PIR from storage constrained databases. Each database has a storage capacity of μKL bits, where K is the number of messages, L is the size of each message in bits, and μ ∈ [1/N, 1] is the normalized storage. In the storage constrained PIR problem, there are two key design questions: a) how to store content across each database under storage constraints; and b) construction of schemes that allow efficient PIR through storage constrained databases. The main contribution of this work is a general achievable scheme for PIR from storage constrained databases for any value of storage. In particular, for any (N, K), with normalized storage μ = t/N, where the parameter t can take integer values t ∈ {1, 2, ..., N}, we show that our proposed PIR scheme achieves a download cost of (1 + 1/t + 1/2 + ⋯ + 1/t K-1 ). The extreme case when μ = 1 (i.e., t = N) corresponds to the setting of replicated databases with full storage. For this extremal setting, our scheme recovers the information-theoretically optimal download cost characterized by Sun and Jafar as (1 + 1/N + ⋯ +1/N K-1 ). For the other extreme, when μ = 1/N (i.e., t = 1), the proposed scheme achieves a download cost of K. The most interesting aspect of the result is that for intermediate values of storage, i.e., 1/N <; μ <; 1, the proposed scheme can strictly outperform memory-sharing between extreme values of storage. 
    more » « less
  3. The combinatorial problem in this paper is motivated by a variant of the famous traveling salesman problem where the salesman must return to the starting point after each delivery. The total length of a delivery route is related to a metric known as closeness centrality. The closeness centrality of a vertex v in a graph G was defined in 1950 by Bavelas to be CC(v)=|V(G)|−1SD(v), where SD(v) is the sum of the distances from v to each of the other vertices (which is one-half of the total distance in the delivery route). We provide a real-world example involving the Metro Atlanta Rapid Transit Authority rail network and identify stations whose SD values are nearly identical, meaning they have a similar proximity to other stations in the network. We then consider theoretical aspects involving asymmetric trees. For integer values of k, we considered the asymmetric tree with paths of lengths k,2k,…,nk that are incident to a center vertex. We investigated trees with different values of k, and for k=1 and k=2, we established necessary and sufficient conditions for the existence of two vertices with identical SD values, which has a surprising connection to the triangular numbers. Additionally, we investigated asymmetric trees with paths incident to two vertices and found a sufficient condition for vertices to have equal SD values. This leads to new combinatorial proofs of identities arising from Pascal’s triangle. 
    more » « less
  4. In the k -cut problem, we want to find the lowest-weight set of edges whose deletion breaks a given (multi)graph into k connected components. Algorithms of Karger and Stein can solve this in roughly O ( n 2k ) time. However, lower bounds from conjectures about the k -clique problem imply that Ω ( n (1- o (1)) k ) time is likely needed. Recent results of Gupta, Lee, and Li have given new algorithms for general k -cut in n 1.98k + O(1) time, as well as specialized algorithms with better performance for certain classes of graphs (e.g., for small integer edge weights). In this work, we resolve the problem for general graphs. We show that the Contraction Algorithm of Karger outputs any fixed k -cut of weight α λ k with probability Ω k ( n - α k ), where λ k denotes the minimum k -cut weight. This also gives an extremal bound of O k ( n k ) on the number of minimum k -cuts and an algorithm to compute λ k with roughly n k polylog( n ) runtime. Both are tight up to lower-order factors, with the algorithmic lower bound assuming hardness of max-weight k -clique. The first main ingredient in our result is an extremal bound on the number of cuts of weight less than 2 λ k / k , using the Sunflower lemma. The second ingredient is a fine-grained analysis of how the graph shrinks—and how the average degree evolves—in the Karger process. 
    more » « less
  5. ABSTRACT For $$B \subseteq \mathbb F_q^m$$, the nth affine extremal number of B is the maximum cardinality of a set $$A \subseteq \mathbb F_q^n$$ with no subset, which is affinely isomorphic to B. Furstenberg and Katznelson proved that for any $$B \subseteq \mathbb F_q^m$$, the nth affine extremal number of B is $o(q^n)$ as $$n \to \infty$$. By counting affine homomorphisms between subsets of $$\mathbb F_q^n$$, we derive new bounds and give new proofs of some previously known bounds for certain affine extremal numbers. At the same time, we establish corresponding supersaturation results. We connect these bounds to certain Ramsey-type numbers in vector spaces over finite fields. For $$s,t \geq 1$$, let $$R_q(s,t)$$ denote the minimum n such that in every red–blue coloring of the one-dimensional subspaces of $$\mathbb F_q^n$$, there is either a red s-dimensional subspace or a blue t-dimensional subspace of $$\mathbb F_q^n$$. The existence of these numbers is a special case of a well-known theorem of Graham, Leeb and Rothschild. We improve the best-known upper bounds on $$R_2(2,t)$$, $$R_3(2,t)$$, $$R_2(t,t)$$ and $$R_3(t,t)$$. 
    more » « less