skip to main content


Title: The Erlang weighted tree, a new branching process
Abstract

In this paper, we study a new discrete tree and the resulting branching process, which we call the erlang weighted tree(EWT). The EWT appears as the local weak limit of a random graph model proposed in La and Kabkab, Internet Math. 11 (2015), no. 6, 528–554. In contrast to the local weak limit of well‐known random graph models, the EWT has an interdependent structure. In particular, its vertices encode a multi‐type branching process with uncountably many types. We derive the main properties of the EWT, such as the probability of extinction, growth rate, and so forth. We show that the probability of extinction is the smallest fixed point of an operator. We then take a point process perspective and analyze the growth rate operator. We derive the Krein–Rutman eigenvalue and the corresponding eigenfunctions of the growth operator, and show that the probability of extinction equals one if and only if .

 
more » « less
Award ID(s):
2008130
NSF-PAR ID:
10470536
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Random Structures & Algorithms
ISSN:
1042-9832
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Community detection is considered for a stochastic block model graph of n vertices, with K vertices in the planted community, edge probability p for pairs of vertices both in the community, and edge probability q for other pairs of vertices. The main focus of the paper is on weak recovery of the community based on the graph G , with o ( K ) misclassified vertices on average, in the sublinear regime n 1- o (1) ≤ K ≤ o ( n ). A critical parameter is the effective signal-to-noise ratio λ = K 2 ( p - q ) 2 / (( n - K ) q ), with λ = 1 corresponding to the Kesten–Stigum threshold. We show that a belief propagation (BP) algorithm achieves weak recovery if λ > 1 / e, beyond the Kesten–Stigum threshold by a factor of 1 / e. The BP algorithm only needs to run for log * n + O (1) iterations, with the total time complexity O (| E |log * n ), where log * n is the iterated logarithm of n . Conversely, if λ ≤ 1 / e, no local algorithm can asymptotically outperform trivial random guessing. Furthermore, a linear message-passing algorithm that corresponds to applying a power iteration to the nonbacktracking matrix of the graph is shown to attain weak recovery if and only if λ > 1. In addition, the BP algorithm can be combined with a linear-time voting procedure to achieve the information limit of exact recovery (correctly classify all vertices with high probability) for all K ≥ ( n / log n ) (ρ BP + o (1)), where ρ BP is a function of p / q . 
    more » « less
  2. Abstract

    We consider a process of noncollidingq-exchangeable random walks onZmaking steps 0 (‘straight’) and −1 (‘down’). A single random walk is calledq-exchangeable if under an elementary transposition of the neighboring steps(down,straight)(straight,down)the probability of the trajectory is multiplied by a parameterq(0,1). Our process ofmnoncollidingq-exchangeable random walks is obtained from the independentq-exchangeable walks via the Doob’sh-transform for a nonnegative eigenfunctionh(expressed via theq-Vandermonde product) with the eigenvalue less than 1. The system ofmwalks evolves in the presence of an absorbing wall at 0. The repulsion mechanism is theq-analogue of the Coulomb repulsion of random matrix eigenvalues undergoing Dyson Brownian motion. However, in our model, the particles are confined to the positive half-line and do not spread as Brownian motions or simple random walks. We show that the trajectory of the noncollidingq-exchangeable walks started from an arbitrary initial configuration forms a determinantal point process, and express its kernel in a double contour integral form. This kernel is obtained as a limit from the correlation kernel ofq-distributed random lozenge tilings of sawtooth polygons. In the limit asm,q=eγ/mwithγ > 0 fixed, and under a suitable scaling of the initial data, we obtain a limit shape of our noncolliding walks and also show that their local statistics are governed by the incomplete beta kernel. The latter is a distinguished translation invariant ergodic extension of the two-dimensional discrete sine kernel.

     
    more » « less
  3. Abstract

    A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin–Sokal parsimony (as well as an approach based on MDC) typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100,000 parsimony-informative RIs. We found that, when given just 1000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches $>0.3$ coalescent units) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios. [Coalescence; incomplete lineage sorting; Laurasiatheria; Palaeognathae; parsimony; polymorphism parsimony; retroelement insertions; species trees; transposon.]

     
    more » « less
  4. Abstract Consider a set of n vertices, where each vertex has a location in $\mathbb{R}^d$ that is sampled uniformly from the unit cube in $\mathbb{R}^d$ , and a weight associated to it. Construct a random graph by placing edges independently for each vertex pair with a probability that is a function of the distance between the locations and the vertex weights. Under appropriate integrability assumptions on the edge probabilities that imply sparseness of the model, after appropriately blowing up the locations, we prove that the local limit of this random graph sequence is the (countably) infinite random graph on $\mathbb{R}^d$ with vertex locations given by a homogeneous Poisson point process, having weights which are independent and identically distributed copies of limiting vertex weights. Our set-up covers many sparse geometric random graph models from the literature, including geometric inhomogeneous random graphs (GIRGs), hyperbolic random graphs, continuum scale-free percolation, and weight-dependent random connection models. We prove that the limiting degree distribution is mixed Poisson and the typical degree sequence is uniformly integrable, and we obtain convergence results on various measures of clustering in our graphs as a consequence of local convergence. Finally, as a byproduct of our argument, we prove a doubly logarithmic lower bound on typical distances in this general setting. 
    more » « less
  5. In this paper, we explore the interplay of virus contact rate, virus production rates, and initial viral load during early HIV infection. First, we consider an early HIV infection model formulated as a bivariate branching process and provide conditions for its criticalityR0 > 1. Using dimensionless rates, we show that the criticality conditionR0 > 1 defines a threshold on the target cell infection rate in terms of the infected cell removal rate and virus production rate. This result has motivated us to introduce two additional models of early HIV infection under the assumption that the virus contact rate is proportional to the target cell infection probability (denoted by). Using the second model, we show that the length of the eclipse phase of a newly infected host depends on the target cell infection probability, and the corresponding deterministic equations exhibit bistability. Indeed, occurrence of viral invasion in the deterministic dynamics depends onR0and the initial viral loadV0. If the viral load is small enough, eg,V0 ≪ θ, then there will be extinction regardless of the value ofR0. On the other hand, if the viral load is large enough, eg,V0 ≫ θandR0 > 1, then there will be infection. Of note,V0θcorresponds to a threshold regime above which virus can invade. Finally, we briefly discuss between‐cell competition of viral strains using a third model. Our findings may help explain the HIV population bottlenecks during within‐host progression and host‐to‐host transmission.

     
    more » « less