skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Asymptotic Convergence Rates of the Length of the Longest Run(s) in an Inflating Bernoulli Net
In image detection, one problem is to test whether the set, though mainly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve, for example, a curve with $$C^\alpha$$-norm bounded by $$\beta$$. One approach is to analyze the data by counting membership in multiscale multianisotropic strips, which involves an algorithm that delves into the length of the path connecting many consecutive “significant” nodes. In this paper, we develop the mathematical formalism of this algorithm and analyze the statistical property of the length of the longest significant run. The rate of convergence is derived. Using percolation theory and random graph theory, we present a novel probabilistic model named, pseudo-tree model. Based on the asymptotic results for the pseudo-tree model, we further study the length of the longest significant run in an “inflating” Bernoulli net. We find that the probability parameter $$p$$ of significant node plays an important role: there is a threshold $$p_c$$, such that in the cases of $$p < p_c$$ and $$p > p_c$$, very different asymptotic behaviors of the length of the significant runs are observed. We apply our results to the detection of an underlying curvilinear feature and prove that the test based on our proposed longest run theory is asymptotically powerful.  more » « less
Award ID(s):
2015363
PAR ID:
10285153
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE transactions on information theory
ISSN:
1557-9654
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Given a K-vertex simplex in a d-dimensional space, suppose we measure n points on the simplex with noise (hence, some of the observed points fall outside the sim- plex). Vertex hunting is the problem of estimating the K vertices of the simplex. A popular vertex hunting algorithm is successive projection algorithm (SPA). How- ever, SPA is observed to perform unsatisfactorily under strong noise or outliers. We propose pseudo-point SPA (pp-SPA). It uses a projection step and a denoise step to generate pseudo-points and feed them into SPA for vertex hunting. We derive error bounds for pp-SPA, leveraging on extreme value theory of (possibly) high-dimensional random vectors. The results suggest that pp-SPA has faster rates and better numerical performances than SPA. Our analysis includes an improved non-asymptotic bound for the original SPA, which is of independent interest. 
    more » « less
  2. A problem extension of the longest common sub-string (LCS) between two texts is the enumeration of all LCSs given a minimum length k (ALCS-k), along with their positions in each text. In bioinformatics, an efficient solution to the ALCS- k for very long texts -genomes or metagenomes- can provide useful insights to discover genetic signatures responsible for biological mechanisms. The ALCS-k problem has two additional requirements compared to the LCS problem: one is the minimum length k , and the other is that all common strings longer than k must be reported. We present an efficient, two-stage ALCS-k algorithm exploiting the spectrum of text substrings of length k (k-mers). Our approach yields a worst-case time complexity loglinear in the number of k-mers for the first stage, and an average-case loglinear in the number of common k-mers for the second stage (several orders of magnitudes smaller than the total k-mer spectrum). The space complexity is linear in the first phase (disk-based), and on average linear in the second phase (disk- and memory-based). Tests performed on genomes for different organisms (including viruses, bacteria and animal chromosomes) show that run times are consistent with our theoretical estimates; further, comparisons with MUMmer4 show an asymptotic advantage with divergent genomes. 
    more » « less
  3. Abstract We have used kymograph analysis combined with edge detection and an automated computational algorithm to analyze the axonal transport kinetics of neurofilament polymers in cultured neurons at 30 ms temporal resolution. We generated 301 kymographs from 136 movies and analyzed 726 filaments ranging from 0.6 to 42 µm in length, representing ∼37,000 distinct moving and pausing events. We found that the movement is even more intermittent than previously reported and that the filaments undergo frequent, often transient, reversals which suggest that they can engage simultaneously with both anterograde and retrograde motors. Average anterograde and retrograde bout velocities (0.9 and 1.2 µm s−1, respectively) were faster than previously reported, with maximum sustained bout velocities of up to 6.6 and 7.8 µm s−1, respectively. Average run lengths (∼1.1 µm) and run times (∼1.4 s) were in the range reported for molecular motor processivity in vitro, suggesting that the runs could represent the individual processive bouts of the neurofilament motors. Notably, we found no decrease in run velocity, run length or run time with increasing filament length, which suggests that either the drag on the moving filaments is negligible or that longer filaments recruit more motors. 
    more » « less
  4. We consider the problem of computing r-th order statistics, namely finding an assignment having rank r in a probabilistic graphical model. We show that the problem is NP-hard even when the graphical model has no edges (zero-treewidth models) via a reduction from the partition problem. We use this reduction, specifically a pseudo-polynomial time algorithm for number partitioning to yield a pseudo-polynomial time approximation algorithm for solving the r-th order statistics problem in zero- treewidth models. We then extend this algorithm to arbitrary graphical models by generalizing it to tree decompositions, and demonstrate via experimental evaluation on various datasets that our proposed algorithm is more accurate than sampling algorithms. 
    more » « less
  5. Let f be a drawing in the Euclidean plane of a graph G, which is understood to be a 1-dimensional simplicial complex. We assume that every edge of G is drawn by f as a curve of constant algebraic complexity, and the ratio of the length of the longest simple path to the the length of the shortest edge is poly(n). In the drawing f, a path P of G, or its image in the drawing π = f(P), is β-stretch if π is a simple (non-self-intersecting) curve, and for every pair of distinct points p ∈ P and q ∈ P , the length of the sub-curve of π connecting f(p) with f(q) is at most β∥f(p) − f(q)∥, where ∥.∥ denotes the Euclidean distance. We introduce and study the β-stretch Path Problem (βSP for short), in which we are given a pair of vertices s and t of G, and we are to decide whether in the given drawing of G there exists a β-stretch path P connecting s and t. We also output P if it exists. The βSP quantifies a notion of “near straightness” for paths in a graph G, motivated by gerrymandering regions in a map, where edges of G represent natural geographical/political boundaries that may be chosen to bound election districts. The notion of a β-stretch path naturally extends to cycles, and the extension gives a measure of how gerrymandered a district is. Furthermore, we show that the extension is closely related to several studied measures of local fatness of geometric shapes. We prove that βSP is strongly NP-complete. We complement this result by giving a quasi-polynomial time algorithm, that for a given ε > 0, β ∈ O(poly(log |V (G)|)), and s, t ∈ V (G), outputs a β-stretch path between s and t, if a (1 − ε)β-stretch path between s and t exists in the drawing. 
    more » « less