

Search for: All records

Award ID contains: 1912194


  1. Topological Data Analysis is a growing area of data science that aims to compute and characterize the geometry and topology of data sets, in order to produce useful descriptors for subsequent statistical and machine learning tasks. Its main computational tool is persistent homology, which amounts to tracking the topological changes in growing families of subsets of the data set itself, called filtrations, and encoding them in an algebraic object, called a persistence module. Even though algorithms and theoretical properties of modules are now well known in the single-parameter case, that is, when there is only one filtration to study, much less is known in the multi-parameter case, where several filtrations are given at once. Though more complicated, the resulting persistence modules are usually richer and encode more information, making them better descriptors for data science. In this article, we present the first approximation scheme for computing and decomposing general multi-parameter persistence modules. It is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence. Our algorithm has controlled complexity and running time, and works in arbitrary dimension, i.e., with an arbitrary number of filtrations. Moreover, when restricting to specific classes of multi-parameter persistence modules, namely the ones that can be decomposed into intervals, we establish theoretical results about the approximation error between our estimate and the true module in terms of interleaving distance. Finally, we present empirical evidence validating output quality and speed-up on several data sets.
  2. de Bruijne, M. (Ed.)
    Spatial transcriptomics techniques such as STARmap [15] enable the subcellular detection of RNA transcripts within complex tissue sections. The data from these techniques are affected by optical microscopy limitations, such as shading or vignetting effects from uneven illumination during image capture. Downstream analysis of these sparse spatially resolved transcripts depends on the correction of these artefacts. This paper introduces a novel non-parametric vignetting correction tool for spatial transcriptomic images, which estimates the illumination field and background using an efficient iterative sliced histogram normalization routine. We show that our method outperforms state-of-the-art shading correction techniques in both illumination and background field estimation, and requires fewer input images to perform the estimation adequately. We further demonstrate an important downstream application of our technique, showing that spatial transcriptomic volumes corrected by our method yield higher and more uniform gene expression spot-calling in the rodent hippocampus. Python code and a demo file to reproduce our results are provided in the supplementary material and at this GitHub page: https://github.com/BoveyRao/Non-parametric-vc-for-sparse-st.
  3. Sex differences in the brain are prevalent throughout the animal kingdom and particularly well appreciated in the nematode C. elegans, where male animals contain a little-studied set of 93 male-specific neurons. To make these neurons amenable to future study, we describe here how a multicolor reporter transgene, NeuroPAL, is capable of visualizing the distinct identities of all male-specific neurons. We used NeuroPAL to visualize and characterize a number of features of the male-specific nervous system. We provide several proofs of concept for using NeuroPAL to identify the sites of expression of gfp-tagged reporter genes and for cellular fate analysis, the latter by analyzing the effect of removing several developmental patterning genes on neuronal identity acquisition. We use NeuroPAL and its intrinsic cohort of more than 40 distinct differentiation markers to show that, even though male-specific neurons are generated throughout all four larval stages, they execute their terminal differentiation program in a coordinated manner in the fourth larval stage. This coordinated wave of differentiation, which we call "just-in-time" differentiation, couples neuronal maturation programs with the appearance of sexual organs.
  4. null (Ed.)
  5. Multi-electrode arrays such as "Neuropixels" probes enable the study of neuronal voltage signals at high temporal and single-cell spatial resolution. However, in vivo recordings from these devices often experience some shifting of the probe (due, e.g., to animal movement), resulting in poorly localized voltage readings that in turn can corrupt estimates of neural activity. We introduce a new registration method to partially correct for this motion. In contrast to previous template-based registration methods, the proposed approach is decentralized: it estimates shifts of the data recorded in multiple timebins with respect to one another, and then extracts a global registration estimate from the resulting estimated shift matrix. We find that the resulting decentralized registration is more robust and accurate than previous template-based approaches on both simulated and real data. Nonetheless, some significant non-stationarity remains in the recovered neural activity and should be accounted for by downstream processing pipelines. Open-source code is available at https://github.com/evarol/NeuropixelsRegistration.
  6. We propose methods for estimating correspondence between two point sets in the presence of outliers in both the source and target sets. The proposed algorithms expand upon the theory of the regression without correspondence problem to estimate transformation coefficients using unordered multisets of covariates and responses. Previous theoretical analysis of the problem has been done in a setting where the responses are a complete permutation of the regressed covariates. This paper expands the problem setting by analyzing the cases where only a subset of the responses is a permutation of the regressed covariates, in addition to some covariates possibly being adversarial outliers. We term this problem robust regression without correspondence and provide several algorithms based on random sample consensus for exact and approximate recovery in a noiseless and noisy one-dimensional setting, as well as an approximation algorithm for multiple dimensions. The theoretical guarantees of the algorithms are verified on simulated data. We demonstrate an important computational neuroscience application of the proposed framework by showing its effectiveness in a Caenorhabditis elegans neuron matching problem, where outliers naturally occur in both the source and target nematodes.
  7. The Čech and Rips constructions of persistent homology are stable with respect to perturbations of the input data. However, neither is robust to outliers, and both can be insensitive to topological structure of high-density regions of the data. A natural solution is to consider 2-parameter persistence. This paper studies the stability of 2-parameter persistent homology: We show that several related density-sensitive constructions of bifiltrations from data satisfy stability properties accommodating the addition and removal of outliers. Specifically, we consider the multicover bifiltration, Sheehy's subdivision bifiltrations, and the degree bifiltrations. For the multicover and subdivision bifiltrations, we get 1-Lipschitz stability results closely analogous to the standard stability results for 1-parameter persistent homology. Our results for the degree bifiltrations are weaker, but they are tight, in a sense. As an application of our theory, we prove a law of large numbers for subdivision bifiltrations of random data. 
  8. Larochelle, H. ; Ranzato, M. ; Hadsell, R. ; Balcan, M.F. ; Lin, H. (Ed.)
    In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks. 
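
The decentralized registration idea summarized in the Neuropixels abstract above (estimate shifts between every pair of timebins, then extract a global estimate from the resulting shift matrix) can be illustrated with a toy example. The following is a minimal sketch under strong assumptions, not the authors' implementation: each timebin is a 1-D activity profile, motion is an integer circular shift, pairwise shifts are chosen by maximizing a dot-product score, and the global estimate is a least-squares fit to the antisymmetrized shift matrix. The function names `pairwise_shifts` and `global_displacement` are hypothetical.

```python
import numpy as np


def pairwise_shifts(frames, max_shift):
    """Return D where D[i, j] is the integer shift that best aligns frame j to frame i.

    Assumption: each frame is a 1-D profile and motion is a circular shift.
    """
    n = len(frames)
    candidates = np.arange(-max_shift, max_shift + 1)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Score every candidate shift by the overlap with frame i.
            scores = [np.dot(frames[i], np.roll(frames[j], s)) for s in candidates]
            D[i, j] = candidates[int(np.argmax(scores))]
    return D


def global_displacement(D):
    """Least-squares displacement per frame, centered to zero mean.

    If D[i, j] is approximately p[i] - p[j], then averaging the
    antisymmetrized rows recovers p[i] - mean(p).
    """
    return (D - D.T).mean(axis=1) / 2.0


# Toy data: one Gaussian bump displaced by a known amount in each timebin.
positions = np.arange(64)
base = np.exp(-0.5 * ((positions - 32) / 3.0) ** 2)
true_shifts = np.array([0, 3, -2, 5, 1])
frames = [np.roll(base, s) for s in true_shifts]

D = pairwise_shifts(frames, max_shift=10)
estimated = global_displacement(D)

# Register each frame by undoing its estimated displacement
# (relative to the mean position, rounded to whole samples).
registered = [np.roll(f, -int(round(e))) for f, e in zip(frames, estimated)]
```

Because only relative shifts are observable, the estimate is defined up to a constant offset; the sketch fixes that offset by centering the displacements at zero mean. No single timebin acts as a template, so a poor reading in one bin affects only its own row and column of the shift matrix.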