skip to main content


Title: A Quotient Space Formulation for Generative Statistical Analysis of Graphical Data
Complex analyses involving multiple, dependent random quantities often lead to graphical models—a set of nodes denoting variables of interest, and corresponding edges denoting statistical interactions between nodes. To develop statistical analyses for graphical data, especially towards generative modeling, one needs mathematical representations and metrics for matching and comparing graphs, and subsequent tools, such as geodesics, means, and covariances. This paper utilizes a quotient structure to develop efficient algorithms for computing these quantities, leading to useful statistical tools, including principal component analysis, statistical testing, and modeling. We demonstrate the efficacy of this framework using datasets taken from several problem areas, including letters, biochemical structures, and social networks.  more » « less
Award ID(s):
1956050
NSF-PAR ID:
10278120
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of mathematical imaging and vision
Volume:
63
ISSN:
1573-7683
Page Range / eLocation ID:
735–752
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Precise and accurate estimates of abundance and demographic rates are primary quantities of interest within wildlife conservation and management. Such quantities provide insight into population trends over time and the associated underlying ecological drivers of the systems. This information is fundamental in managing ecosystems, assessing species conservation status and developing and implementing effective conservation policy. Observational monitoring data are typically collected on wildlife populations using an array of different survey protocols, dependent on the primary questions of interest. For each of these survey designs, a range of advanced statistical techniques have been developed which are typically well understood. However, often multiple types of data may exist for the same population under study. Analyzing each data set separately implicitly discards the common information contained in the other data sets. An alternative approach that aims to optimize the shared information contained within multiple data sets is to use a “model-based data integration” approach, or more commonly referred to as an “integrated model.” This integrated modeling approach simultaneously analyzes all the available data within a single, and robust, statistical framework. This paper provides a statistical overview of ecological integrated models, with a focus on integrated population models (IPMs) which include abundance and demographic rates as quantities of interest. Four main challenges within this area are discussed, namely model specification, computational aspects, model assessment and forecasting. This should encourage researchers to explore further and develop new practical tools to ensure that full utility can be made of IPMs for future studies.

     
    more » « less
  2. Genomics has grown exponentially over the last decade. Common variants are associated with physiological changes through statistical strategies such as Genome-Wide Association Studies (GWAS) and quantitative trail loci (QTL). Rare variants are associated with diseases through extensive filtering tools, including population genomics and trio-based sequencing (parents and probands). However, the genomic associations require follow-up analyses to narrow causal variants, identify genes that are influenced, and to determine the physiological changes. Large quantities of data exist that can be used to connect variants to gene changes, cell types, protein pathways, clinical phenotypes, and animal models that establish physiological genomics. This data combined with bioinformatics including evolutionary analysis, structural insights, and gene regulation can yield testable hypotheses for mechanisms of genomic variants. Molecular biology, biochemistry, cell culture, CRISPR editing, and animal models can test the hypotheses to give molecular variant mechanisms. Variant characterizations can be a significant component of educating future professionals at the undergraduate, graduate, or medical training programs through teaching the basic concepts and terminology of genetics while learning independent research hypothesis design. This article goes through the computational and experimental analysis strategies of variant characterization and provides examples of these tools applied in publications. © 2022 American Physiological Society. Compr Physiol 12:3303-3336, 2022. 
    more » « less
  3. Abstract This work seeks to remedy two deficiencies in the current nucleic acid nanotechnology software environment: the lack of both a fast and user-friendly visualization tool and a standard for structural analyses of simulated systems. We introduce here oxView, a web browser-based visualizer that can load structures with over 1 million nucleotides, create videos from simulation trajectories, and allow users to perform basic edits to DNA and RNA designs. We additionally introduce open-source software tools for extracting common structural parameters to characterize large DNA/RNA nanostructures simulated using the coarse-grained modeling tool, oxDNA, which has grown in popularity in recent years and is frequently used to prototype new nucleic acid nanostructural designs, model biophysics of DNA/RNA processes, and rationalize experimental results. The newly introduced software tools facilitate the computational characterization of DNA/RNA designs by providing multiple analysis scripts, including mean structures and structure flexibility characterization, hydrogen bond fraying, and interduplex angles. The output of these tools can be loaded into oxView, allowing users to interact with the simulated structure in a 3D graphical environment and modify the structures to achieve the required properties. We demonstrate these newly developed tools by applying them to design and analysis of a range of DNA/RNA nanostructures. 
    more » « less
  4. Abstract Hard-to-predict bursts of COVID-19 pandemic revealed significance of statistical modeling which would resolve spatio-temporal correlations over geographical areas, for example spread of the infection over a city with census tract granularity. In this manuscript, we provide algorithmic answers to the following two inter-related public health challenges of immense social impact which have not been adequately addressed (1) Inference Challenge assuming that there are N census blocks (nodes) in the city, and given an initial infection at any set of nodes, e.g. any N of possible single node infections, any $$N(N-1)/2$$ N ( N - 1 ) / 2 of possible two node infections, etc, what is the probability for a subset of census blocks to become infected by the time the spread of the infection burst is stabilized? (2) Prevention Challenge What is the minimal control action one can take to minimize the infected part of the stabilized state footprint? To answer the challenges, we build a Graphical Model of pandemic of the attractive Ising (pair-wise, binary) type, where each node represents a census tract and each edge factor represents the strength of the pairwise interaction between a pair of nodes, e.g. representing the inter-node travel, road closure and related, and each local bias/field represents the community level of immunization, acceptance of the social distance and mask wearing practice, etc. Resolving the Inference Challenge requires finding the Maximum-A-Posteriory (MAP), i.e. most probable, state of the Ising Model constrained to the set of initially infected nodes. (An infected node is in the $$+ \, 1$$ + 1 state and a node which remained safe is in the $$- \, 1$$ - 1 state.) We show that almost all attractive Ising Models on dense graphs result in either of the two possibilities (modes) for the MAP state: either all nodes which were not infected initially became infected, or all the initially uninfected nodes remain uninfected (susceptible). This bi-modal solution of the Inference Challenge allows us to re-state the Prevention Challenge as the following tractable convex programming : for the bare Ising Model with pair-wise and bias factors representing the system without prevention measures, such that the MAP state is fully infected for at least one of the initial infection patterns, find the closest, for example in $$l_1$$ l 1 , $$l_2$$ l 2 or any other convexity-preserving norm, therefore prevention-optimal, set of factors resulting in all the MAP states of the Ising model, with the optimal prevention measures applied, to become safe. We have illustrated efficiency of the scheme on a quasi-realistic model of Seattle. Our experiments have also revealed useful features, such as sparsity of the prevention solution in the case of the $$l_1$$ l 1 norm, and also somehow unexpected features, such as localization of the sparse prevention solution at pair-wise links which are NOT these which are most utilized/traveled. 
    more » « less
  5. Abstract

    Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high‐throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open‐source pipeline for processing raw recordings and associated metadata into operative outcomes, publication‐worthy graphs and robust statistical analyses including QQ and residual plots for assumption queries and data transformations. This pipeline uses a facile graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2‐year longitudinal study of an Alzheimer's disease mouse model to assess contributions of forebrain pathology in disordered breathing. Whole body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death in patient outcomes. Breathe Easy provides an open‐source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field.image

    Key points

    Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life.

    Whole body plethysmography in rodents represents a high face‐value method for measuring respiratory outcomes in rodent models of these diseases and disorders.

    Analysis of key respiratory variables remains hindered by manual annotation and analysis that leads to low throughput results that often exclude a majority of the recorded data.

    Here we present a software suite, Breathe Easy, that automates the process of data selection from raw recordings derived from plethysmography experiments and the analysis of these data into operative outcomes and publication‐worthy graphs with statistics.

    We validate Breathe Easy with a terabyte‐scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.

     
    more » « less