
Title: A Quotient Space Formulation for Generative Statistical Analysis of Graphical Data
Complex analyses involving multiple, dependent random quantities often lead to graphical models—a set of nodes denoting variables of interest, and corresponding edges denoting statistical interactions between nodes. To develop statistical analyses for graphical data, especially towards generative modeling, one needs mathematical representations and metrics for matching and comparing graphs, and subsequent tools, such as geodesics, means, and covariances. This paper utilizes a quotient structure to develop efficient algorithms for computing these quantities, leading to useful statistical tools, including principal component analysis, statistical testing, and modeling. We demonstrate the efficacy of this framework using datasets taken from several problem areas, including letters, biochemical structures, and social networks.
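The core difficulty the abstract describes is that a graph has no canonical node ordering, so comparisons must be made in the quotient space of adjacency representations modulo node relabelings. As a toy illustration only (the function `quotient_distance` below is a hypothetical helper, not the paper's representation or its efficient algorithms), one can define a relabeling-invariant distance by minimizing a matrix norm over all node permutations; brute force is feasible only for very small graphs:

```python
# Minimal sketch of the quotient-space idea: represent each graph by an
# adjacency matrix and define the distance between two graphs as the minimum
# Frobenius distance over all node relabelings (quotient by the permutation
# group). Brute force over permutations -- tiny graphs only.
from itertools import permutations
import numpy as np

def quotient_distance(A, B):
    """min over permutations P of ||A - P B P^T||_F (illustrative helper)."""
    n = A.shape[0]
    best = np.inf
    for perm in permutations(range(n)):
        P = np.eye(n)[list(perm)]          # permutation matrix for this relabeling
        best = min(best, np.linalg.norm(A - P @ B @ P.T))
    return best

# Two triangles are the same point in the quotient space: distance 0.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
print(quotient_distance(triangle, triangle))  # identical graphs
print(quotient_distance(triangle, path))      # structurally different graphs
```

Efficient versions of this minimization (and the geodesics, means, and covariances built on top of it) are exactly what the paper's quotient structure is designed to provide.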
Journal Name:
Journal of Mathematical Imaging and Vision
Sponsoring Org:
National Science Foundation
More Like this
  1. Genomics has grown exponentially over the last decade. Common variants are associated with physiological changes through statistical strategies such as Genome-Wide Association Studies (GWAS) and quantitative trait loci (QTL). Rare variants are associated with diseases through extensive filtering tools, including population genomics and trio-based sequencing (parents and probands). However, the genomic associations require follow-up analyses to narrow causal variants, identify the genes that are influenced, and determine the physiological changes. Large quantities of data exist that can be used to connect variants to gene changes, cell types, protein pathways, clinical phenotypes, and animal models that establish physiological genomics. These data, combined with bioinformatics including evolutionary analysis, structural insights, and gene regulation, can yield testable hypotheses for mechanisms of genomic variants. Molecular biology, biochemistry, cell culture, CRISPR editing, and animal models can test the hypotheses to give molecular variant mechanisms. Variant characterization can be a significant component of educating future professionals in undergraduate, graduate, or medical training programs, teaching the basic concepts and terminology of genetics while students learn independent research hypothesis design. This article goes through the computational and experimental analysis strategies of variant characterization and provides examples of these tools applied in publications. © 2022 American Physiological Society. Compr Physiol 12:3303-3336, 2022.
  2. Abstract This work seeks to remedy two deficiencies in the current nucleic acid nanotechnology software environment: the lack of both a fast and user-friendly visualization tool and a standard for structural analyses of simulated systems. We introduce here oxView, a web browser-based visualizer that can load structures with over 1 million nucleotides, create videos from simulation trajectories, and allow users to perform basic edits to DNA and RNA designs. We additionally introduce open-source software tools for extracting common structural parameters to characterize large DNA/RNA nanostructures simulated using the coarse-grained modeling tool oxDNA, which has grown in popularity in recent years and is frequently used to prototype new nucleic acid nanostructural designs, model the biophysics of DNA/RNA processes, and rationalize experimental results. The newly introduced software tools facilitate the computational characterization of DNA/RNA designs by providing multiple analysis scripts, including mean-structure and structure-flexibility characterization, hydrogen bond fraying, and interduplex angles. The output of these tools can be loaded into oxView, allowing users to interact with the simulated structure in a 3D graphical environment and modify the structures to achieve the required properties. We demonstrate these newly developed tools by applying them to the design and analysis of a range of DNA/RNA nanostructures.
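The interduplex-angle analysis mentioned above can be illustrated with a small geometry sketch. This is a hedged stand-in, not the oxDNA/oxView analysis script itself: assuming each duplex's axis has already been estimated as a 3D vector, the angle between two duplexes follows from a normalized dot product.

```python
# Illustrative only: given estimated axis vectors for two helices, the
# interduplex angle is the angle between those vectors.
import numpy as np

def interduplex_angle(axis1, axis2):
    """Angle in degrees between two duplex axis vectors."""
    u = axis1 / np.linalg.norm(axis1)
    v = axis2 / np.linalg.norm(axis2)
    cos_theta = np.clip(np.dot(u, v), -1.0, 1.0)  # clip guards rounding error
    return np.degrees(np.arccos(cos_theta))

print(interduplex_angle(np.array([1.0, 0, 0]), np.array([0, 1.0, 0])))   # perpendicular
print(interduplex_angle(np.array([1.0, 0, 0]), np.array([-1.0, 0, 0])))  # antiparallel
```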
  3. There is little research on, or understanding of, curricular differences between two- and four-year programs, career development of engineering technology (ET) students, and professional preparation of ET early career professionals [1]. Yet ET credentials (including certificates and two- and four-year degrees) represent over half of all engineering credentials awarded in the U.S. [2]. ET professionals are important hands-on members of engineering teams who have specialized knowledge of components and engineering systems. This research study focuses on how career orientations affect the engineering formation of ET students educated at two-year colleges. The theoretical framework guiding this study is Social Cognitive Career Theory (SCCT). SCCT is a theory which situates attitudes, interests, and experiences and links self-efficacy beliefs, outcome expectations, and personal goals to educational and career decisions and outcomes [3]. Knowledge of student attitudes toward and motivation to pursue STEM and engineering education can impact academic performance and indicate future career interest and participation in the STEM workforce [4]. This knowledge may be measured through career orientations or career anchors. A career anchor is a combination of self-concept characteristics which includes talents, skills, abilities, motives, needs, attitudes, and values. Career anchors can develop over time and aid in shaping personal and career identity [6]. The purpose of this quantitative research study is to identify dimensions of career orientations and anchors at various educational stages to map to ET career pathways. The research question this study aims to answer is: for students educated in two-year college ET programs, how do the different dimensions of career orientations, at various phases of professional preparation, impact experiences and the development of professional profiles and pathways?
The participants (n=308) in this study represent three different groups: (1) students in engineering technology related programs at a medium rural-serving technical college (n=136), (2) students in engineering technology related programs at a large urban-serving technical college (n=52), and (3) engineering students at a medium Research 1 university who transferred from a two-year college (n=120). All participants completed Schein's Career Anchor Inventory [5]. This instrument contains 40 six-point Likert-scale items with eight subscales corresponding to the eight career anchors. Additional demographic questions were also included. The data analysis includes graphical displays for data visualization and exploration, descriptive statistics for summarizing trends in the sample data, and inferential statistics for determining statistical significance. The analysis examines career anchor results across groups by institution, major, demographics, types of educational experiences, types of work experiences, and career influences. This cross-group analysis aids in the development of profiles of values, talents, abilities, and motives to support customized career development tailored specifically for ET students. These findings address a gap in ET and two-year college engineering education research. Practical implications include using the findings to create career pathways mapped to career anchors, integrating career development tools into two-year college curricula and programs, providing greater support for career counselors, and creating alternate and more diverse pathways into engineering. References: [1] National Academy of Engineering. (2016). Engineering technology education in the United States. Washington, DC: The National Academies Press. [2] The Integrated Postsecondary Education Data System (IPEDS). (2014). Data on engineering technology degrees. [3] Lent, R.W., & Brown, S.B. (1996). Social cognitive approach to career development: An overview. Career Development Quarterly, 44, 310-321. [4] Unfried, A., Faber, M., Stanhope, D.S., & Wiebe, E. (2015). The development and validation of a measure of student attitudes toward science, technology, engineering, and math (S-STEM). Journal of Psychoeducational Assessment, 33(7), 622-639. [5] Schein, E. (1996). Career anchors revisited: Implications for career development in the 21st century. Academy of Management Executive, 10(4), 80-88. [6] Schein, E.H., & Van Maanen, J. (2013). Career Anchors, 4th ed. San Francisco: Wiley.
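The scoring step described above (40 six-point Likert items aggregated into eight subscales) can be sketched as follows. Note this is a hypothetical illustration: the anchor abbreviations and the item-to-anchor mapping (consecutive blocks of five items) are made up here; the actual inventory's item assignment is defined in Schein [5].

```python
# Hypothetical scoring sketch for a 40-item, eight-subscale Likert inventory.
# The anchor labels and block grouping below are illustrative assumptions.
import numpy as np

ANCHORS = ["TF", "GM", "AU", "SE", "EC", "SV", "CH", "LS"]  # assumed labels

def anchor_scores(responses):
    """responses: 40 ratings in 1..6 -> dict mapping anchor -> subscale mean."""
    r = np.asarray(responses, float).reshape(8, 5)  # assumed: 8 blocks of 5 items
    return {anchor: r[i].mean() for i, anchor in enumerate(ANCHORS)}

scores = anchor_scores([4] * 40)  # a respondent who answered "4" everywhere
print(scores)
```

Group comparisons of these subscale means (descriptive summaries followed by inferential tests) are then what the cross-group analysis in the study performs.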
  4. Abstract Hard-to-predict bursts of the COVID-19 pandemic revealed the significance of statistical modeling that resolves spatio-temporal correlations over geographical areas, for example the spread of infection over a city with census-tract granularity. In this manuscript, we provide algorithmic answers to the following two inter-related public health challenges of immense social impact which have not been adequately addressed. (1) Inference Challenge: assuming that there are N census blocks (nodes) in the city, and given an initial infection at any set of nodes, e.g. any of the N possible single-node infections, any of the $$N(N-1)/2$$ possible two-node infections, etc., what is the probability for a subset of census blocks to become infected by the time the spread of the infection burst is stabilized? (2) Prevention Challenge: what is the minimal control action one can take to minimize the infected part of the stabilized state footprint? To answer the challenges, we build a Graphical Model of the pandemic of the attractive Ising (pair-wise, binary) type, where each node represents a census tract and each edge factor represents the strength of the pairwise interaction between a pair of nodes, e.g. representing inter-node travel, road closures, and the like, and each local bias/field represents the community level of immunization, acceptance of social distancing, mask-wearing practice, etc. Resolving the Inference Challenge requires finding the Maximum-A-Posteriori (MAP), i.e. most probable, state of the Ising Model constrained to the set of initially infected nodes. (An infected node is in the $$+1$$ state and a node which remained safe is in the $$-1$$ state.) We show that almost all attractive Ising Models on dense graphs result in one of two possibilities (modes) for the MAP state: either all nodes which were not infected initially become infected, or all the initially uninfected nodes remain uninfected (susceptible).
This bi-modal solution of the Inference Challenge allows us to re-state the Prevention Challenge as the following tractable convex programming problem: for the bare Ising Model with pair-wise and bias factors representing the system without prevention measures, such that the MAP state is fully infected for at least one of the initial infection patterns, find the closest (for example in the $$l_1$$, $$l_2$$, or any other convexity-preserving norm, and therefore prevention-optimal) set of factors such that all the MAP states of the Ising model, with the optimal prevention measures applied, become safe. We have illustrated the efficiency of the scheme on a quasi-realistic model of Seattle. Our experiments have also revealed useful features, such as sparsity of the prevention solution in the case of the $$l_1$$ norm, and also somewhat unexpected features, such as localization of the sparse prevention solution at pair-wise links which are NOT those that are most utilized/traveled.
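The MAP computation described in this abstract can be illustrated on a toy instance. The sketch below (couplings, fields, and graph size are illustrative choices, not the paper's Seattle model) brute-forces the MAP state of a small dense attractive Ising model with an initially infected node clamped to +1, so one can observe the bimodal outcome the abstract describes:

```python
# Toy Inference Challenge: brute-force MAP of a small attractive Ising model
# with the initially infected nodes clamped to +1. Parameters are illustrative.
import itertools
import numpy as np

def map_state(J, h, infected):
    """argmax over s in {-1,+1}^n, with s[i] = +1 for i in infected, of
    sum_{i<j} J[i,j] s_i s_j + sum_i h[i] s_i (J symmetric, zero diagonal)."""
    n = len(h)
    free = [i for i in range(n) if i not in infected]
    best_val, best_s = -np.inf, None
    for bits in itertools.product([-1.0, 1.0], repeat=len(free)):
        s = np.ones(n)                    # clamped nodes stay at +1
        for i, b in zip(free, bits):
            s[i] = b
        val = 0.5 * s @ J @ s + h @ s     # 0.5 counts each pair once
        if val > best_val:
            best_val, best_s = val, s
    return best_s

n = 5
J = 0.6 * (np.ones((n, n)) - np.eye(n))   # dense attractive couplings
h = -0.4 * np.ones(n)                     # field biased toward the safe (-1) state
s = map_state(J, h, infected={0})
# On a dense attractive graph the MAP state is one of two modes: the free
# nodes are either all infected (+1) or all safe (-1).
print(s)
```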
  5. High dimensional piecewise stationary graphical models represent a versatile class for modelling time varying networks arising in diverse application areas, including biology, economics, and the social sciences. There has been recent work on offline detection and estimation of regime changes in the topology of sparse graphical models. However, the online setting remains largely unexplored, despite its high relevance to applications in sensor networks and other engineering monitoring systems, as well as financial markets. To that end, this work introduces a novel scalable online algorithm for detecting an unknown number of abrupt changes in the inverse covariance matrix of sparse Gaussian graphical models with small delay. The proposed algorithm is based upon monitoring the conditional log-likelihood of all nodes in the network and can be extended to a large class of continuous and discrete graphical models. We also investigate asymptotic properties of our procedure under certain mild regularity conditions on the graph size, sparsity level, number of samples, and pre- and post-change topology of the network. Numerical experiments on both synthetic and real data illustrate the good performance of the proposed methodology in terms of both computational and statistical efficiency across numerous experimental settings.
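The likelihood-monitoring idea behind such online detectors can be illustrated with a much simpler cousin of the paper's algorithm. The sketch below is a hedged stand-in, not the proposed method: it assumes both the pre- and post-change precision (inverse covariance) matrices are known and runs a CUSUM on the Gaussian log-likelihood ratio, flagging a change when the statistic crosses a threshold; the real algorithm handles unknown change parameters and node-wise conditional likelihoods.

```python
# Illustrative CUSUM change detector for a zero-mean Gaussian graphical model
# with known pre-/post-change precision matrices (a simplification).
import numpy as np

def gauss_loglik(x, precision):
    """Zero-mean Gaussian log-density, dropping the model-independent constant."""
    sign, logdet = np.linalg.slogdet(precision)
    return 0.5 * (logdet - x @ precision @ x)

def detect_change(stream, prec0, prec1, threshold=20.0):
    """CUSUM on the log-likelihood ratio; returns the alarm index or None."""
    stat = 0.0
    for t, x in enumerate(stream):
        stat = max(0.0, stat + gauss_loglik(x, prec1) - gauss_loglik(x, prec0))
        if stat > threshold:
            return t
    return None

rng = np.random.default_rng(0)
prec0 = np.eye(3)        # pre-change: three independent unit-variance nodes
prec1 = 5.0 * np.eye(3)  # post-change: variances shrink to 0.2
pre = rng.multivariate_normal(np.zeros(3), np.linalg.inv(prec0), 200)
post = rng.multivariate_normal(np.zeros(3), np.linalg.inv(prec1), 200)
alarm = detect_change(np.vstack([pre, post]), prec0, prec1)
print(alarm)  # alarm fires shortly after the change at index 200
```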