Title: A Survey of Visualization and Analysis in High‐Resolution Connectomics
Abstract

The field of connectomics aims to reconstruct the wiring diagram of neurons and synapses to enable new insights into the workings of the brain. Reconstructing and analyzing neuronal connectivity, however, relies on many individual steps, from high‐resolution data acquisition to automated segmentation, proofreading, interactive data exploration, and circuit analysis. All of these steps have to handle large and complex datasets, and all rely on or benefit from integrated visualization methods. In this state‐of‐the‐art report, we describe visualization methods that can be applied throughout the connectomics pipeline, from data acquisition to circuit analysis. We first define the different steps of the pipeline and focus on how visualization is currently integrated into them. We also survey open science initiatives in connectomics, including usable open‐source tools and publicly available datasets. Finally, we discuss open challenges and possible future directions of this exciting research field.
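As a deliberately toy illustration of the pipeline order described above, the sketch below chains stand-ins for automated segmentation and circuit analysis. None of these functions correspond to a specific tool from the survey; they only mirror the sequence of steps.

```python
import numpy as np
from scipy import ndimage

# Toy stand-ins for two pipeline stages; nothing here models a real
# connectomics tool, only the order of operations named in the abstract.

def segment(volume: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """'Automated segmentation': threshold, then label connected components."""
    labels, _ = ndimage.label(volume > threshold)
    return labels

def circuit_summary(segmentation: np.ndarray) -> dict:
    """'Circuit analysis': per-segment voxel counts as a stand-in statistic."""
    ids, counts = np.unique(segmentation[segmentation > 0], return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))

volume = np.random.rand(64, 64, 64)   # stand-in for an acquired image volume
print(circuit_summary(segment(volume)))
```

In a real pipeline each of these stages is a large system in its own right, with proofreading and interactive exploration sitting between segmentation and analysis.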

 
Award ID(s): 2124179, 1650499
NSF-PAR ID: 10406071
Publisher / Repository: Wiley-Blackwell
Date Published:
Journal Name: Computer Graphics Forum
Volume: 41
Issue: 3
ISSN: 0167-7055
Page Range / eLocation ID: p. 573-607
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high‐throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open‐source pipeline for processing raw recordings and associated metadata into operative outcomes, publication‐worthy graphs and robust statistical analyses including QQ and residual plots for assumption queries and data transformations. This pipeline uses a facile graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2‐year longitudinal study of an Alzheimer's disease mouse model to assess contributions of forebrain pathology in disordered breathing. Whole body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death in patient outcomes. Breathe Easy provides an open‐source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field.

    Key points

    Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life.

    Whole body plethysmography in rodents represents a high face‐value method for measuring respiratory outcomes in rodent models of these diseases and disorders.

    Analysis of key respiratory variables remains hindered by manual annotation and analysis, which lead to low‐throughput results that often exclude a majority of the recorded data.

    Here we present a software suite, Breathe Easy, that automates data selection from raw plethysmography recordings and the analysis of these data into operative outcomes and publication‐worthy graphs with statistics (a minimal sketch of this style of automated breath detection follows these key points).

    We validate Breathe Easy with a terabyte‐scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.
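    As a minimal sketch of the kind of waveform-feature analysis Breathe Easy automates, the toy example below detects inspiratory peaks in a synthetic plethysmography trace and derives a respiratory rate. The sampling rate, prominence threshold, and signal are all illustrative assumptions, not Breathe Easy's actual selection logic.

    ```python
    import numpy as np
    from scipy.signal import find_peaks

    # Synthetic trace; the sampling rate and 3 Hz 'breathing' are assumptions.
    fs = 1000                                   # samples per second
    t = np.arange(0, 10, 1 / fs)
    trace = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.randn(t.size)

    # Threshold-style breath selection via peak prominence (illustrative value).
    peaks, _ = find_peaks(trace, prominence=0.5)
    rate_bpm = 60.0 / np.diff(t[peaks]).mean()  # breaths per minute
    print(f"respiratory rate ~ {rate_bpm:.1f} breaths/min")
    ```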

     
  2. As connectomic datasets exceed hundreds of terabytes in size, accurate and efficient skeleton generation of the label volumes has evolved into a critical component of the computation pipeline used for analysis, evaluation, visualization, and error correction. We propose a novel topological thinning strategy that uses biological constraints to produce accurate centerlines from segmented neuronal volumes while still maintaining biologically relevant properties. Current methods are either agnostic to the underlying biology, have non-linear running times as a function of the number of input voxels, or both. First, we eliminate from the input segmentation biologically-infeasible bubbles, pockets of voxels incorrectly labeled within a neuron, to improve segmentation accuracy, allow for more accurate centerlines, and increase processing speed. Next, a Convolutional Neural Network (CNN) detects cell bodies from the input segmentation, allowing us to anchor our skeletons to the somata. Lastly, a synapse-aware topological thinning approach produces expressive skeletons for each neuron with a nearly one-to-one correspondence between endpoints and synapses. We simultaneously estimate geometric properties of neurite width and geodesic distance between synapse and cell body, improving accuracy by 47.5% and 62.8% over baseline methods. We separate the skeletonization process into a series of computation steps, leveraging data-parallel strategies to increase throughput significantly. We demonstrate our results on over 1250 neurons and neuron fragments from three different species, processing over one million voxels per second per CPU with linear scalability.
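    The "bubble" removal step can be pictured with a slow but readable toy: reassign any pocket fully enclosed by a single neuron back to that neuron. This only illustrates the concept; the paper's linear-time, biologically constrained method is different.

    ```python
    import numpy as np
    from scipy import ndimage

    def fill_bubbles(seg: np.ndarray) -> np.ndarray:
        """Reassign voxels fully enclosed by one label back to that label."""
        out = seg.copy()
        for label in np.unique(seg):
            if label == 0:                   # skip background
                continue
            mask = seg == label
            filled = ndimage.binary_fill_holes(mask)
            out[filled & ~mask] = label      # relabel the enclosed pocket
        return out

    seg = np.zeros((5, 5, 5), dtype=np.int32)
    seg[1:4, 1:4, 1:4] = 7                   # a small 'neuron'...
    seg[2, 2, 2] = 0                         # ...with one mislabeled voxel inside
    assert fill_bubbles(seg)[2, 2, 2] == 7
    ```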
  3. Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. We focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation, supermatrix approaches, or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which phylogeny building increasingly relies; we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
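    One standard quantitative handle on the MSC mentioned here (a textbook three-taxon result, not a formula from this abstract): for a species tree ((A,B),C) whose internal branch has length t in coalescent units, incomplete lineage sorting yields

    ```latex
    \[
      P\bigl(\text{gene tree} = ((A,B),C)\bigr) = 1 - \tfrac{2}{3}e^{-t}, \qquad
      P\bigl(((A,C),B)\bigr) = P\bigl(((B,C),A)\bigr) = \tfrac{1}{3}e^{-t},
    \]
    ```

    so gene-tree discordance decays exponentially as the internal branch lengthens; short, deep branches are what generate the heterogeneous signals discussed above.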

     
  4. Abstract Motivation

    MicroRNAs (miRNAs) are small RNA molecules (∼22 nucleotides long) involved in post-transcriptional gene regulation. Advances in high-throughput sequencing technologies led to the discovery of isomiRs, which are miRNA sequence variants. While many miRNA-seq analysis tools exist, the diversity of output formats hinders accurate comparisons between tools and precludes data sharing and the development of common downstream analysis methods.

    Results

    To overcome this situation, we present here a community-based project, the miRNA Transcriptomic Open Project (miRTOP), working towards the optimization of miRNA analyses. The aim of miRTOP is to promote the development of downstream isomiR analysis tools that are compatible with existing detection and quantification tools. Based on the existing GFF3 format, we first created a new standard format, mirGFF3, for the output of miRNA/isomiR detection and quantification results from small RNA-seq data. Additionally, we developed a command-line Python tool, mirtop, to create and manage the mirGFF3 format. Currently, mirtop can convert into mirGFF3 the outputs of commonly used pipelines, such as seqbuster, isomiR-SEA, sRNAbench, and Prost!, as well as BAM files. Some tools have also incorporated the mirGFF3 format directly into their code, such as miRge2.0, IsoMIRmap, and OptimiR. Its open architecture enables any tool or pipeline to output or convert results into mirGFF3. Collectively, this isomiR categorization system, along with the accompanying mirGFF3 and mirtop API, provides a comprehensive solution for the standardization of miRNA and isomiR annotation, enabling data sharing, reporting, comparative analyses and benchmarking, while promoting the development of common miRNA methods focusing on downstream steps of miRNA detection, annotation and quantification.
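    Because mirGFF3 builds on GFF3, its records can be read with ordinary GFF3 parsing: nine tab-separated columns, with Key=Value attribute pairs in the ninth. The example line and attribute names below are illustrative guesses, not copied from the specification; see the mirGFF3 repository (linked under Availability) for the authoritative format.

    ```python
    # Hypothetical mirGFF3-style record; attribute names are illustrative only.
    line = ("chr1\tmiRBase\tisomiR\t100\t122\t.\t+\t.\t"
            "UID=iso-22-XYZ;Variant=iso_3p:+1;Expression=42")

    cols = line.rstrip("\n").split("\t")       # the 9 standard GFF3 columns
    attrs = dict(f.split("=", 1) for f in cols[8].split(";") if f)
    print(cols[0], cols[3], cols[4], attrs["Variant"], attrs["Expression"])
    ```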

    Availability and implementation

    https://github.com/miRTop/mirGFF3/ and https://github.com/miRTop/mirtop.

    Contact

    desvignes@uoneuro.uoregon.edu or lpantano@iscb.org

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract

    Pressing environmental research questions demand the integration of increasingly diverse and large‐scale ecological datasets as well as complex analytical methods, which require specialized tools and resources.

    Computational training for ecological and evolutionary sciences has become more abundant and accessible over the past decade, but tool development has outpaced the availability of specialized training. Most training for scripted analyses focuses on individual analysis steps in one script rather than creating a scripted pipeline, where modular functions comprise an ecosystem of interdependent steps. Although current computational training creates an excellent starting place, linear styles of scripting can risk becoming labor‐ and time‐intensive and less reproducible by often requiring manual execution. Pipelines, however, can be easily automated or tracked by software to increase efficiency and reduce potential errors. Ecology and evolution would benefit from techniques that reduce these risks by managing analytical pipelines in a modular, readily parallelizable format with clear documentation of dependencies.

    Workflow management software (WMS) can aid in the reproducibility, intelligibility and computational efficiency of complex pipelines. To date, WMS adoption in ecology and evolutionary research has been slow. We discuss the benefits and challenges of implementing WMS and illustrate its use through a case study with the targets R package to further highlight WMS benefits through workflow automation, dependency tracking and improved clarity for reviewers.
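    targets itself is an R package; purely to illustrate the workflow-management idea it implements (each step is a function, and a step reruns only when its inputs change), here is a language-agnostic toy in Python. Everything in it is an assumption for illustration, not the targets API.

    ```python
    import hashlib, json, pathlib

    CACHE = pathlib.Path("cache.json")   # hypothetical on-disk result store

    def run_step(name, func, *inputs):
        """Re-run a pipeline step only if its name or inputs have changed."""
        key = hashlib.sha256(repr((name, inputs)).encode()).hexdigest()
        cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
        if key not in cache:             # first run, or an upstream change
            cache[key] = func(*inputs)
            CACHE.write_text(json.dumps(cache))
        return cache[key]

    cleaned = run_step("clean", sorted, [3, 1, 2])
    mean = run_step("summarize", lambda xs: sum(xs) / len(xs), cleaned)
    print(mean)
    ```

    A real WMS adds what this toy omits: a declared dependency graph, parallel execution, and provenance tracking.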

    Although WMS requires familiarity with function‐oriented programming and careful planning for more advanced applications and pipeline sharing, investment in training will enable access to the benefits of WMS and impart transferable computing skills that can facilitate ecological and evolutionary data science at large scales.

     