Abstract As the serial section community transitions to volume electron microscopy, tools are needed to balance rapid segmentation efforts with documenting the fine detail of structures that support cell function. New annotation applications should be accessible to users and meet the needs of the neuroscience and connectomics communities while also being useful across other disciplines. Issues not currently addressed by a single, modern annotation application include: 1) built-in curation systems with utilities for expert intervention to provide quality assurance; 2) integrated alignment features that allow for image registration on the fly as image flaws are discovered during annotation; 3) simplicity for non-specialists within and beyond the neuroscience community; 4) a system to store experimental metadata with annotation data in a way that keeps researchers masked regarding condition to avoid potential biases; 5) local management of large datasets; 6) a fully open-source codebase allowing development of new tools; and more. Here, we present PyReconstruct, a modern successor to the Reconstruct annotation tool. PyReconstruct operates in a field-agnostic manner, runs on all major operating systems, breaks through legacy RAM limitations, features an intuitive and collaborative curation system, and employs a flexible and dynamic approach to image registration. It can be used to analyze, display, and publish experimental or connectomics data. PyReconstruct is well suited for generating ground truth for automated segmentation, the outcomes of which can be returned to PyReconstruct for proofreading and quality control.
Significance statement: In neuroscience, the emerging field of connectomics has produced annotation tools for reconstruction that prioritize circuit connectivity across scales from microns to centimeters and beyond.
Determining the strength of the synapses forming these connections is crucial to understanding function and requires quantification of their nanoscale dimensions and subcellular composition. PyReconstruct, successor to the early annotation tool Reconstruct, meets these requirements for synapses and other structures well beyond neuroscience. PyReconstruct lifts many restrictions of legacy Reconstruct and offers a user-friendly interface, integrated curation, dynamic alignment, nanoscale quantification, 3D visualization, and more. Extensive compatibility with third-party software provides access to the expanding set of tools from the connectomics and imaging communities.
A Survey of Visualization and Analysis in High‐Resolution Connectomics
Abstract The field of connectomics aims to reconstruct the wiring diagram of neurons and synapses to enable new insights into the workings of the brain. Reconstructing and analyzing neuronal connectivity, however, relies on many individual steps, starting from high-resolution data acquisition and continuing through automated segmentation, proofreading, interactive data exploration, and circuit analysis. All of these steps have to handle large and complex datasets and rely on or benefit from integrated visualization methods. In this state-of-the-art report, we describe visualization methods that can be applied throughout the connectomics pipeline, from data acquisition to circuit analysis. We first define the different steps of the pipeline and focus on how visualization is currently integrated into these steps. We also survey open science initiatives in connectomics, including usable open-source tools and publicly available datasets. Finally, we discuss open challenges and possible future directions of this exciting research field.
- PAR ID:
- 10406071
- Publisher / Repository:
- Wiley-Blackwell
- Date Published:
- Journal Name:
- Computer Graphics Forum
- Volume:
- 41
- Issue:
- 3
- ISSN:
- 0167-7055
- Format(s):
- Medium: X
- Size(s):
- p. 573-607
- Sponsoring Org:
- National Science Foundation
More Like this
In the realm of neuroscience, mapping the three-dimensional (3D) neural circuitry and architecture of the brain is important for advancing our understanding of neural circuit organization and function. This study presents a novel pipeline that transforms mouse brain samples into detailed 3D brain models using a collaborative data analytics platform called "Texera." The user-friendly Texera platform allows for effective interdisciplinary collaboration between team members in neuroscience, computer vision, and data processing. Our pipeline takes the tile images from a serial two-photon tomography/TissueCyte system, stitches the tiles into brain section images, and constructs 3D whole-brain image datasets. The resulting 3D data supports downstream analyses, including 3D whole-brain registration, atlas-based segmentation, cell counting, and high-resolution volumetric visualization. Using this platform, we implemented specialized optimization methods and obtained significant performance enhancements in workflow operations. We expect that the neuroscience community can adopt our approach for large-scale image-based data processing and analysis.
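The tile-stitching step described above can be illustrated with a minimal sketch in plain Python. This is a hypothetical toy version, not the Texera pipeline itself: it assembles equally sized tiles (here, 2-D lists of pixel values) into one section image by simple abutment, whereas a real stitcher must also blend overlapping tile borders and correct for stage drift.

```python
def stitch_tiles(tiles, grid_w):
    """Assemble row-major tiles into one image by abutting them in a grid.

    tiles: list of equally sized 2-D lists of pixel values.
    grid_w: number of tiles per grid row.
    """
    tile_h = len(tiles[0])
    rows = []
    for r in range(0, len(tiles), grid_w):
        row_tiles = tiles[r:r + grid_w]  # one horizontal band of tiles
        for y in range(tile_h):
            # concatenate the y-th scanline of every tile in this band
            rows.append([px for t in row_tiles for px in t[y]])
    return rows

# Example: four 2x2 tiles arranged in a 2x2 grid -> one 4x4 image
tiles = [[[1, 1], [1, 1]], [[2, 2], [2, 2]],
         [[3, 3], [3, 3]], [[4, 4], [4, 4]]]
image = stitch_tiles(tiles, grid_w=2)
```

In the resulting image, the first two scanlines come from tiles 1 and 2 side by side, and the last two from tiles 3 and 4.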
Mass spectrometry is the dominant technology in the field of proteomics, enabling high-throughput analysis of the protein content of complex biological samples. Due to the complexity of the instrumentation and resulting data, sophisticated computational methods are required for the processing and interpretation of acquired mass spectra. Machine learning has shown great promise to improve the analysis of mass spectrometry data, with numerous purpose-built methods for improving specific steps in the data acquisition and analysis pipeline reaching widespread adoption. Here, we propose unifying various spectrum prediction tasks under a single foundation model for mass spectra. To this end, we pre-train a spectrum encoder using de novo sequencing as a pre-training task. We then show that using these pre-trained spectrum representations improves our performance on the four downstream tasks of spectrum quality prediction, chimericity prediction, phosphorylation prediction, and glycosylation status prediction. Finally, we perform multi-task fine-tuning and find that this approach improves the performance on each task individually. Overall, our work demonstrates that a foundation model for tandem mass spectrometry proteomics trained on de novo sequencing learns generalizable representations of spectra, improves performance on downstream tasks where training data is limited, and can ultimately enhance data acquisition and analysis in proteomics experiments.
Abstract Premise: Among the slowest steps in the digitization of natural history collections is converting imaged labels into digital text. We present here a working solution to overcome this long-recognized efficiency bottleneck that leverages synergies between community science efforts and machine learning approaches. Methods: We present two new semi-automated services. The first detects and classifies typewritten, handwritten, or mixed labels from herbarium sheets. The second uses a workflow tuned for specimen labels to transcribe label text using optical character recognition (OCR). The label finder and classifier was built via humans-in-the-loop processes that utilize the community science Notes from Nature platform to develop training and validation data sets to feed into a machine learning pipeline. Results: Our results showcase a >93% success rate for finding and classifying main labels. The OCR pipeline optimizes pre-processing, multiple OCR engines, and post-processing steps, including an alignment approach borrowed from molecular systematics. This pipeline yields >4-fold reductions in errors compared to off-the-shelf open-source solutions. The OCR workflow also allows human validation using a custom Notes from Nature tool. Discussion: Our work showcases a usable set of tools for herbarium digitization, including a custom-built web application that is freely accessible. Further work to better integrate these services into existing toolkits can support broad community use.
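The post-processing step above combines the outputs of multiple OCR engines. One common consensus strategy can be sketched in plain Python, assuming the candidate transcriptions have already been aligned to equal length with "-" marking gaps (the paper's actual pipeline performs this alignment with a method borrowed from molecular systematics; the function below is illustrative only):

```python
from collections import Counter

def consensus(candidates):
    """Majority vote per character across aligned OCR transcriptions.

    candidates: equal-length strings from different OCR engines,
    with '-' marking an alignment gap. Ties resolve to the character
    seen first, following Counter.most_common ordering.
    """
    result = []
    for chars in zip(*candidates):
        votes = Counter(c for c in chars if c != "-")
        if votes:  # skip columns that are all gaps
            result.append(votes.most_common(1)[0][0])
    return "".join(result)

# Three engines each misread one character; voting recovers the word
print(consensus(["Herbarium", "Herbarlum", "H-rbarium"]))  # Herbarium
```

Each column is voted on independently, so a single engine's error at any position is outvoted as long as the other engines agree there.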
McHenry, K.; Schreiber, L. (Eds.) The paleogeosciences are becoming more and more interdisciplinary, and studies increasingly rely on large collections of data derived from multiple data repositories. Integrating diverse datasets from multiple sources into complex workflows increases the challenge of creating reproducible and open science, as data formats and tools are often non-interoperable, requiring manual manipulation of data into standardized formats, resulting in a disconnect in data provenance and confounding reproducibility. Here we present a notebook that demonstrates how the Linked PaleoData (LiPD) framework is used as an interchange format to allow data from multiple sources to be integrated in a complex workflow using emerging packages in R for geochronological uncertainty quantification and abrupt change detection. Specifically, in this notebook, we use the neotoma2 and lipdR packages to access paleoecological data from the Neotoma Database and paleoclimate data from compilations hosted on Lipdverse. Age uncertainties for datasets from both sources are then quantified using the geoChronR package, and those data and their associated age uncertainties are then investigated for abrupt changes using the actR package, with accompanying visualizations. The result is an integrated, reproducible workflow in R that demonstrates how this complex series of multi-source data integration, analysis, and visualization steps can be combined into an efficient, open scientific narrative.
