Title: A web-based system for creating, viewing, and editing precursor mass spectrometry ground truth data
Abstract
Background
Mass spectrometry (MS) uses the mass-to-charge ratios of measured particles to decode the identities and quantities of molecules in a sample. Interpretation of raw MS data depends upon data processing algorithms that render it human-interpretable. Quantitative MS workflows are complex experimental chains, and it is crucial to know the performance and bias of each data processing method, as they impact the accuracy, coverage, and statistical significance of the result. Creating the ground truth necessary for quantitatively evaluating MS1-aware algorithms is a difficult and tedious task, and better software for creating such datasets would facilitate more extensive evaluation and improvement of MS data processing algorithms.
Results
We present JS-MS 2.0, a software suite that provides a dependency-free, browser-based, one-click, cross-platform solution for creating MS1 ground truth. The software retains the first version's capacity for loading, viewing, and navigating MS1 data in 2- and 3-D, and adds tools for capturing, editing, saving, and viewing isotopic envelope and extracted isotopic chromatogram features. The software can also be used to view and explore the results of feature finding algorithms.
Conclusions
JS-MS 2.0 enables faster creation and inspection of MS1 ground truth data. It is publicly available with an MIT license at github.com/optimusmoose/jsms.
Award ID(s):
1723248 1723006 1723196
NSF-PAR ID:
10222832
Author(s) / Creator(s):
Date Published:
Journal Name:
BMC Bioinformatics
Volume:
21
Issue:
1
ISSN:
1471-2105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
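The abstract above centers on isotopic envelope and extracted isotopic chromatogram features as the unit of MS1 ground truth. As a rough illustration only (the field names and layout are assumptions for exposition, not the JS-MS 2.0 file format), such a feature record might pair a charge state with per-isotope chromatogram traces:

```python
# Hypothetical sketch of an MS1 ground-truth feature record; field names are
# illustrative assumptions, not the actual JS-MS 2.0 output format.
from dataclasses import dataclass, field
from typing import List

PROTON_MASS = 1.007276  # mass of a proton in Da

@dataclass
class IsotopeTrace:
    """One extracted isotopic chromatogram: intensity over retention time at a fixed m/z."""
    mz: float
    rt: List[float]          # retention times (seconds)
    intensity: List[float]   # intensities at each retention time

@dataclass
class EnvelopeFeature:
    """An isotopic envelope: co-eluting isotope traces sharing one charge state."""
    charge: int
    traces: List[IsotopeTrace] = field(default_factory=list)

    @property
    def monoisotopic_mass(self) -> float:
        """Neutral monoisotopic mass inferred from the lowest-m/z trace."""
        mono_mz = min(t.mz for t in self.traces)
        return (mono_mz - PROTON_MASS) * self.charge

# Example: a doubly charged feature with two isotope traces spaced ~0.5 m/z apart
feature = EnvelopeFeature(
    charge=2,
    traces=[
        IsotopeTrace(mz=500.25, rt=[120.0, 121.5, 123.0], intensity=[1e4, 3e4, 1.2e4]),
        IsotopeTrace(mz=500.75, rt=[120.0, 121.5, 123.0], intensity=[5e3, 1.6e4, 6e3]),
    ],
)
print(round(feature.monoisotopic_mass, 3))  # ~998.485 Da
```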
More Like this
  1. Thomasson, J. Alex ; Torres-Rua, Alfonso F. (Ed.)
    sUAS (small Unmanned Aircraft Systems) and advanced surface energy balance models allow detailed assessment and monitoring, at plant scale, of agricultural, urban, and natural environments. Significant progress has been made in understanding and modeling atmosphere-plant-soil interactions and in numerically quantifying the internal processes at plant scale. Similarly, progress has been made in ground truth comparison and validation models. An example of this progress is the application of sUAS information using the Two-Source Surface Energy Balance (TSEB) model in commercial vineyards by the Grape Remote sensing Atmospheric Profile and Evapotranspiration eXperiment (GRAPEX) Project in California. With advances in frequent sUAS data collection over larger areas, sUAS information processing becomes computationally expensive on local computers. Additionally, fragmentation of the different models and tools needed to process the data and validate the results is a limiting factor; for example, in the GRAPEX project, commercial software (ArcGIS and MS Excel) as well as Python and Matlab code are needed to complete the analysis. There is a need to assess and integrate research conducted with sUAS and surface energy balance models in a sharing platform that can be easily migrated to high performance computing (HPC) resources. This research, sponsored by the National Science Foundation FAIR Cyber Training Fellowships, is integrating disparate software and code under a unified language (Python). The Python code for estimating surface energy fluxes using the TSEB2T model, as well as the EC footprint analysis code for ground truth comparison, was hosted on the myGeoHub site (https://mygeohub.org/) to be reproducible and replicable.
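The item above estimates surface energy fluxes with the TSEB2T model. The full two-source formulation is beyond a short example, but the underlying bookkeeping is the standard energy balance: net radiation Rn is partitioned into sensible heat H, ground heat flux G, and latent heat LE, so LE (and hence evapotranspiration) can be obtained as a residual. A minimal sketch under that one-source simplification (not the project's myGeoHub code; values are illustrative):

```python
# Minimal energy-balance residual sketch: LE = Rn - G - H.
# Textbook one-source simplification, not the TSEB2T implementation hosted on
# myGeoHub; the example flux values below are illustrative.

LATENT_HEAT_VAPORIZATION = 2.45e6  # J/kg, latent heat of vaporization near 20 C

def latent_heat_flux(rn: float, g: float, h: float) -> float:
    """Latent heat flux (W/m^2) as the energy-balance residual."""
    return rn - g - h

def evapotranspiration_mm_per_hour(le: float) -> float:
    """Convert a latent heat flux (W/m^2) to an equivalent water depth (mm/h)."""
    kg_per_m2_per_s = le / LATENT_HEAT_VAPORIZATION  # 1 kg/m^2 of water = 1 mm depth
    return kg_per_m2_per_s * 3600.0

# Example: mid-day vineyard pixel
rn, g, h = 600.0, 60.0, 180.0          # W/m^2
le = latent_heat_flux(rn, g, h)        # 360 W/m^2
print(le, round(evapotranspiration_mm_per_hour(le), 2))  # 360.0, ~0.53 mm/h
```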
  2. Abstract Motivation

    Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets.

    Results

    We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new, highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data into hypervectors while minimizing information loss. This process can be easily parallelized, since each dimension is calculated independently. HOMS-TC processes the two stages of the existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA's tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides accuracy comparable to competing search tools.

    Availability and implementation

    HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.

     
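HOMS-TC's key idea above is encoding spectra into hypervectors whose dimensions can be computed independently. The actual encoder is described in the paper and repository; as a generic illustration of the hyperdimensional-computing idea only (a hypothetical scheme, not HOMS-TC's), peaks can be mapped to random bipolar "id" vectors, weighted by intensity, bundled into one spectrum vector, and compared by cosine similarity:

```python
# Generic hyperdimensional encoding sketch (illustrative; not the HOMS-TC encoder).
import numpy as np

D = 4096          # hypervector dimensionality
MZ_BINS = 2000    # number of discretized m/z bins
rng = np.random.default_rng(0)

# One random bipolar "id" hypervector per m/z bin; each dimension is independent,
# so the encoding can be computed dimension-by-dimension in parallel.
bin_hvs = rng.choice([-1.0, 1.0], size=(MZ_BINS, D))

def encode_spectrum(mz: np.ndarray, intensity: np.ndarray, bin_width: float = 1.0) -> np.ndarray:
    """Bundle intensity-weighted bin hypervectors into one normalized spectrum hypervector."""
    bins = np.clip((mz / bin_width).astype(int), 0, MZ_BINS - 1)
    hv = (intensity[:, None] * bin_hvs[bins]).sum(axis=0)
    return hv / (np.linalg.norm(hv) + 1e-12)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

# Toy query and library spectra
query = encode_spectrum(np.array([300.1, 450.2, 600.3]), np.array([1.0, 0.5, 0.8]))
lib   = encode_spectrum(np.array([300.1, 450.2, 601.3]), np.array([0.9, 0.6, 0.7]))
print(round(cosine_similarity(query, lib), 3))
```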
  3. The concern regarding users' data privacy has risen to its highest level due to the massive increase in communication platforms, social networking sites, and greater user participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Because a substantial amount of data is exchanged in textual form, researchers in Natural Language Processing (NLP) have concentrated on creating tools and strategies to identify, categorize, and sanitize private information in text data. However, most detection methods rely solely on the presence of pre-identified keywords in the text and disregard the underlying meaning of the utterance in its context. Hence, in some situations these tools and algorithms fail to detect disclosure, or the produced results are misclassified. In this paper, we propose a multi-input, multi-output hybrid neural network that utilizes transfer learning, linguistic features, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content with respect to the context of the situation. We trained and evaluated our model on a human-annotated ground truth dataset containing a total of 5,400 tweets. The results show that, by jointly learning two separate tasks, the proposed model was able to identify privacy disclosure in tweets with an accuracy of 77.4% while classifying the information type of those tweets with an accuracy of 99%.

     
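The model above jointly learns two tasks (disclosure detection and information-type classification) from multiple inputs. As a hedged sketch of that general multi-input, multi-output pattern in PyTorch (a generic architecture with assumed layer sizes and precomputed text embeddings, not the authors' network):

```python
# Generic multi-input, multi-output sketch (PyTorch); layer sizes and the use of
# precomputed text embeddings are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class DisclosureClassifier(nn.Module):
    def __init__(self, text_dim: int = 768, meta_dim: int = 16, n_info_types: int = 10):
        super().__init__()
        self.text_branch = nn.Sequential(nn.Linear(text_dim, 128), nn.ReLU())
        self.meta_branch = nn.Sequential(nn.Linear(meta_dim, 32), nn.ReLU())
        fused = 128 + 32
        self.disclosure_head = nn.Linear(fused, 2)            # disclosure vs. non-disclosure
        self.info_type_head = nn.Linear(fused, n_info_types)  # information-type classes

    def forward(self, text_emb: torch.Tensor, meta: torch.Tensor):
        fused = torch.cat([self.text_branch(text_emb), self.meta_branch(meta)], dim=-1)
        return self.disclosure_head(fused), self.info_type_head(fused)

# Joint loss over the two heads for a toy batch
model = DisclosureClassifier()
text_emb, meta = torch.randn(4, 768), torch.randn(4, 16)
disc_logits, type_logits = model(text_emb, meta)
loss = nn.functional.cross_entropy(disc_logits, torch.tensor([0, 1, 1, 0])) + \
       nn.functional.cross_entropy(type_logits, torch.randint(0, 10, (4,)))
loss.backward()
```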
  4. Single-Particle Reconstruction (SPR) in Cryo-Electron Microscopy (cryo-EM) is the task of estimating the 3D structure of a molecule from a set of noisy 2D projections, taken from unknown viewing directions. Many algorithms for SPR start from an initial reference molecule, and alternate between refining the estimated viewing angles given the molecule, and refining the molecule given the viewing angles. This scheme is called iterative refinement. Reliance on an initial, user-chosen reference introduces model bias, and poor initialization can lead to slow convergence. Furthermore, since no ground truth is available for an unsolved molecule, it is difficult to validate the obtained results. This creates the need for high quality ab initio models that can be quickly obtained from experimental data with minimal priors, and which can also be used for validation. We propose a procedure to obtain such an ab initio model directly from raw data using Kam's autocorrelation method. Kam's method has been known since 1980, but it leads to an underdetermined system, with missing orthogonal matrices. Until now, this system has been solved only for special cases, such as highly symmetric molecules or molecules for which a homologous structure was already available. In this paper, we show that knowledge of just two clean projections is sufficient to guarantee a unique solution to the system. This system is solved by an optimization-based heuristic. For the first time, we are then able to obtain a low-resolution ab initio model of an asymmetric molecule directly from raw data, without 2D class averaging and without tilting. Numerical results are presented on both synthetic and experimental data. 
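Kam's approach, referenced above, builds an ab initio model from rotation-invariant second-order (autocorrelation) statistics averaged over many noisy projections. The paper's actual pipeline is more involved; as a loose illustration of the kind of invariant statistic involved (not the authors' algorithm), one can average rotationally averaged power spectra over a projection stack:

```python
# Illustrative sketch: rotationally averaged power spectra of projection images,
# the kind of second-order statistic Kam-style ab initio methods average over
# many noisy projections. Not the paper's algorithm.
import numpy as np

def radial_average_power_spectrum(image: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Average |FFT|^2 over rings of constant spatial frequency."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    ny, nx = image.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - ny / 2, x - nx / 2)
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=f.ravel(), minlength=n_bins) / \
           np.bincount(bins.ravel(), minlength=n_bins)

# Averaging over many noisy projections suppresses noise in the invariant statistic.
rng = np.random.default_rng(0)
projections = rng.normal(size=(100, 64, 64))   # stand-in for noisy particle images
mean_spectrum = np.mean([radial_average_power_spectrum(p) for p in projections], axis=0)
print(mean_spectrum.shape)  # (32,)
```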
  5. Abstract

    Image texture, the relative spatial arrangement of intensity values in an image, encodes valuable information about the scene. As it stands, much of this potential information remains untapped. Understanding how to decipher textural details would afford another method of extracting knowledge of the physical world from images. In this work, we attempt to bridge the gap in research between quantitative texture analysis and the visual perception of textures. The impact of changes in image texture on human observers' ability to perform signal detection and localization tasks in complex digital images is not understood. We examine this critical question by studying task-based human observer performance in detecting and localizing signals in tomographic breast images, and we investigate how changes in system geometry and processing impact the formation of second-order image texture. We use digital breast tomosynthesis (DBT), an FDA-approved tomographic X-ray breast imaging method, as the modality of choice to show our preliminary results. Our human observer studies involve localization ROC (LROC) studies for low-contrast mass detection in DBT. Simulated images are used as they offer the benefit of known ground truth. Our results show that changes in system geometry or processing lead to changes in image texture magnitudes, and that variations in several well-known texture features estimated in digital images correlate with human observer detection-localization performance for signals embedded in them. This insight can enable efficient and practical techniques to identify the best imaging system designs, algorithms, or filtering tools by examining the changes in these texture features. The concept of linking texture feature estimates and task-based image quality assessment can be extended to several other imaging modalities and applications, and it can also offer feedback on system and algorithm designs with the goal of improving perceptual benefits. The broader impact spans a wide array of areas, including imaging system design, image processing, data science, machine learning, computer vision, and perceptual and vision science. Our results also point to the caution that must be exercised in using these texture features as image-based radiomic features or as predictive markers for risk assessment, as they are sensitive to system and image processing changes.

     
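The second-order image texture discussed above is commonly quantified with gray-level co-occurrence matrix (GLCM) features such as contrast, correlation, and homogeneity. A brief sketch of such feature estimation using scikit-image follows (an illustration of the general technique, not the study's analysis pipeline):

```python
# GLCM (second-order) texture feature sketch using scikit-image; illustrative only,
# not the study's analysis pipeline. Requires scikit-image >= 0.19.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
image = rng.integers(0, 64, size=(128, 128), dtype=np.uint8)  # stand-in for an image patch

# Co-occurrence matrix for a 1-pixel offset at 0 and 90 degrees, 64 gray levels.
glcm = graycomatrix(image, distances=[1], angles=[0, np.pi / 2],
                    levels=64, symmetric=True, normed=True)

# Scalar texture features averaged over the two directions.
for prop in ("contrast", "correlation", "homogeneity", "energy"):
    print(prop, graycoprops(glcm, prop).mean())
```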