

Title: Is DIA proteomics data FAIR? Current data sharing practices, available bioinformatics infrastructure and recommendations for the future
Abstract

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, many improvements are still possible for DIA data with respect to the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards, since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. To improve the current situation for DIA data, we make the following recommendations for the future: (i) develop an open data standard for spectral libraries; (ii) make the availability of the spectral libraries used in DIA experiments mandatory in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.

 
NSF-PAR ID:
10406336
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
PROTEOMICS
Volume:
23
Issue:
7-8
ISSN:
1615-9853
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX existing members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, and from those, more than 9 500 in just the last three years. In parallel, an unprecedented increase of data re-use activities in the field, including ‘big data’ approaches, is enabling novel research and new data resources. At last, we also outline some of our future plans for the coming years. 
  2. Abstract Data-Independent Acquisition (DIA) is a method to improve consistent identification and precise quantitation of peptides and proteins by mass spectrometry (MS). The targeted data analysis strategy in DIA relies on spectral assay libraries that are generally derived from a priori measurements of peptides for each species. Although Escherichia coli (E. coli) is among the best studied model organisms, so far there is no spectral assay library for the bacterium publicly available. Here, we generated a spectral assay library for 4,014 of the 4,389 annotated E. coli proteins using one- and two-dimensional fractionated samples, and ion mobility separation enabling deep proteome coverage. We demonstrate the utility of this high-quality library with robustness in quantitation of the E. coli proteome and with rapid-chromatography to enhance throughput by targeted DIA-MS. The spectral assay library supports the detection and quantification of 91.5% of all E. coli proteins at high-confidence with 56,182 proteotypic peptides, making it a valuable resource for the scientific community. Data and spectral libraries are available via ProteomeXchange (PXD020761, PXD020785) and SWATHAtlas (SAL00222-28). 
  3. Abstract

    Data-Independent Acquisition (DIA) is a mass spectrometry-based method to reliably identify and reproducibly quantify large fractions of a target proteome. The peptide-centric data analysis strategy employed in DIA requires a priori generated spectral assay libraries. Such assay libraries allow to extract quantitative data in a targeted approach and have been generated for human, mouse, zebrafish, E. coli and few other organisms. However, a spectral assay library for the extreme halophilic archaeon Halobacterium salinarum NRC-1, a model organism that contributed to several notable discoveries, is not publicly available yet. Here, we report a comprehensive spectral assay library to measure 2,563 of 2,646 annotated H. salinarum NRC-1 proteins. We demonstrate the utility of this library by measuring global protein abundances over time under standard growth conditions. The H. salinarum NRC-1 library includes 21,074 distinct peptides representing 97% of the predicted proteome and provides a new, valuable resource to confidently measure and quantify any protein of this archaeon. Data and spectral assay libraries are available via ProteomeXchange (PXD042770, PXD042774) and SWATHAtlas (SAL00312-SAL00319).

     
  4. Abstract

    Botryllus schlosseri is a model marine invertebrate for studying immunity, regeneration, and stress‐induced evolution. Conditions for validating its predicted proteome were optimized using nanoElute® 2 deep‐coverage LCMS, revealing up to 4930 protein groups and 20,984 unique peptides per sample. Spectral libraries were generated and filtered to remove interferences, low‐quality transitions, and only retain proteins with >3 unique peptides. The resulting DIA assay library enabled label‐free quantitation of 3426 protein groups represented by 22,593 unique peptides. Quantitative comparisons of single systems from a laboratory‐raised with two field‐collected populations revealed (1) a more unique proteome in the laboratory‐raised population, and (2) proteins with high/low individual variabilities in each population. DNA repair/replication, ion transport, and intracellular signaling processes were distinct in laboratory‐cultured colonies. Spliceosome and Wnt signaling proteins were the least variable (highly functionally constrained) in all populations. In conclusion, we present the first colonial tunicate's deep quantitative proteome analysis, identifying functional protein clusters associated with laboratory conditions, different habitats, and strong versus relaxed abundance constraints. These results empower research on B. schlosseri with proteomics resources and enable quantitative molecular phenotyping of changes associated with transfer from in situ to ex situ and from in vivo to in vitro culture conditions.

     
  5. Lankes, R. David (Ed.)
    Resilience is often treated as a single-dimension system attribute, or various dimensions of resilience are studied separately without considering multi-dimensionality. The increasing frequency of catastrophic natural or man-made disasters affecting rural areas demands holistic assessments of community vulnerability and assessment. Disproportionate effects of disasters on minorities, low-income, hard-to-reach, and vulnerable populations demand a community-oriented planning approach to address the “resilience divide.” Rural areas have many advantages, but low population density, coupled with dispersed infrastructures and community support networks, make these areas more affected by natural disasters. This paper will catalyze three key learnings from our current work in public librarians’ roles in disaster resiliency: rural communities are composed of diverse sub-communities, each of which experiences and responds to traumatic events differently, depending on micro-geographic and demographic drivers. Rural citizens tend to be very self-reliant and are committed to strengthening and sustaining community resiliency with local human capital and resources. Public libraries are central to rural life, providing a range of informational, educational, social, and personal services, especially in remote areas that lack reliable access to community resources during disasters. Public libraries and their librarian leaders are often a “crown jewel” of rural areas’ community infrastructure and this paper will present a community-based design and assessment process for resiliency hubs located in and operated through rural public libraries. The core technical and social science research questions explored in the proposed paper are: 1) Who were the key beneficiaries and what did they need? 2) What was the process of designing a resiliency hub? 3) What did library resiliency hubs provide and how can they be sustained? 
This resiliency hub study will detail co-production of solutions and involves an inclusive collaboration among researchers, librarians, and community members to address the effects of cascading impacts of natural disasters. The novel co-design process detailed in the paper reflects an in-depth understanding of the complex interactions among libraries, residents, governments, and other agencies by collecting sociotechnical hurricane-related data for Calhoun County, Florida, USA, a region devastated by Hurricane Michael (2018) and hard-hit by Covid-19. We analyzed data from newly developed fusing algorithms and incorporating multiple communities and developed a framework and process to co-design resiliency hubs sited in public libraries. This research leverages a unique opportunity to library-centered policies and technologies to establish a new paradigm for developing disaster resiliency in rural settings. Public libraries serve a diverse population who will directly benefit from practical support tailored to their needs. The project will inform efficient plans to ensure that high-need groups are not isolated in disasters. The knowledge and insight gained from the resiliency hub design process will not only improve our understanding of emergency response operations, but also will contribute to the development of new disaster related policies and plans for public libraries, with a broader application to rural communities in many settings. 