Title: Availability of web servers significantly boosts citations rates of bioinformatics methods for protein function and disorder prediction
Abstract Motivation: Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools were released. While some methods are highly cited and used, many suffer relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations. Results: We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations compared to tools that are available as only source code, have no code and no server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment into the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.
Award ID(s):
2146027
PAR ID:
10490410
Author(s) / Creator(s):
;
Editor(s):
Arighi, Cecilia
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics Advances
Volume:
3
Issue:
1
ISSN:
2635-0041
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bolboacă, Sorana D (Ed.)
    Background and aim: Citations in academia have long been regarded as a fundamental means of acknowledging the contribution of past work and promoting scientific advancement. The aim of this paper was to investigate the impact that misconduct allegations made against scholars have on the citations of their work, comparing allegations of sexual misconduct (unrelated to the research merit) and allegations of scientific misconduct (directly related to the research merit). Methods: We collected citation data from the Web of Science (WoS) in 2021, encompassing 31,941 publications from 172 accused and control scholars across 18 disciplines. We also conducted two studies: one on non-academics (N = 231) and one on academics (N = 240). Results: The WoS data shows that scholars accused of sexual misconduct incur a significant citation decrease in the three years after the accusations become public, while we do not detect a significant citation decrease for scholars accused of scientific misconduct. The study involving non-academics suggests that individuals are more averse to sexual than to scientific misconduct. Finally, contrary to the WoS data findings, a sample of academics indicates they are more likely to cite scholars accused of sexual misconduct than those accused of scientific misconduct. Conclusions: In the first three years after accusations became public, scholars accused of sexual misconduct incur a larger citation penalty than scholars accused of scientific misconduct. However, when asked to predict their citing behavior, scholars indicated the reverse pattern, suggesting they might mis-predict their behavior or be reluctant to disclose their preferences.
  2. Abstract Biologists increasingly rely on computer code to collect and analyze their data, reinforcing the importance of published code for transparency, reproducibility, training, and a basis for further work. Here, we conduct a literature review estimating temporal trends in code sharing in ecology and evolution publications since 2010, and test for an influence of code sharing on citation rate. We find that code is rarely published (only 6% of papers), with little improvement over time. We also found there may be incentives to publish code: Publications that share code have tended to be low‐impact initially, but accumulate citations faster, compensating for this deficit. Studies that additionally meet other Open Science criteria, open‐access publication, or data sharing, have still higher citation rates, with publications meeting all three criteria (code sharing, data sharing, and open access publication) tending to have the most citations and highest rate of citation accumulation. 
  3. Abstract Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area. 
  4. Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric score. Most of these preprints were published within 1 year of upload on a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict if preprints will be published after 1 year and if the publishing journal has high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload on a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer-review, it remains to be investigated if such an assessment is suited to identify methodological problems in preprints. 
  5. Marschall, Tobias (Ed.)
    Abstract Motivation: JBrowse Jupyter is a package that aims to close the gap between Python programming and genomic visualization. Web-based genome browsers are routinely used for publishing and inspecting genome annotations. Historically they have been deployed at the end of bioinformatics pipelines, typically decoupled from the analysis itself. However, emerging technologies such as Jupyter notebooks enable a more rapid iterative cycle of development, analysis and visualization. Results: We have developed a package that provides a Python interface to JBrowse 2’s suite of embeddable components, including the primary Linear Genome View. The package enables users to quickly set up, launch and customize JBrowse views from Jupyter notebooks. In addition, users can share their data via Google’s Colab notebooks, providing reproducible interactive views. Availability and implementation: JBrowse Jupyter is released under the Apache License and is available for download on PyPI. Source code and demos are available on GitHub at https://github.com/GMOD/jbrowse-jupyter.