Title: Reproducibility Signals in Science: A preliminary analysis
Reproducibility is an important feature of science; experiments are retested, and analyses are repeated. Trust in the findings increases when consistent results are achieved. Despite the importance of reproducibility, significant work is often involved in these efforts, and some published findings may not be reproducible due to oversights or errors. In this paper, we examine a myriad of features in scholarly articles published in computer science conferences and journals and test how they correlate with reproducibility. We collected data from three different sources that labeled publications as either reproducible or irreproducible and employed statistical significance tests to identify features of those publications that hold clues about reproducibility. We found the readability of the scholarly article and accessibility of the software artifacts through hyperlinks to be strong signals noticeable amongst reproducible scholarly articles.
Award ID(s):
2022443
PAR ID:
10483254
Editor(s):
Tirthankar Ghosal, Sergi Blanco-Cuaresma
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Journal Name:
The first Workshop on Information Extraction from Scientific Publications
Subject(s) / Keyword(s):
Reproducibility
Format(s):
Medium: X
Location:
https://aclanthology.org/2022.wiesp-1.16/
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we explore the crucial role and challenges of computational reproducibility in geosciences, drawing insights from the Climate Informatics Reproducibility Challenge (CIRC) in 2023. The competition aimed at (1) identifying common hurdles to reproducing computational climate science; and (2) creating interactive reproducible publications for selected papers of the Environmental Data Science journal. Based on lessons learned from the challenge, we emphasize the significance of open research practices, mentorship, transparency guidelines, as well as the use of technologies such as executable research objects for the reproduction of geoscientific published research. We propose a supportive framework of tools and infrastructure for evaluating reproducibility in geoscientific publications, with a case study for the climate informatics community. While the recommendations focus on future CIRCs, we expect they would be beneficial for the wider umbrella of reproducibility initiatives in geosciences.
  2. Why are some research studies easy to reproduce while others are difficult? Casting doubt on the accuracy of scientific work is not fruitful, especially when an individual researcher cannot reproduce the claims made in the paper. There could be many subjective reasons behind the inability to reproduce a scientific paper. The field of Machine Learning (ML) faces a reproducibility crisis, and surveying a portion of published articles has resulted in a group realization that although sharing code repositories is valuable, code bases are not the be-all and end-all for determining the reproducibility of an article. Various parties involved in the publication process have come forward to address the reproducibility crisis, and solutions such as badging articles as reproducible, reproducibility checklists at conferences (NeurIPS, ICML, ICLR, etc.), and sharing artifacts on OpenReview come across as promising solutions to the core problem. The breadth of literature on reproducibility focuses on measures required to avoid irreproducibility, and there is not much research into the effort behind reproducing these articles. In this paper, we investigate the factors that contribute to the ease and difficulty of reproducing previously published studies and report on the foundational framework to quantify effort of reproducibility.
  3. Parkinson’s disease (PD) is a neurological disorder with complicated and disabling motor and non-motor symptoms. The complexity of PD pathology is amplified due to its dependency on patient diaries and the neurologist’s subjective assessment of clinical scales. A significant amount of recent research has explored new cost-effective and subjective assessment methods pertaining to PD symptoms to address this challenge. This article analyzes the application areas and use of mobile and wearable technology in PD research using the PRISMA methodology. Based on the published papers, we identify four significant fields of research: diagnosis, prognosis and monitoring, predicting response to treatment, and rehabilitation. Between January 2008 and December 2021, 31,718 articles were published in four databases: PubMed Central, Science Direct, IEEE Xplore, and MDPI. After removing unrelated articles, duplicate entries, non-English publications, and other articles that did not fulfill the selection criteria, we manually investigated 1559 articles in this review. Most of the articles (45%) were published during a recent four-year stretch (2018–2021), and 19% of the articles were published in 2021 alone. This trend reflects the research community’s growing interest in assessing PD with wearable devices, particularly in the last four years of the period under study. We conclude that there is a substantial and steady growth in the use of mobile technology in the PD contexts. We share our automated script and the detailed results with the public, making the review reproducible for future publications. 
  4. Communication of scientific findings is fundamental to scholarly discourse. In this article, we show that academic review articles, a quintessential form of interpretive scholarly output, perform curatorial work that substantially transforms the research communities they aim to summarize. Using a corpus of millions of journal articles, we analyze the consequences of review articles for the publications they cite, focusing on citation and co-citation as indicators of scholarly attention. Our analysis shows that, on the one hand, papers cited by formal review articles generally experience a dramatic loss in future citations. Typically, the review gets cited instead of the specific articles mentioned in the review. On the other hand, reviews curate, synthesize, and simplify the literature concerning a research topic. Most reviews identify distinct clusters of work and highlight exemplary bridges that integrate the topic as a whole. These bridging works, in addition to the review, become a shorthand characterization of the topic going forward and receive disproportionate attention. In this manner, formal reviews perform creative destruction so as to render increasingly expansive and redundant bodies of knowledge distinct and comprehensible.
  5. Human gene research generates new biology insights with translational potential, yet few studies have considered the health of the human gene literature. The accessibility of human genes for targeted research, combined with unreasonable publication pressures and recent developments in scholarly publishing, may have created a market for low-quality or fraudulent human gene research articles, including articles produced by contract cheating organizations known as paper mills. This review summarises the evidence that paper mills contribute to the human gene research literature at scale and outlines why targeted gene research may be particularly vulnerable to systematic research fraud. To raise awareness of targeted gene research from paper mills, we highlight features of problematic manuscripts and publications that can be detected by gene researchers and/or journal staff. As improved awareness and detection could drive the further evolution of paper mill-supported publications, we also propose changes to academic publishing to more effectively deter and correct problematic publications at scale. In summary, the threat of paper mill-supported gene research highlights the need for all researchers to approach the literature with a more critical mindset, and demand publications that are underpinned by plausible research justifications, rigorous experiments and fully transparent reporting.