Title: Forecasting the publication and citation outcomes of COVID-19 preprints
Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric score. Most of these preprints were published within 1 year of upload on a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict if preprints will be published after 1 year and if the publishing journal has high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload on a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer-review, it remains to be investigated if such an assessment is suited to identify methodological problems in preprints.
Award ID(s):
2007951
PAR ID:
10391556
Journal Name:
Royal Society Open Science
Volume:
9
Issue:
9
ISSN:
2054-5703
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Information about individual publications associated with grants funded by NSF to support SES research from 2000-2015 (see "SES grants, 2000-2015"). For grants with ten or fewer publications, we included information about all available publications in this dataset. For grants with more than ten publications, we randomly selected ten to include in this dataset. CSV file with 13 columns and names in header row: "Grant ID" is the ID from the Dimensions platform (string); "Grant Number" is the NSF Award number (integer); "Publication Title" is the title of the paper (text); "Publication Year" is the year in which the paper was published (year); "Authors" is a list or abbreviated list of the authors of the paper (text); "Journal" is the name of the scientific journal or outlet in which the paper is published (text); "Interdis Rubric 1" is a metric representing the dataset authors' assessment of the level of interdisciplinarity represented by the paper (integer: “1” indicated social and natural science interdisciplinarity where both social and environmental conditions are measured or explored and/or author affiliations included departments across these disciplines; “2” indicated general interdisciplinarity between two or more different fields (that may both be within natural or social science); and “3” indicated single-disciplinarity); "Citations" is the count of citations the paper had received as of the date listed in "date for cite count", as reported in Google Scholar (integer); "date for cite count" is the date on which citation count for the paper was obtained (ddBBByy); "Abstract" is the text of the abstract of the paper, where available (text); "Notes" are any notes added by the authors of the dataset (text).
  2. Academic peer review is fundamental for scientific knowledge dissemination, and various initiatives are exploring how the peer-review process could be more open, efficient and rewarding. We report five case studies where a live community-based review session was integrated into the editorial workflow of an academic journal (Current Research in Neurobiology; CRNEUR). Five manuscripts, submitted as preprints, underwent Live Review—a structured collaborative review session led by PREreview, an open science project advancing openness in scholarly evaluation. With each case, PREreview team members facilitated a 90-minute online discussion where registered participants provided real-time discussion and worked together on an online structured peer-review document. Authors could join as observers or to answer questions, and journal editors could join as observers. Participants then volunteered to write up the session notes into a final review and summary statement. Review participants had the option to sign the review. The finalized review was then published on PREreview’s open preprint review platform approximately two weeks after the Live Review session. The published review was assigned a Digital Object Identifier (DOI) for participating reviewers to obtain credit for their reviewing effort. The published review was then incorporated into CRNEUR’s editorial process to inform editorial decisions. Results suggest that the speed of this community review can be as rapid as the standard peer-review process for CRNEUR during the same time period, and a small sample size survey of the Live Review pilots attendees showed agreement on several questions including the review being respectful, time efficient and scientifically rigorous. We discuss how live, community-based review approaches could be further developed, scaled and sustained. 
  3. Ecosystems around the globe are experiencing changes in both the magnitude and fluctuations of environmental conditions due to land use and climate change. In response, ecologists are increasingly using near‐term, iterative ecological forecasts to predict how ecosystems will change in the future. To date, many near‐term, iterative forecasting systems have been developed using high temporal frequency (minute to hourly resolution) data streams for assimilation. However, this approach may be cost‐prohibitive or impossible for forecasting ecological variables that lack high‐frequency sensors or have high data latency (i.e., a delay before data are available for modeling after collection). To explore the effects of data assimilation frequency on forecast skill, we developed water temperature forecasts for a eutrophic drinking water reservoir and conducted data assimilation experiments by selectively withholding observations to examine the effect of data availability on forecast accuracy. We used in situ sensors, manually collected data, and a calibrated water quality ecosystem model driven by forecasted weather data to generate future water temperature forecasts using Forecasting Lake and Reservoir Ecosystems (FLARE), an open source water quality forecasting system. We tested the effect of daily, weekly, fortnightly, and monthly data assimilation on the skill of 1‐ to 35‐day‐ahead water temperature forecasts. We found that forecast skill varied depending on the season, forecast horizon, depth, and data assimilation frequency, but overall forecast performance was high, with a mean 1‐day‐ahead forecast root mean square error (RMSE) of 0.81°C, mean 7‐day RMSE of 1.15°C, and mean 35‐day RMSE of 1.94°C. Aggregated across the year, daily data assimilation yielded the most skillful forecasts at 1‐ to 7‐day‐ahead horizons, but weekly data assimilation resulted in the most skillful forecasts at 8‐ to 35‐day‐ahead horizons. Within a year, forecasts with weekly data assimilation consistently outperformed forecasts with daily data assimilation after the 8‐day forecast horizon during mixed spring/autumn periods and 5‐ to 14‐day‐ahead horizons during the summer‐stratified period, depending on depth. Our results suggest that lower frequency data (i.e., weekly) may be adequate for developing accurate forecasts in some applications, further enabling the development of forecasts broadly across ecosystems and ecological variables without high‐frequency sensor data.
  4. Accurate prediction of scientific impact is important for scientists, academic recommender systems, and granting organizations alike. Existing approaches rely on many years of leading citation values to predict a scientific paper’s citations (a proxy for impact), even though most papers make their largest contributions in the first few years after they are published. In this paper, we tackle a new problem: predicting a new paper’s citation time series from the date of publication (i.e., without leading values). We propose HINTS, a novel end-to-end deep learning framework that converts citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a formal model that can predict citation counts immediately after publication. Empirical analysis on two real-world datasets from Computer Science and Physics shows that HINTS is competitive with baseline citation prediction models. While we focus on citations, our approach generalizes to other “cold start” time series prediction tasks where relational data is available and accurate prediction in early timestamps is crucial.
  5. El Niño and La Niña events show a wide range of durations over the historical record. The predictability of event duration has remained largely unknown, although multiyear events could prolong their climate impacts. To explore the predictability of El Niño and La Niña event duration, multiyear ensemble forecasts are conducted with the Community Earth System Model, version 1 (CESM1). The 10–40-member forecasts are initialized with observed oceanic conditions on 1 March, 1 June, and 1 November of each year during 1954–2015; ensemble spread is created through slight perturbations to the atmospheric initial conditions. The CESM1 predicts the duration of individual El Niño and La Niña events with lead times ranging from 6 to 25 months. In particular, forecasts initialized in November, near the first peak of El Niño or La Niña, can skillfully predict whether the event continues through the second year with 1-yr lead time. The occurrence of multiyear La Niña events can be predicted even earlier with lead times up to 25 months, especially when they are preceded by strong El Niño. The predictability of event duration arises from initial thermocline depth anomalies in the equatorial Pacific, as well as sea surface temperature anomalies within and outside the tropical Pacific. The forecast error growth, on the other hand, originates mainly from atmospheric variability over the North Pacific in boreal winter. The high predictability of event duration indicates the potential for extending 12-month operational forecasts of El Niño and La Niña events by one additional year.