NSF PAR Search | NSF Public Access Repository

Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis

https://doi.org/10.1056/AIoa2400196

Liang, Weixin; Zhang, Yuhui; Cao, Hancheng; Wang, Binglu; Ding, Daisy Yi; Yang, Xinyu; Vodrahalli, Kailas; He, Siyu; Smith, Daniel Scott; Yin, Yian; et al (July 2024, NEJM AI)

BACKGROUND Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production challenges the conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. METHODS We created an automated pipeline using Generative Pretrained Transformer 4 (GPT-4) to provide comments on scientific papers. We evaluated the quality of GPT-4’s feedback through two large-scale studies. We first quantitatively compared GPT-4’s generated feedback with human peer reviewers’ feedback in general scientific papers from 15 Nature family journals (3096 papers in total) and the International Conference on Learning Representations (ICLR) machine learning conference (1709 papers). To specifically assess GPT-4’s performance on biomedical papers, we also analyzed a subset of 425 health sciences papers from the Nature portfolio and a random sample of 666 submissions to eLife. Additionally, we conducted a prospective user study with 308 researchers from 110 institutions in the fields of artificial intelligence and computational biology to understand how researchers perceive feedback generated by our system on their own papers. RESULTS The overlap in the points raised by GPT-4 and by human reviewers (average overlap of 30.85% for Nature journals and 39.23% for ICLR) is comparable with the overlap between two human reviewers (average overlap of 28.58% for Nature journals and 35.25% for ICLR). Results on eLife and a subset of health sciences papers as categorized by the Nature portfolio show similar patterns. In our prospective user study, more than half (57.4%) of the users found GPT-4–generated feedback helpful/very helpful, and 82.4% found it more beneficial than feedback from at least some human reviewers. We also identify several limitations of large language model (LLM)–generated feedback.
CONCLUSIONS Through both retrospective and prospective evaluation, we find substantial overlap between LLM and human feedback as well as positive user perceptions regarding the usefulness of LLM feedback. Although human expert review should continue to be the foundation of the scientific process, LLM feedback could benefit researchers, especially when timely expert feedback is not available and in earlier stages of manuscript preparation. (Funded by the Chan–Zuckerberg Initiative and the Stanford Interdisciplinary Graduate Fellowship.)
Full Text Available
Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor–immune hubs

https://doi.org/10.1038/s41587-024-02173-8

He, Siyu; Jin, Yinuo; Nazaret, Achille; Shi, Lingting; Chen, Xueer; Rampersaud, Sham; Dhillon, Bahawar S; Valdez, Izabella; Friend, Lauren E; Fan, Joy Linyue; et al (February 2025, Nature Biotechnology)

Spatially resolved gene expression profiling provides insight into tissue organization and cell–cell crosstalk; however, sequencing-based spatial transcriptomics (ST) lacks single-cell resolution. Current ST analysis methods require single-cell RNA sequencing data as a reference for rigorous interpretation of cell states, mostly do not use associated histology images and are not capable of inferring shared neighborhoods across multiple tissues. Here we present Starfysh, a computational toolbox using a deep generative model that incorporates archetypal analysis and any known cell type markers to characterize known or new tissue-specific cell states without a single-cell reference. Starfysh improves the characterization of spatial dynamics in complex tissues using histology images and enables the comparison of niches as spatial hubs across tissues. Integrative analysis of primary estrogen receptor (ER)-positive breast cancer, triple-negative breast cancer (TNBC) and metaplastic breast cancer (MBC) tissues led to the identification of spatial hubs with patient- and disease-specific cell type compositions and revealed metabolic reprogramming shaping immunosuppressive hubs in aggressive MBC.
Free, publicly-accessible full text available February 1, 2026
Detecting galaxy–filament alignments in the Sloan Digital Sky Survey III

https://doi.org/10.1093/mnras/stz539

Chen, Yen-Chi; Ho, Shirley; Blazek, Jonathan; He, Siyu; Mandelbaum, Rachel; Melchior, Peter; Singh, Sukhdeep (February 2019, Monthly Notices of the Royal Astronomical Society)

Full Text Available
