- Home
- Search Results
- Page 1 of 1
Search for: All records
-
Total Resources1
- Resource Type
-
0000000001000000
- More
- Availability
-
01
- Author / Contributor
- Filter by Author / Creator
-
-
Ayub, Umair (1)
-
Baral, Chitta (1)
-
Bitterman, Danielle S (1)
-
Hasan, Bashar (1)
-
He, Huan (1)
-
Kehl, Kenneth L (1)
-
Khakwani, Kaneez_Zahra Rubab (1)
-
Khan, Muhammad Ali (1)
-
Leventakos, Konstantinos (1)
-
Murad, Mohammad Hassan (1)
-
Naqvi, Syed_Arsalan Ahmed (1)
-
Palmer, Jeanne M (1)
-
Raina, Ammad (1)
-
Riaz, Irbaz bin (1)
-
Rumble, Robert Bryan (1)
-
Saeidi, Amir (1)
-
Sipra, Zaryab_bin Riaz (1)
-
Tevaarwerk, Amye J (1)
-
Warner, Jeremy L (1)
-
Zhou, Sihan (1)
-
- Filter by Editor
-
-
& Spizer, S. M. (0)
-
& . Spizer, S. (0)
-
& Ahn, J. (0)
-
& Bateiha, S. (0)
-
& Bosch, N. (0)
-
& Brennan K. (0)
-
& Brennan, K. (0)
-
& Chen, B. (0)
-
& Chen, Bodong (0)
-
& Drown, S. (0)
-
& Ferretti, F. (0)
-
& Higgins, A. (0)
-
& J. Peters (0)
-
& Kali, Y. (0)
-
& Ruiz-Arias, P.M. (0)
-
& S. Spitzer (0)
-
& Sahin. I. (0)
-
& Spitzer, S. (0)
-
& Spitzer, S.M. (0)
-
(submitted - in Review for IEEE ICASSP-2024) (0)
-
-
Have feedback or suggestions for a way to improve these results?
!
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract ObjectiveData extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world 2-reviewer process. Materials and MethodsA dataset of 10 trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data. The dataset was split into prompt development (n = 5) and held-out test sets (n = 17). GPT-4-turbo and Claude-3-Opus were used for data extraction. Responses from the 2 LLMs were considered concordant if they were the same for a given variable. The discordant responses from each LLM were provided to the other LLM for cross-critique. Accuracy, ie, the total number of correct responses divided by the total number of responses, was computed to assess performance. ResultsIn the prompt development set, 110 (96%) responses were concordant, achieving an accuracy of 0.99 against the gold standard. In the test set, 342 (87%) responses were concordant. The accuracy of the concordant responses was 0.94. The accuracy of the discordant responses was 0.41 for GPT-4-turbo and 0.50 for Claude-3-Opus. Of the 49 discordant responses, 25 (51%) became concordant after cross-critique, increasing accuracy to 0.76. DiscussionConcordant responses by the LLMs are likely to be accurate. In instances of discordant responses, cross-critique can further increase the accuracy. ConclusionLarge language models, when simulated in a collaborative, 2-reviewer workflow, can extract data with reasonable performance, enabling truly “living” systematic reviews.more » « lessFree, publicly-accessible full text available January 21, 2026
An official website of the United States government
