This content will become publicly available on May 1, 2026

Title: TrialSieve: A Comprehensive Biomedical Information Extraction Framework for PICO, Meta-Analysis, and Drug Repurposing
This work introduces TrialSieve, a novel framework for biomedical information extraction that enhances clinical meta-analysis and drug repurposing. By extending traditional PICO (Patient, Intervention, Comparison, Outcome) methodologies, TrialSieve incorporates hierarchical, treatment group-based graphs, enabling more comprehensive and quantitative comparisons of clinical outcomes. TrialSieve was used to annotate 1609 PubMed abstracts, yielding 170,557 annotations and 52,638 final spans across 20 unique annotation categories that capture a diverse range of biomedical entities relevant to systematic reviews and meta-analyses. The performance (accuracy, precision, recall, F1-score) of four natural-language processing (NLP) models (BioLinkBERT, BioBERT, KRISSBERT, PubMedBERT) and the large language model (LLM) GPT-4o was evaluated using the human-annotated TrialSieve dataset. BioLinkBERT had the best accuracy (0.875) and recall (0.679) for biomedical entity labeling, whereas PubMedBERT had the best precision (0.614) and F1-score (0.639). Error analysis showed that NLP models trained on noisy, human-annotated data can match or, in most cases, surpass human performance. This finding highlights the feasibility of fully automating biomedical information extraction, even when relying on imperfectly annotated datasets. An annotator user study (n = 39) revealed significant (p < 0.05) gains in efficiency and human annotation accuracy with the unique TrialSieve tree-based annotation approach. In summary, TrialSieve provides a foundation to improve automated biomedical information extraction for front-end clinical research.
Award ID(s): 1944247
PAR ID: 10615484
Publisher / Repository: MDPI
Journal Name: Bioengineering
Volume: 12
Issue: 5
ISSN: 2306-5354
Page Range / eLocation ID: 486
Format(s): Medium: X
Sponsoring Org: National Science Foundation
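The entity-labeling scores reported above (accuracy, precision, recall, F1) can be made concrete with a minimal sketch. The span representation and the exact-match accuracy definition here are illustrative assumptions, not the paper's actual scorer:

```python
def span_metrics(gold, predicted):
    """Compute accuracy, precision, recall, and F1 for labeled spans.

    gold, predicted: sets of (start, end, label) tuples for one abstract.
    Accuracy is taken as exact matches over the union of spans -- a
    simplification; TrialSieve's token-level definition may differ.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)          # spans matched exactly
    fp = len(predicted - gold)          # spurious predictions
    fn = len(gold - predicted)          # missed gold spans
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = tp / len(gold | predicted) if gold | predicted else 1.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With this definition, a model that gets one of two gold spans right while emitting one wrong label scores 0.5 precision and recall, showing how boundary-exact matching drives the scores reported above.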
More Like this
  1. Abstract
Background: Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage that causes blindness. There is increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in fundus image reports is a valuable resource that could support DR diagnosis in clinical decision support systems. However, most studies of AI-based DR diagnosis rely on medical images; few have explored the lesion-related information captured in free-text image reports.
Methods: In this study, we examined two state-of-the-art transformer-based natural language processing (NLP) models, BERT and RoBERTa, and compared them with a recurrent neural network implemented using long short-term memory (LSTM) to extract DR-related concepts from clinical narratives. We identified four categories of DR-related clinical concepts (lesions, eye parts, laterality, and severity), developed annotation guidelines, annotated a DR corpus of 536 image reports, and developed transformer-based NLP models for clinical concept extraction and relation extraction. We also examined relation extraction under two settings: a "gold-standard" setting, where gold-standard concepts were used, and an end-to-end setting.
Results: For concept extraction, the BERT model pretrained on the MIMIC III dataset achieved the best performance (0.9503 and 0.9645 for strict/lenient evaluation). For relation extraction, the BERT model pretrained on general English text achieved the best strict/lenient F1-score of 0.9316. The end-to-end system BERT_general_e2e achieved the best strict/lenient F1-scores of 0.8578 and 0.8881, respectively. Another end-to-end system based on the RoBERTa architecture, RoBERTa_general_e2e, matched BERT_general_e2e on strict scores.
Conclusions: This study demonstrated the effectiveness of transformer-based NLP models for clinical concept extraction and relation extraction. Our results show that pretraining transformer models on clinical text is necessary to optimize performance for clinical concept extraction, whereas for relation extraction, transformers pretrained on general English text performed better.
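The strict/lenient distinction used in the evaluation above can be illustrated with a short sketch: strict scoring requires exact boundaries and label, while lenient scoring accepts any overlap with the same label. The tuple shape and overlap rule are assumptions for illustration; the study's actual scorer may differ:

```python
def overlaps(a, b):
    """True if two spans (start, end, ...) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def f1_strict_lenient(gold, predicted):
    """Return (strict_f1, lenient_f1) for typed spans (start, end, label)."""
    def score(match):
        if not gold or not predicted:
            return 0.0
        prec = sum(any(match(p, g) for g in gold) for p in predicted) / len(predicted)
        rec = sum(any(match(p, g) for p in predicted) for g in gold) / len(gold)
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    strict = score(lambda p, g: p == g)                                # exact boundary + label
    lenient = score(lambda p, g: p[2] == g[2] and overlaps(p, g))      # overlap + same label
    return strict, lenient
```

A prediction that is off by one character at a span boundary is a miss under strict scoring but a hit under lenient scoring, which is why lenient F1 is reported alongside strict F1 above.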
  2. Villazón-Terrazas, B. (Ed.)
    Given the ubiquity of unstructured biomedical data, significant obstacles remain to accurate and fast access to online biomedical content. Accompanying the growing volume of biomedical content on the internet with semantic annotations is critical to enhancing search engines' context-aware indexing and improving search speed and retrieval accuracy. We propose a novel methodology for annotation recommendation in the biomedical content-authoring environment, introducing a socio-technical approach in which users receive recommendations from each other for accurate, high-quality semantic annotations. We performed experiments recording system-level performance with and without socio-technical features in three scenarios with different contexts. At the system level, we achieved 89.98% precision, 89.61% recall, and an 89.45% F1-score for semantic annotation recommendation. Similarly, a high accuracy of 90% was achieved with the socio-technical approach, compared with 73% without it. Scenarios 1 and 2 achieved nearly equal precision, recall, and F1-scores of about 90%, whereas scenario 3 achieved slightly lower precision, recall, and F1-scores of 88%. We conclude that our proposed socio-technical approach produces proficient annotation recommendations that could be helpful for uses ranging from context-aware indexing to retrieval accuracy.
  3. An abundance of biomedical data is generated in the form of clinical notes, reports, and research articles available online. This data holds valuable information that must be extracted, retrieved, and transformed into actionable knowledge. However, access to this information is challenging because search engines require precise, machine-interpretable semantic metadata. Despite search engines' efforts to interpret semantic information, they still struggle to index, search, and retrieve relevant information accurately. To address these challenges, we propose a novel graph-based semantic knowledge-sharing approach that enhances the quality of biomedical semantic annotation by engaging biomedical domain experts. In this approach, entities in the knowledge-sharing environment are interlinked and play critical roles. Authorial queries can be posted to the "Knowledge Cafe," and community experts can recommend semantic annotations. The community can further validate and evaluate the expert responses through a voting scheme, transforming the "Knowledge Cafe" into a knowledge graph with semantically linked entities. We evaluated the proposed approach through a series of scenarios, assessing precision, recall, F1-score, and accuracy. Our results showed an acceptable level of accuracy at approximately 90%. The source code for "Semantically" is freely available at: https://github.com/bukharilab/Semantically
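As a rough illustration of the voting scheme described above, here is a minimal sketch of selecting the top-voted annotation per term. The function name, the (term, annotation, votes) record shape, and the min_votes threshold are all hypothetical; the Semantically codebase's actual schema is not given here:

```python
def accept_annotations(recommendations, min_votes=2):
    """Pick the top-voted semantic annotation for each query term.

    recommendations: iterable of (term, annotation, votes) tuples from
    community experts. Annotations below min_votes are discarded;
    ties keep the first annotation seen with that vote count.
    """
    best = {}  # term -> (annotation, votes)
    for term, annotation, votes in recommendations:
        if votes >= min_votes and votes > best.get(term, ("", -1))[1]:
            best[term] = (annotation, votes)
    return {term: annotation for term, (annotation, _) in best.items()}
```

In the knowledge-graph framing above, each accepted (term, annotation) pair becomes a semantically linked edge; low-support suggestions never enter the graph.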
  5. Abstract
Background: While most health-care providers now use electronic health records (EHRs) to document clinical care, many still treat them as digital versions of paper records. As a result, documentation often remains unstructured, with free-text entries in progress notes. This limits the potential for secondary use and analysis, as machine-learning and data-analysis algorithms are more effective with structured data.
Objective: This study aims to use advanced artificial intelligence (AI) and natural language processing (NLP) techniques to improve diagnostic information extraction from clinical notes in a periodontal use case. By automating this process, the study seeks to reduce missing data in dental records and minimize the need for extensive manual annotation, a long-standing barrier to widespread NLP deployment in dental data extraction.
Materials and Methods: This research uses large language models (LLMs), specifically Generative Pretrained Transformer 4, to generate synthetic medical notes for fine-tuning a RoBERTa model. The model was trained to better interpret and process dental language, with particular attention to periodontal diagnoses. Model performance was evaluated by manually reviewing 360 clinical notes randomly selected from each participating site's dataset.
Results: The results demonstrated highly accurate extraction of periodontal diagnosis data, with sites 1 and 2 achieving weighted average scores of 0.97-0.98. This performance held across all dimensions of periodontal diagnosis: stage, grade, and extent.
Discussion: Synthetic data effectively reduced manual annotation needs while preserving model quality. Generalizability across institutions suggests viability for broader adoption, though future work is needed to improve contextual understanding.
Conclusion: Most clinical documentation (40%-80%) is free text, and the study highlights the potential transformative impact of AI and NLP on health-care research. Scaling our method could enhance clinical data reuse.
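The "weighted average score" per site reported above is presumably a support-weighted aggregate across the diagnosis dimensions. A minimal sketch, where the per_label mapping of label to (score, support) is a hypothetical shape and the paper's exact aggregation may differ:

```python
def weighted_average_score(per_label):
    """Support-weighted average of per-label scores.

    per_label: {label: (score, support)}, e.g. an F1 per diagnosis
    dimension (stage, grade, extent) with its instance count.
    """
    total_support = sum(support for _, support in per_label.values())
    if not total_support:
        return 0.0
    return sum(score * support for score, support in per_label.values()) / total_support
```

Weighting by support keeps a rare dimension (say, extent) from dominating the site-level number the way an unweighted macro average would.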