Timely access to accurate online biomedical content remains a major challenge because unstructured biomedical data is growing exponentially. Semantic annotations are therefore needed for biomedical content to improve search engines' context-aware indexing, search efficiency, and the precision of retrieved results. In this study, we propose personalized semantic annotation recommendations for biomedical content based on an expanded socio-technical approach. In the first layer, our layered architecture generates annotations for the text entered by users. To refine these annotations, users can seek help from professional experts by posing specific questions to them. The socio-technical system connects help seekers (users) with help providers (experts) using pre-trained BERT embeddings. These embeddings are used to compute profile similarity scores between users and experts at various levels, and the system suggests a compatible help provider for each help seeker at run time. Our approach overcomes a limitation of previous systems, which are predominantly non-collaborative and laborious. In our experiments, we analyzed the performance gains that the socio-technical approach offers in improving semantic annotations across three scenarios in different contexts. At the system level, our results show 89.98% precision, 89.61% recall, and an 89.45% F1-score. The socio-technical approach achieved an accuracy of 90%, whereas the traditional approach reached only 87%. Our novel socio-technical approach produces apt annotation recommendations that should be helpful for various secondary uses, ranging from context-aware indexing to improved retrieval accuracy.
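The seeker-to-expert matching step can be illustrated with the following minimal sketch. It assumes that each user and expert profile has already been encoded as a fixed-length vector (for example, a pooled BERT embedding) and that cosine similarity serves as the profile similarity score. The vectors, profile names, and the `rank_experts` helper are hypothetical. The abstract does not specify the similarity measure or how the "various levels" are combined, so this sketch shows a single-level comparison only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical profile embeddings. In the described system these would come
# from a pre-trained BERT model; here they are small hand-written vectors.
Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_experts(seeker: Vector, experts: Dict[str, Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k experts ordered by profile similarity to the help seeker."""
    scored = [(name, cosine_similarity(seeker, vec)) for name, vec in experts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    seeker_profile = [0.9, 0.1, 0.0]
    expert_profiles = {  # hypothetical experts
        "expert_cardiology": [0.8, 0.2, 0.0],
        "expert_oncology": [0.0, 0.1, 0.9],
        "expert_genomics": [0.3, 0.9, 0.1],
    }
    for name, score in rank_experts(seeker_profile, expert_profiles, top_k=2):
        print(f"{name}: {score:.3f}")
```

With these toy vectors, the cardiology profile ranks first and the genomics profile second. A production system would likely replace the linear scan with an approximate nearest-neighbour index once the expert pool grows.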
Proficient Annotation Recommendation in a Biomedical Content Authoring Environment
Given the ubiquity of unstructured biomedical data, significant obstacles remain in achieving accurate and fast access to online biomedical content. Accompanying the growing volume of biomedical content on the internet with semantic annotations is critical to enhancing search engines' context-aware indexing and improving search speed and retrieval accuracy. We propose a novel methodology for annotation recommendation in the biomedical content authoring environment by introducing a socio-technical approach in which users can get recommendations from each other for accurate, high-quality semantic annotations. To evaluate the proposed socio-technical approach, we performed experiments that recorded system-level performance with and without socio-technical features in three scenarios with different contexts. At the system level, we achieved 89.98% precision, 89.61% recall, and an 89.45% F1-score for semantic annotation recollection. Likewise, the socio-technical approach achieved an accuracy of 90%, compared with 73% without it. Scenario-1 and scenario-2 achieved nearly equal precision, recall, and F1-score of 90%, whereas scenario-3 achieved a relatively lower precision, recall, and F1-score of 88%. We conclude that our proposed socio-technical approach produces proficient annotation recommendations that could be helpful for various uses, ranging from context-aware indexing to retrieval accuracy.
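The reported precision, recall, and F1-score follow the standard definitions over true positives (TP), false positives (FP), and false negatives (FN). The following sketch assumes that annotation recommendations have already been judged correct or incorrect. The `Counts` type, the example counts, and the per-scenario macro-averaging are illustrative assumptions, not the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Counts:
    """Confusion counts for annotation recommendations (hypothetical data)."""
    tp: int  # recommended annotations judged correct
    fp: int  # recommended annotations judged incorrect
    fn: int  # correct annotations the system failed to recommend


def prf(c: Counts) -> Dict[str, float]:
    """Precision, recall, and F1 for one set of counts (0.0 when undefined)."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_average(per_scenario: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric across scenarios."""
    if not per_scenario:
        raise ValueError("need at least one scenario")
    keys = per_scenario[0].keys()
    return {k: sum(m[k] for m in per_scenario) / len(per_scenario) for k in keys}


if __name__ == "__main__":
    scenarios = [Counts(tp=9, fp=1, fn=1), Counts(tp=7, fp=3, fn=3)]  # made-up counts
    print(macro_average([prf(c) for c in scenarios]))
```

A macro average weights every scenario equally. Pooling the counts first (a micro average) would instead weight scenarios by their number of annotations, so the two can differ when scenario sizes differ.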
- Award ID(s):
- 2101350
- PAR ID:
- 10467325
- Editor(s):
- Villazón-Terrazas, B.
- Publisher / Repository:
- In book: Knowledge Graphs and Semantic Web, 4th Iberoamerican Conference and third Indo-American Conference, KGSWC 2022, Madrid, Spain, November 21–23, 2022, Proceedings
- Location:
- https://link.springer.com/chapter/10.1007/978-3-031-21422-6_11
- Sponsoring Org:
- National Science Foundation
More Like this
-
An abundance of biomedical data is generated in the form of clinical notes, reports, and research articles available online. This data holds valuable information that must be extracted, retrieved, and transformed into actionable knowledge. However, accessing this information is challenging because search engines require precise, machine-interpretable semantic metadata. Despite search engines' efforts to interpret semantic information, they still struggle to index, search, and retrieve relevant information accurately. To address these challenges, we propose a novel graph-based semantic knowledge-sharing approach that enhances the quality of biomedical semantic annotation by engaging biomedical domain experts. In this approach, the entities of the knowledge-sharing environment are interlinked and play critical roles. Authors can post queries on the "Knowledge Cafe," and community experts can provide recommendations for semantic annotations. The community can further validate and evaluate expert responses through a voting scheme. This transforms the "Knowledge Cafe" into an environment that functions as a knowledge graph of semantically linked entities. We evaluated the proposed approach through a series of scenarios and assessed it with precision, recall, F1-score, and accuracy metrics. Our results showed an acceptable accuracy of approximately 90%. The source code for "Semantically" is freely available at: https://github.com/bukharilab/Semantically
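The following minimal sketch shows how the "Knowledge Cafe" could be held as a knowledge graph. It assumes a simple in-memory triple store and uses net up/down votes as the voting scheme. The `KnowledgeCafe` class, its predicates (`asked`, `about_term`, `wrote`, `answers`, `recommends`), and the concept identifiers are hypothetical; the abstract does not describe the actual schema or how votes are aggregated.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class KnowledgeCafe:
    """Toy graph of author queries, expert responses, and community votes (hypothetical schema)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.votes: Dict[str, int] = defaultdict(int)  # response id -> net votes

    def post_query(self, query_id: str, author: str, term: str) -> None:
        """Link an author to a query about a term that needs annotation."""
        self.triples.add((author, "asked", query_id))
        self.triples.add((query_id, "about_term", term))

    def respond(self, response_id: str, expert: str, query_id: str, concept: str) -> None:
        """Link an expert's response to the query and to the concept it recommends."""
        self.triples.add((expert, "wrote", response_id))
        self.triples.add((response_id, "answers", query_id))
        self.triples.add((response_id, "recommends", concept))

    def vote(self, response_id: str, up: bool) -> None:
        """Record one community upvote or downvote on a response."""
        self.votes[response_id] += 1 if up else -1

    def best_concept(self, query_id: str) -> Optional[str]:
        """Concept recommended by the highest-voted response (ties broken by response id)."""
        responses = [s for (s, p, o) in self.triples if p == "answers" and o == query_id]
        if not responses:
            return None
        best = max(responses, key=lambda r: (self.votes[r], r))
        for s, p, o in self.triples:
            if s == best and p == "recommends":
                return o
        return None


if __name__ == "__main__":
    cafe = KnowledgeCafe()
    cafe.post_query("q1", "author_a", "heart attack")
    cafe.respond("r1", "expert_x", "q1", "concept:myocardial_infarction")
    cafe.respond("r2", "expert_y", "q1", "concept:chest_pain")
    cafe.vote("r1", True)
    cafe.vote("r1", True)
    cafe.vote("r2", False)
    print(cafe.best_concept("q1"))
```

Storing everything as triples keeps authors, experts, queries, responses, and concepts as interlinked entities. An RDF store could replace the Python set without changing the overall shape.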
-
Villazón-Terrazas, B. (Ed.) Each day, a vast amount of unstructured content is generated in the biomedical domain from sources such as clinical notes, research articles, and medical reports. Such content contains a substantial amount of meaningful information that needs to be converted into actionable knowledge for secondary use. However, accessing precise biomedical content is challenging because of content heterogeneity, missing or imprecise metadata, and the unavailability of the associated semantic tags required for search engine optimization. We have introduced a socio-technical semantic annotation optimization approach that enhances the semantic search of biomedical content. The proposed approach has a layered architecture. In the first layer (Preliminary Semantic Enrichment), it annotates the biomedical content with ontological concepts from NCBO BioPortal. Because biomedical information keeps growing, the semantic annotations suggested by NCBO BioPortal are not always correct. Therefore, in the second layer (Optimizing the Enriched Semantic Information), we introduce a knowledge-sharing scheme through which authors/users can request recommendations from other users to optimize the semantic enrichment process. To gauge the credibility of human recommendations, our system records each recommender's confidence score, collects community votes on previous recommendations, and stores the percentage of correctly suggested annotations. It then translates these into an index that is later used to connect the right users for suggestions that optimize the semantic enrichment of biomedical content. At the preliminary NCBO annotation layer, we analyzed an n-gram strategy for identifying biomedical word boundaries and found that NCBO recognizes more biomedical terms for n-gram-1 than for n-gram-2 to n-gram-5. We also conducted a statistical analysis of significant features using the Wilson score and data normalization.
In contrast, the proposed methodology achieves a suitable accuracy of ≈90% for the semantic optimization approach.
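The Wilson score mentioned above is commonly used to rank items by a proportion, such as upvotes or correct suggestions, while penalizing small sample sizes. The following sketch computes the lower bound of the Wilson score interval and blends it into a credibility index. The equal weighting and the `credibility_index` helper are hypothetical. The abstract says only that confidence scores, community votes, and correctness percentages are translated into an index; it does not say how.

```python
import math
from typing import Tuple


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    Returns 0.0 when there are no observations.
    """
    if total == 0:
        return 0.0
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def credibility_index(confidence: float, upvotes: int, downvotes: int,
                      correct: int, suggested: int,
                      weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical weighted blend of a recommender's self-reported confidence (0..1),
    community votes, and historical correctness (both as Wilson lower bounds)."""
    vote_score = wilson_lower_bound(upvotes, upvotes + downvotes)
    accuracy_score = wilson_lower_bound(correct, suggested)
    w_conf, w_vote, w_acc = weights
    return w_conf * confidence + w_vote * vote_score + w_acc * accuracy_score


if __name__ == "__main__":
    # Same 90% success rate, different amounts of evidence.
    print(round(wilson_lower_bound(9, 10), 3), round(wilson_lower_bound(90, 100), 3))
    print(round(credibility_index(0.8, upvotes=12, downvotes=3, correct=40, suggested=50), 3))
```

The lower bound rewards evidence. A recommender with 9 correct suggestions out of 10 scores below one with 90 out of 100, even though both have a 90% raw rate. This is why the bound is often preferred over the raw percentage when ranking by votes.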
-
This work introduces TrialSieve, a novel framework for biomedical information extraction that enhances clinical meta-analysis and drug repurposing. By extending traditional PICO (Patient, Intervention, Comparison, Outcome) methodologies, TrialSieve incorporates hierarchical, treatment group-based graphs, enabling more comprehensive and quantitative comparisons of clinical outcomes. TrialSieve was used to annotate 1609 PubMed abstracts, yielding 170,557 annotations and 52,638 final spans across 20 unique annotation categories that capture a diverse range of biomedical entities relevant to systematic reviews and meta-analyses. The performance (accuracy, precision, recall, F1-score) of four natural language processing (NLP) models (BioLinkBERT, BioBERT, KRISSBERT, PubMedBERT) and the large language model (LLM) GPT-4o was evaluated on the human-annotated TrialSieve dataset. BioLinkBERT had the best accuracy (0.875) and recall (0.679) for biomedical entity labeling, whereas PubMedBERT had the best precision (0.614) and F1-score (0.639). Error analysis showed that NLP models trained on noisy, human-annotated data can match or, in most cases, surpass human performance. This finding highlights the feasibility of fully automating biomedical information extraction, even when relying on imperfectly annotated datasets. An annotator user study (n = 39) revealed significant (p < 0.05) gains in efficiency and human annotation accuracy with TrialSieve's unique tree-based annotation approach. In summary, TrialSieve provides a foundation for improving automated biomedical information extraction in front-end clinical research.
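Entity-labeling precision, recall, and F1-score can be computed at the span level. The following sketch assumes exact matching on character offsets and label, which is one common convention. The abstract does not state which matching criterion the TrialSieve evaluation uses, and the span tuples and PICO-style labels below are illustrative, not TrialSieve's actual category names.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label): a hypothetical representation


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1: a predicted span counts as correct
    only if its start, end, and label all match a gold span."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = {(0, 12, "population"), (20, 31, "intervention"), (40, 52, "outcome")}
    predicted = {
        (0, 12, "population"),   # correct
        (20, 31, "comparator"),  # right offsets, wrong label
        (40, 52, "outcome"),     # correct
        (60, 70, "outcome"),     # spurious span
    }
    print(span_prf(gold, predicted))
```

Exact matching is strict: a span with the right offsets but the wrong label, or one that is off by a single character, counts as both a false positive and a false negative. Relaxed (overlap-based) matching would score the same predictions higher.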