COVIDSeer: Extending the CORD-19 Dataset

Rohatgi, S.; Karishma, Z.; Chhay, J.; Keesara, S.R.R.; Wu, J.; Caragea, C.; Giles, C.L.

doi:10.1145/3395027.3419597

Citation Details

We develop an enhanced version of the CORD-19 dataset released by the Allen Institute for AI. Tools in the SeerSuite project are used to exploit information in the original articles that is not directly provided in the CORD-19 dataset. We add 728 new abstracts, 70,102 figures, and 31,446 tables with captions that are not provided in the current data release. We also built a vertical search engine, COVIDSeer, based on the new dataset. COVIDSeer has a relatively simple architecture with features such as keyword filtering and similar-paper recommendation. The goal was to provide a system and dataset that can help scientists better navigate the literature concerning COVID-19. The enriched dataset can serve as a supplement to the existing dataset. The search engine, which offers keyphrase-enhanced search, will hopefully help biomedical and life science researchers, medical students, and the general public explore coronavirus-related literature more effectively. The entire data set and the system will be made open source …

Award ID(s): 1823288

PAR ID: 10271898

Author(s) / Creator(s): Rohatgi, S.; Karishma, Z.; Chhay, J.; Keesara, S.R.R.; Wu, J.; Caragea, C.; Giles, C.L.

Date Published: 2020-09-01

Journal Name: Proceedings of the ACM Symposium on Document Engineering 2020

Page Range / eLocation ID: 1-4

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3395027.3419597