COVID-Scraper: An Open-Source Toolset for Automatically Scraping and Processing Global Multi-Scale Spatiotemporal COVID-19 Records

Lan, Hai (ORCID:0000000241194388); Sha, Dexuan (ORCID:0000000161616050); Malarvizhi, Anusha Srirenganathan; Liu, Yi (ORCID:0000000291742004); Li, Yun; Meister, Nadine; Liu, Qian (ORCID:0000000338764877); Wang, Zifu; Yang, Jingchao; Yang, Chaowei Phil (ORCID:0000000177684066)

doi:10.1109/ACCESS.2021.3085682

The COVID Information Commons (CIC) is an open website portal and community that facilitates knowledge-sharing and collaboration across COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other's research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic. The CIC was developed as a collaborative proposal led by the Northeast Big Data Innovation Hub, hosted by Columbia University, in collaboration with the Midwest Big Data Innovation Hub, South Big Data Innovation Hub, and West Big Data Innovation Hub. It was funded by the NSF Convergence Accelerator (NSF #2028999) in May 2020 and launched in July 2020. The initial focus of the CIC website was on the 723 NSF-funded COVID Rapid Response Research (RAPID) projects funded in 2020. The CIC-E: COVID Information Commons Extension for Pandemic Recovery project was proposed by the CIC project team and funded (NSF #2139391) in 2021, with the goal of increasing researcher collaboration across NSF and NIH awardees and with global collaborators as we continue to combat the novel coronavirus, and of gleaning lessons for future uses of innovations developed for COVID response and recovery, including insights that can be leveraged for future pandemics. The CIC extension launched on June 30, 2022, expanding the corpus of awards from NSF alone to include NIH-funded COVID-related awards, both present and past, through all funding vehicles, in pertinent areas of COVID research, response and recovery.
The CIC extension provides more opportunities for multi-agency and multidisciplinary research collaboration, as all the Principal Investigators (PIs) for awards in the CIC are invited to present their research and collaborate in CIC Research Lightning Talk Webinars and Collaboration Sessions. The COVID Information Commons (CIC) archive in Dryad includes the NSF and NIH COVID-related awards in the CIC, as well as links to valuable COVID-related datasets, groups, guides and artifacts, including videos and transcripts of the CIC researcher lightning talks. The CIC archive was created in Dryad in 2024 at 84.24 MB and is updated annually. The 2025 update to the CIC archive in Dryad is 91.94 MB. It can be cited as follows: Hudson, Florence et al. (2024). COVID information commons archive [Dataset]. Dryad. https://doi.org/10.5061/dryad.37pvmcvqp
