Scholarly Very Large Data: Challenges for Digital Libraries

Jian Wu, C. Lee

The volume of scholarly data has been growing exponentially over the last 50 years. The total number of open access documents is estimated to reach 35 million by 2022. The total amount of data to be handled, including crawled documents, the production repository, metadata, extracted content, and their replicas, could be as high as 350 TB. Academic digital library search engines face significant challenges in maintaining sustainable services. We discuss these challenges and propose feasible solutions for key modules of the digital library architecture, including document storage, data extraction, the database, and the index, using CiteSeerX as a case study.
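
As a rough consistency check on the two estimates, dividing the projected data volume by the projected document count gives an average storage footprint per document. This back-of-the-envelope sketch assumes decimal units (1 TB = 10^12 bytes) and that both figures describe the same collection at the same point in time. The result averages over all stored artifacts (crawled copies, metadata, extracted content, and replicas), so it is not the size of an individual PDF.

```latex
\[
\frac{350\ \text{TB}}{35 \times 10^{6}\ \text{documents}}
  = \frac{350 \times 10^{12}\ \text{B}}{35 \times 10^{6}}
  = 10^{7}\ \text{B}
  \approx 10\ \text{MB per document}
\]
```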

Award ID(s):: 1823288

PAR ID:: 10173814

Author(s) / Creator(s):: Jian Wu, C. Lee

Date Published:: 2020-04-01

Journal Name:: Large Scale Networking (LSN) Workshop on Huge Data: A Computing, Networking and Distributed Systems Perspective

Sponsoring Org:: National Science Foundation

Document Type:: Conference Paper (DOI not currently available)
