CiteSeerX: 20 years of service to scholarly big data

Wu, Jian; Kim, Kunho; Giles, C. Lee

doi:10.1145/3359115.3359119

Citation Details

We overview CiteSeerX, the pioneering digital library search engine, which has been serving academic communities for more than 20 years (first released in 1998), from three perspectives. The system perspective summarizes its architecture evolution in three phases over the past 20 years. The data perspective describes how CiteSeerX has created searchable scholarly big datasets and made them freely available for multiple purposes. To make the system scalable and effective, AI technologies are employed in all essential modules. To train these models effectively, a sufficient amount of data has been labeled, which can then be reused for training future models. Finally, we discuss the future of CiteSeerX. Our ongoing work is to make CiteSeerX more sustainable. To this end, we are working to ingest all open access scholarly papers, estimated to be 30-40 million. Part of the plan is to discover dataset mentions and metadata in scholarly articles and make them more accessible via search interfaces. Users will have more opportunities to explore and trace datasets that can be reused and discover other datasets for new research projects. We summarize what was learned to make a similar system more sustainable and useful.

Award ID(s): 1823288

PAR ID: 10173327

Author(s) / Creator(s): Wu, Jian; Kim, Kunho; Giles, C. Lee

Date Published: 2019-10-01

Journal Name: Proceedings of the Conference on Artificial Intelligence for Data Discovery and Reuse, AIDR 2019

Page Range / eLocation ID: 1:1-1:4

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3359115.3359119
