Design Considerations for a Sustainable Scholarly Big Data Service

Wu, Jian; Rohatgi, Shaurya; Angadi, Manoj K.; Puranik, Kavya S.; Giles, C. Lee

doi:10.1145/3574318.3574340

The advancement of web programming techniques, such as Ajax and jQuery, and datastores, such as Apache Solr and Elasticsearch, has made it much easier to deploy small- to medium-scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services is still challenging, often because of limited human resources and financial support. Such scenarios are typical in academic settings or small businesses. Here, we showcase how four key design decisions were made by trading off competing factors such as performance, cost, and efficiency when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, a pioneering digital library search engine that has been serving academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations of infrastructure, web applications, indexing, and document filtering. These design considerations can be generalized to other web-based search engines of a similar scale that are deployed in small businesses or academic settings with limited resources.

Award ID(s): 1823288

PAR ID: 10473652

Author(s) / Creator(s): Wu, Jian; Rohatgi, Shaurya; Angadi, Manoj K.; Puranik, Kavya S.; Giles, C. Lee

Publisher / Repository: ACM

Date Published: 2022-12-09

Journal Name: Forum for Information Retrieval Evaluation (FIRE 2022)

ISBN: 9798400700231

Page Range / eLocation ID: 83 to 87

Location: Kolkata, India

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text (Accepted Manuscript 1.0, Conference Paper):
https://doi.org/10.1145/3574318.3574340
