Keyphrase Extraction in Scholarly Digital Library Search Engines

Patel, K.; Caragea, C.; Wu, J.; Giles, C.L.

doi:10.1007/978-3-030-59618-7_12

Citation Details

Scholarly digital libraries provide access to scientific publications and are useful resources for researchers searching for literature in specific subject areas. CiteSeerX is an example of such a digital library search engine; it provides access to more than 10 million academic documents and has nearly one million users and three million hits per day. Artificial Intelligence (AI) technologies are used in many components of CiteSeerX, including Web crawling, document ingestion, and metadata extraction. CiteSeerX also uses an unsupervised algorithm called noun phrase chunking (NP-Chunking) to extract keyphrases from documents. However, NP-Chunking often extracts many unimportant noun phrases. In this paper, we investigate and contrast three supervised keyphrase extraction models to explore their deployment in CiteSeerX for extracting high-quality keyphrases. To perform user evaluations on the keyphrases predicted by the different models, we integrate a voting interface into CiteSeerX. We show the development and deployment of the keyphrase extraction models and the maintenance requirements.
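For readers unfamiliar with NP-Chunking, the following is a minimal Python sketch of the general idea, not the CiteSeerX implementation. It assumes the tokens are already part-of-speech tagged with Penn Treebank tags (hand-assigned here), and it assumes a simple illustrative grammar of zero or more adjectives followed by one or more nouns. The function name `np_chunk` and the sample sentence are hypothetical.

```python
import re


def np_chunk(tagged_tokens):
    """Return candidate noun phrases from a list of (word, POS-tag) pairs."""
    # Collapse each tag to one character: J = adjective, N = noun, O = other.
    codes = "".join(
        "J" if tag.startswith("JJ") else "N" if tag.startswith("NN") else "O"
        for _, tag in tagged_tokens
    )
    phrases = []
    # Assumed grammar: optional adjectives followed by at least one noun.
    for match in re.finditer(r"J*N+", codes):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        phrases.append(" ".join(words).lower())
    return phrases


# Hand-tagged example sentence (the tags are assumptions, not tagger output).
sentence = [
    ("CiteSeerX", "NNP"), ("uses", "VBZ"), ("an", "DT"),
    ("unsupervised", "JJ"), ("algorithm", "NN"), ("to", "TO"),
    ("extract", "VB"), ("keyphrases", "NNS"), ("from", "IN"),
    ("academic", "JJ"), ("documents", "NNS"),
]
candidates = np_chunk(sentence)
print(candidates)
```

Because the chunker returns every span that matches the grammar, generic phrases such as "academic documents" are emitted alongside more informative ones, which illustrates why an unsupervised chunker can produce many unimportant candidates.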

Award ID(s): 1823288

PAR ID: 10271903

Author(s) / Creator(s): Patel, K.; Caragea, C.; Wu, J.; Giles, C.L.

Date Published: 2020-09-01

Journal Name: International Conference on Web Services

Page Range / eLocation ID: 179-196

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1007/978-3-030-59618-7_12
