Identifying the Central Figure of a Scientific Paper

Yang, Sean T.; Lee, Po-Shen; Kazakova, Lia; Joshi, Abhishek; Oh, Bum Mook; West, Jevin D.; Howe, Bill

doi:10.1109/ICDAR.2019.00173

Citation Details

Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. Graphical abstracts can be presented on various media, are easily shared, and are information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers that have no selected graphical abstract? In this paper, we first hypothesize that scientific papers actually include a "central figure" that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving top-3 accuracy of 78% and exact match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul_figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.
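
To make the two reported metrics and the caption finding concrete, here is a minimal sketch in Python. It is not the authors' model: it assumes a simple bag-of-words cosine similarity between the abstract and each figure caption as a stand-in ranking signal, and the function names (tokenize, rank_figures, top_k_accuracy) are hypothetical, introduced only for illustration. Exact match accuracy corresponds to k = 1 and top-3 accuracy to k = 3.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_figures(abstract, captions):
    """Return figure indices sorted from most to least similar to the abstract.

    Assumption: caption/abstract word overlap is used as the only signal.
    """
    abstract_vec = Counter(tokenize(abstract))
    scores = [cosine_similarity(abstract_vec, Counter(tokenize(c))) for c in captions]
    return sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(rankings, labels, k):
    """Fraction of papers whose labeled central figure is among the top k ranked."""
    hits = sum(1 for ranking, label in zip(rankings, labels) if label in ranking[:k])
    return hits / len(labels)


# Toy data, made up for illustration only (not from the survey).
abstract = "we train a model to recognize the central figure"
captions = [
    "photo of the lab",
    "model to recognize the central figure",
    "dataset statistics table",
]
ranking = rank_figures(abstract, captions)
exact_match = top_k_accuracy([ranking], [1], k=1)
```

With rankings for many papers and the author-selected central figure index for each, top_k_accuracy(rankings, labels, 1) gives exact match accuracy and top_k_accuracy(rankings, labels, 3) gives top-3 accuracy.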

Award ID(s): 1740996 1915774

PAR ID: 10188257

Author(s) / Creator(s): Yang, Sean T.; Lee, Po-Shen; Kazakova, Lia; Joshi, Abhishek; Oh, Bum Mook; West, Jevin D.; Howe, Bill

Date Published: 2019-09-01

Journal Name: 2019 International Conference on Document Analysis and Recognition (ICDAR)

Volume: September 2019

Page Range / eLocation ID: 1063 to 1070

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/ICDAR.2019.00173
