Improving information extraction from visually rich documents using visual span representations

Sarkhel, Ritesh; Nandi, Arnab

doi:10.14778/3446095.3446104

Along with textual content, visual features play an essential role in the semantics of visually rich documents. Information extraction (IE) methods perform poorly on these documents if visual cues are not taken into account. In this paper, we present Artemis, a visually aware, machine-learning-based IE method for heterogeneous visually rich documents. Artemis represents a visual span in a document by jointly encoding its visual and textual context for IE tasks. Our main contribution is twofold. First, we develop a deep-learning model that identifies the local context boundary of a visual span with minimal human labeling. Second, we describe a deep neural network that encodes the multimodal context of a visual span into a fixed-length vector by taking its textual and layout-specific features into account. Artemis identifies the visual span(s) containing a named entity by leveraging this learned representation followed by an inference task. We evaluate Artemis on four heterogeneous datasets from different domains over a suite of information extraction tasks. Results show that it outperforms state-of-the-art text-based methods by up to 17 points in F1-score.

Award ID(s): 1910356

PAR ID: 10268365

Author(s) / Creator(s): Sarkhel, Ritesh; Nandi, Arnab

Date Published: 2021-01-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 14

Issue: 5

ISSN: 2150-8097

Page Range / eLocation ID: 822 to 834

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.14778/3446095.3446104
