- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- International Conference on Management of Data (SIGMOD), 2019
- Page Range or eLocation-ID:
- 247 to 262
- Sponsoring Org:
- National Science Foundation
More Like this
Along with textual content, visual features play an essential role in the semantics of visually rich documents. Information extraction (IE) tasks perform poorly on these documents if these visual cues are not taken into account. In this paper, we present Artemis - a visually aware, machine-learning-based IE method for heterogeneous visually rich documents. Artemis represents a visual span in a document by jointly encoding its visual and textual context for IE tasks. Our main contribution is two-fold. First, we develop a deep-learning model that identifies the local context boundary of a visual span with minimal human-labeling. Second, we describe a deep neural network that encodes the multimodal context of a visual span into a fixed-length vector by taking its textual and layout-specific features into account. It identifies the visual span(s) containing a named entity by leveraging this learned representation followed by an inference task. We evaluate Artemis on four heterogeneous datasets from different domains over a suite of information extraction tasks. Results show that it outperforms state-of-the-art text-based methods by up to 17 points in F1-score.
Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting that aims to categorize documents effectively, with a couple of seed documents annotated per category. We recognize that texts collected from the Web are often structure-rich, i.e., accompanied by various metadata. One can easily organize the corpus into a text-rich network, joining raw text documents with document attributes, high-quality phrases, label surface names as nodes, and their associations as edges. Such a network provides a holistic view of the corpus’ heterogeneous data sources and enables a joint optimization for network-based analysis and deep textual model training. We therefore propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases – a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Each module generates pseudo training labels from the unlabeled document set, and both modules mutually enhance each other by co-training using pooled pseudo labels. We test our model on two real-world datasets. On the challenging e-commercemore »
Deterministic Routing between Layout Abstractions for Multi-Scale Classification of Visually Rich Documents
Classifying heterogeneous visually rich documents is a challenging task. Difficulty of this task increases even more if the maximum allowed inference turnaround time is constrained by a threshold. The increased overhead in inference cost, compared to the limited gain in classification capabilities make current multi-scale approaches infeasible in such scenarios. There are two major contributions of this work. First, we propose a spatial pyramid model to extract highly discriminative multi-scale feature descriptors from a visually rich document by leveraging the inherent hierarchy of its layout. Second, we propose a deterministic routing scheme for accelerating end-to-end inference by utilizing the spatial pyramid model. A depth-wise separable multi-column convolutional network is developed to enable our method. We evaluated the proposed approach on four publicly available, benchmark datasets of visually rich documents. Results suggest that our proposed approach demonstrates robust performance compared to the state-of-the-art methods in both classification accuracy and total inference turnaround.
BiTe-GCN: A New GCN Architecture via Bidirectional Convolution of Topology and Features on Text-Rich NetworksLiane Lewin-Eytan, David Carmel (Ed.)Graph convolutional networks (GCNs), aiming to obtain node embeddings by integrating high-order neighborhood information through stacked graph convolution layers, have demonstrated great power in many network analysis tasks such as node classification and link prediction. However, a fundamental weakness of GCNs, that is, topological limitations, including over-smoothing and local homophily of topology, limits their ability to represent networks. Existing studies for solving these topological limitations typically focus only on the convolution of features on network topology, which inevitably relies heavily on network structures. Moreover, most networks are text-rich, so it is important to integrate not only document-level information, but also the local text information which is particularly significant while often ignored by the existing methods. To solve these limitations, we propose BiTe-GCN, a novel GCN architecture modeling via bidirectional convolution of topology and features on text-rich networks. Specifically, we first transform the original text-rich network into an augmented bi-typed heterogeneous network, capturing both the global document-level information and the local text-sequence information from texts.We then introduce discriminative convolution mechanisms, which performs convolution on this augmented bi-typed network, realizing the convolutions of topology and features altogether in the same system, and learning different contributions of these two parts (i.e., network partmore »
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided. Most existing classifiers leverage textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in addition to textual contents. In this paper, we explore the potential of using metadata to help weakly supervised text classification. To be specific, we model the relationships between documents and metadata via a heterogeneous information network. To effectively capture higher-order structures in the network, we use motifs to describe metadata combinations. We propose a novel framework, named MotifClass, which (1) selects category-indicative motif instances, (2) retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances, and (3) trains a text classifier using the pseudo training data. Extensive experiments on real-world datasets demonstrate the superior performance of MotifClass to existing weakly supervised text classification approaches. Further analysis shows the benefit of considering higher-order metadata information in our framework.