Corpus-Based Relation Extraction by Identifying and Refining Relation Patterns

Sizhe Zhou, Suyu Ge

Citation Details

Automated relation extraction without extensive human-annotated data is a crucial yet challenging task in text mining. Existing studies typically use lexical patterns to label a small set of high-precision relation triples and then employ distributional methods to enhance detection recall. This precision-first approach works well for common relation types but struggles with unconventional and infrequent ones. In this work, we propose a recall-first approach that first leverages high-recall patterns (e.g., a per:siblings relation normally requires both the head and tail entities to be of the person type) to provide initial candidate relation triples with weak labels and then clusters these candidate relation triples in a latent spherical space to extract high-quality weak supervision. Specifically, we present a novel framework, RCLUS, where each relation triple is represented by its head/tail entity type and the shortest dependency path between the entity mentions. RCLUS first applies high-recall patterns to narrow down each relation type’s candidate space. Then, it embeds candidate relation triples in a latent space and conducts spherical clustering to further filter out noisy candidates and identify high-quality weakly-labeled triples. Finally, RCLUS uses the resulting triples to prompt-tune a pre-trained language model, which is then applied to improve extraction coverage. We conduct extensive experiments on three public datasets and demonstrate that RCLUS outperforms the weakly-supervised baselines by a large margin and achieves generally better performance than fully-supervised methods in low-resource settings.
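The recall-first filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern table `TYPE_PATTERNS`, the candidate dictionaries with `head_type`/`tail_type` keys, the function names, and the `threshold` value are all hypothetical, and plain spherical k-means over pre-computed vectors stands in for the paper's latent-space clustering, whose details are not given in this record.

```python
import math
import random

# Hypothetical high-recall type patterns: relation -> (head type, tail type).
TYPE_PATTERNS = {"per:siblings": ("PERSON", "PERSON")}


def type_filter(candidates, relation):
    """Keep candidates whose entity types match the relation's type pattern."""
    wanted = TYPE_PATTERNS[relation]
    return [c for c in candidates if (c["head_type"], c["tail_type"]) == wanted]


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def assign(points, centroids):
    """Label each unit vector with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda j: dot(p, centroids[j]))
            for p in points]


def spherical_kmeans(vectors, k, n_iter=20, seed=0):
    """Cluster vectors by cosine similarity; returns (labels, unit centroids)."""
    points = [normalize(v) for v in vectors]
    centroids = random.Random(seed).sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return assign(points, centroids), centroids


def keep_confident(vectors, labels, centroids, threshold=0.9):
    """Indices of vectors whose cosine similarity to their centroid >= threshold."""
    return [i for i, (v, l) in enumerate(zip(vectors, labels))
            if dot(normalize(v), centroids[l]) >= threshold]
```

In this sketch, `type_filter` plays the role of the high-recall pattern step, `spherical_kmeans` groups the surviving candidates on the unit sphere, and `keep_confident` discards candidates that sit far from their cluster centre, which is one simple way to treat them as noise.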

Award ID(s): 1956151, 1741317, 1704532

PAR ID: 10467075

Author(s) / Creator(s): Sizhe Zhou, Suyu Ge

Editor(s): Proc. 2023 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Publisher / Repository: Springer

Date Published: 2023-09-18

Edition / Version: 1

Subject(s) / Keyword(s): Corpus-Based Relation Extraction, Mining Relation Patterns

Format(s): Medium: X

Location: Torino, Italy

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
