Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference

Logan IV, Robert L; McCallum, Andrew; Singh, Sameer; Bikel, Dan

doi:10.18653/v1/2021.acl-long.364

Citation Details

Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference

Streaming cross document entity coreference (CDC) systems disambiguate mentions of named entities in a scalable manner via incremental clustering. Unlike other approaches for named entity disambiguation (e.g., entity linking), streaming CDC allows for the disambiguation of entities that are unknown at inference time. Thus, it is well-suited for processing streams of data where new entities are frequently introduced. Despite these benefits, this task is currently difficult to study, as existing approaches are either evaluated on datasets that are no longer available, or omit other crucial details needed to ensure fair comparison. In this work, we address this issue by compiling a large benchmark adapted from existing free datasets, and performing a comprehensive evaluation of a number of novel and existing baseline models. We investigate: how to best encode mentions, which clustering algorithms are most effective for grouping mentions, how models transfer to different domains, and how bounding the number of mentions tracked during inference impacts performance. Our results show that the relative performance of neural and feature-based mention encoders varies across different domains, and in most cases the best performance is achieved using a combination of both approaches. We also find that performance is minimally impacted by limiting the number of tracked mentions. more »

Award ID(s):: 1817183

PAR ID:: 10291544

Author(s) / Creator(s):: Logan IV, Robert L; McCallum, Andrew; Singh, Sameer; Bikel, Dan

Date Published:: 2021-07-01

Journal Name:: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Page Range / eLocation ID:: 4717 to 4731

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.18653/v1/2021.acl-long.364

More Like this