Matching Roles from Temporal Data: Why Joe Biden is not only President, but also Commander-in-Chief

Bornemann, Leon; Bleifuß, Tobias; Kalashnikov, Dmitri V.; Nargesian, Fatemeh; Naumann, Felix; Srivastava, Divesh

doi:10.1145/3588919

We present role matching, a novel, fine-grained integrity constraint on temporal fact data, i.e., (subject, predicate, object, timestamp)-quadruples. A role is a combination of subject and predicate and can be associated with different objects as the real world evolves and the data changes over time. A role matching states that the associated objects of two or more roles should always match across time. Once discovered, role matchings can serve as integrity constraints to improve data quality, for instance, of structured data in Wikipedia [3]. If violated, role matchings can alert data owners or editors and thus allow them to correct the error. Finding all role matchings is challenging due to both the inherent quadratic complexity of the matching problem and the need to identify true matches based on the possibly short history of the facts observed so far. To address the first challenge, we introduce several blocking methods for both clean and dirty input data. For the second challenge, the matching stage, we show how the entity resolution method Ditto [27] can be adapted to achieve satisfactory performance for the role matching task. We evaluate our method on datasets from Wikipedia infoboxes, showing that our blocking approaches can achieve 95% recall while maintaining a reduction ratio of more than 99.99%, even in the presence of dirty data. In the matching stage, we achieve a macro F1-score of 89% on our datasets, using automatically generated labels.
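The definitions in the abstract can be made concrete with a small sketch. The code below is an illustration only and not the authors' implementation: the toy facts, the function names, the integer timestamps, and the exact-history blocking key are all assumptions chosen for brevity (the paper's blocking methods, in particular those for dirty data, are not reproduced here). It builds one history per role, checks whether two roles agree at every shared timestamp, and computes the reduction ratio as the fraction of all role pairs that blocking avoids comparing.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy facts: (subject, predicate, object, timestamp).
FACTS = [
    ("USA", "president", "Donald Trump", 2020),
    ("USA", "commander_in_chief", "Donald Trump", 2020),
    ("USA", "vice_president", "Mike Pence", 2020),
    ("USA", "president", "Joe Biden", 2021),
    ("USA", "commander_in_chief", "Joe Biden", 2021),
    ("USA", "vice_president", "Kamala Harris", 2021),
]


def build_histories(facts):
    """Group facts by role (subject, predicate) into {timestamp: object}."""
    histories = defaultdict(dict)
    for subject, predicate, obj, timestamp in facts:
        histories[(subject, predicate)][timestamp] = obj
    return dict(histories)


def roles_match(history_a, history_b):
    """True if both roles share a timestamp and agree at every shared one."""
    shared = history_a.keys() & history_b.keys()
    return bool(shared) and all(history_a[t] == history_b[t] for t in shared)


def candidate_pairs(histories):
    """Naive blocking (an assumption): pair only roles with identical histories."""
    blocks = defaultdict(list)
    for role, history in histories.items():
        blocks[tuple(sorted(history.items()))].append(role)
    pairs = []
    for roles in blocks.values():
        pairs.extend(combinations(sorted(roles), 2))
    return sorted(pairs)


def reduction_ratio(n_roles, n_candidates):
    """Fraction of all possible role pairs that blocking did not emit."""
    total_pairs = n_roles * (n_roles - 1) // 2
    if total_pairs == 0:
        return 0.0
    return 1.0 - n_candidates / total_pairs


if __name__ == "__main__":
    histories = build_histories(FACTS)
    pairs = candidate_pairs(histories)
    print(pairs)
    print(reduction_ratio(len(histories), len(pairs)))
```

On this toy input, only the president and commander-in-chief roles land in the same block, so one of the three possible pairs is kept as a candidate. Exact-history blocking like this would miss true matches as soon as one value is wrong or missing, which is why blocking for dirty data needs a more tolerant key.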

Award ID(s): 2107050

PAR ID: 10465853

Author(s) / Creator(s): Bornemann, Leon; Bleifuß, Tobias; Kalashnikov, Dmitri V.; Nargesian, Fatemeh; Naumann, Felix; Srivastava, Divesh

Date Published: 2023-05-26

Journal Name: Proceedings of the ACM on Management of Data

Volume: 1

Issue: 1

ISSN: 2836-6573

Page Range / eLocation ID: 1 to 26

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1145/3588919
