skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, January 16 until 2:00 AM ET on Friday, January 17 due to maintenance. We apologize for the inconvenience.


Title: Matching Roles from Temporal Data: Why Joe Biden is not only President, but also Commander-in-Chief
We present role matching, a novel, fine-grained integrity constraint on temporal fact data, i.e., (subject, predicate, object, timestamp)-quadruples. A role is a combination of subject and predicate and can be associated with different objects as the real world evolves and the data changes over time. A role matching states that the associated object of two or more roles should always match across time. Once discovered, role matchings can serve as integrity constraints to improve data quality, for instance of structured data in Wikipedia[3]. If violated, role matchings can alert data owners or editors and thus allow them to correct the error. Finding all role matchings is challenging due both to the inherent quadratic complexity of the matching problem and the need to identify true matches based on the possibly short history of the facts observed so far. To address the first challenge, we introduce several blocking methods both for clean and dirty input data. For the second challenge, the matching stage, we show how the entity resolution method Ditto[27] can be adapted to achieve satisfactory performance for the role matching task. We evaluate our method on datasets from Wikipedia infoboxes, showing that our blocking approaches can achieve 95% recall, while maintaining a reduction ratio of more than 99.99%, even in the presence of dirty data. In the matching stage, we achieve a macro F1-score of 89% on our datasets, using automatically generated labels.  more » « less
Award ID(s):
2107050
PAR ID:
10465853
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Management of Data
Volume:
1
Issue:
1
ISSN:
2836-6573
Page Range / eLocation ID:
1 to 26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Meila, Marina ; Zhang, Tong (Ed.)
    Recent works apply Graph Neural Networks (GNNs) to graph matching tasks and show promising results. Considering that model outputs are complex matchings, we devise several techniques to improve the learning of GNNs and obtain a new model, Stochastic Iterative Graph MAtching (SIGMA). Our model predicts a distribution of matchings, instead of a single matching, for a graph pair so the model can explore several probable matchings. We further introduce a novel multi-step matching procedure, which learns how to refine a graph pair’s matching results incrementally. The model also includes dummy nodes so that the model does not have to find matchings for nodes without correspondence. We fit this model to data via scalable stochastic optimization. We conduct extensive experiments across synthetic graph datasets as well as biochemistry and computer vision applications. Across all tasks, our results show that SIGMA can produce significantly improved graph matching results compared to state-of-the-art models. Ablation studies verify that each of our components (stochastic training, iterative matching, and dummy nodes) offers noticeable improvement. 
    more » « less
  2. null (Ed.)
    Recent works apply Graph Neural Networks (GNNs) to graph matching tasks and show promising results. Considering that model outputs are complex matchings, we devise several techniques to improve the learning of GNNs and obtain a new model, Stochastic Iterative Graph MAtching (SIGMA). Our model predicts a distribution of matchings, instead of a single matching, for a graph pair so the model can explore several probable matchings. We further introduce a novel multi-step matching procedure, which learns how to refine a graph pair’s matching results incrementally. The model also includes dummy nodes so that the model does not have to find matchings for nodes without correspondence. We fit this model to data via scalable stochastic optimization. We conduct extensive experiments across synthetic graph datasets as well as biochemistry and computer vision applications. Across all tasks, our results show that SIGMA can produce significantly improved graph matching results compared to state-of-the-art models. Ablation studies verify that each of our components (stochastic training, iterative matching, and dummy nodes) offers noticeable improvement. 
    more » « less
  3. Many important structured prediction problems, including learning to rank items, correspondence-based natural language processing, and multi-object tracking, can be formulated as weighted bipartite matching optimizations. Existing structured prediction approaches have significant drawbacks when applied under the constraints of perfect bipartite matchings. Exponential family probabilistic models, such as the conditional random field (CRF), provide statistical consistency guarantees, but suffer computationally from the need to compute the normalization term of its distribution over matchings, which is a #P-hard matrix permanent computation. In contrast, the structured support vector machine (SSVM) provides computational efficiency, but lacks Fisher consistency, meaning that there are distributions of data for which it cannot learn the optimal matching even under ideal learning conditions (i.e., given the true distribution and selecting from all measurable potential functions). We propose adversarial bipartite matching to avoid both of these limitations. We develop this approach algorithmically, establish its computational efficiency and Fisher consistency properties, and apply it to matching problems that demonstrate its empirical benefits 
    more » « less
  4. null (Ed.)
    Online social networks provide a convenient platform for the spread of rumors, which could lead to serious aftermaths such as economic losses and public panic. The classical rumor blocking problem aims to launch a set of nodes as a positive cascade to compete with misinformation in order to limit the spread of rumors. However, most of the related researches were based on a one-dimensional diffusion model. In reality, there is more than one feature associated with an object. A user’s impression on this object is determined not just by one feature but by her overall evaluation of all features associated with it. Thus, the influence spread of this object can be decomposed into the spread of multiple features. Based on that, we design a multi-feature diffusion model (MF-model) in this paper and formulate a multi-feature rumor blocking (MFRB) problem on a multi-layer network structure according to this model. To solve the MFRB problem, we design a creative sampling method called Multi-Sampling, which can be applied to this multi-layer network structure. Then, we propose a Revised-IMM algorithm and obtain a satisfactory approximate solution to MFRB. Finally, we evaluate our proposed algorithm by conducting experiments on real datasets, which shows the effectiveness of our Revised- IMM and its advantage to their baseline algorithms. 
    more » « less
  5. Singh, M. ; Williamson, D. (Ed.)
    Birkhoff’s representation theorem defines a bijection between elements of a distributive lattice L and the family of upper sets of an associated poset B. When elements of L are the stable matchings in an instance of Gale and Shapley’s marriage model, Irving et al. showed how to use B to devise a combinatorial algorithm for maximizing a linear function over the set of stable matchings. In this paper, we introduce a general property of distributive lattices, which we term as affine representability, and show its role in efficiently solving linear optimization problems over the elements of a distributive lattice, as well as describing the convex hull of the characteristic vectors of lattice elements. We apply this concept to the stable matching model with path-independent quotafilling choice functions, thus giving efficient algorithms and a compact polyhedral description for this model. To the best of our knowledge, this model generalizes all models from the literature for which similar results were known, and our paper is the first that proposes efficient algorithms for stable matchings with choice functions, beyond extension of the Deferred Acceptance algorithm. 
    more » « less