Unsupervised Partial Sentence Matching for Cited Text Identification

Ricci, Kathryn; Chang, Haw-Shiuan; Goyal, Purujit; McCallum, Andrew

Given a citation in the body of a research paper, cited text identification aims to find the sentences in the cited paper that are most relevant to the citing sentence. The task is fundamentally one of sentence matching, where affinity is often assessed by a cosine similarity between sentence embeddings. However, (a) sentences may not be well-represented by a single embedding because they contain multiple distinct semantic aspects, and (b) good matches may not require a strong match in all aspects. To overcome these limitations, we propose a simple and efficient unsupervised method for cited text identification that adapts an asymmetric similarity measure to allow partial matches of multiple aspects in both sentences. On the CL-SciSumm dataset we find that our method outperforms a baseline symmetric approach, and, surprisingly, also outperforms all supervised and unsupervised systems submitted to past editions of CL-SciSumm Shared Task 1a.
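The contrast the abstract draws, between a symmetric single-embedding score and an asymmetric score that tolerates partial matches, can be illustrated with a small sketch. This is not the authors' measure, which the abstract does not specify. It assumes each sentence is already represented by a few aspect vectors, and the function names and the `keep` parameter are hypothetical choices made only for illustration.

```python
import numpy as np


def _unit_rows(m):
    """Return the rows of m scaled to unit length (zero rows stay zero)."""
    m = np.asarray(m, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.clip(norms, 1e-12, None)


def symmetric_score(a, b):
    """Baseline: cosine similarity between mean-pooled aspect vectors."""
    u = np.asarray(a, dtype=float).mean(axis=0)
    v = np.asarray(b, dtype=float).mean(axis=0)
    denom = max(np.linalg.norm(u) * np.linalg.norm(v), 1e-12)
    return float(u @ v / denom)


def partial_match_score(query, candidate, keep=0.5):
    """Illustrative asymmetric score (an assumption, not the paper's measure).

    Each query aspect is matched to its most similar candidate aspect,
    and only the best `keep` fraction of those matches is averaged, so
    a good match does not need to agree on every aspect.
    """
    sims = _unit_rows(query) @ _unit_rows(candidate).T  # pairwise cosines
    best = sims.max(axis=1)  # best candidate aspect for each query aspect
    k = max(1, int(np.ceil(keep * len(best))))
    return float(np.sort(best)[-k:].mean())


# Toy aspect vectors: the citing sentence has one aspect, the cited has three.
citing = [[1.0, 0.0, 0.0]]
cited = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
forward = partial_match_score(citing, cited)
backward = partial_match_score(cited, citing)
```

In this toy case the score depends on direction: the single citing aspect is fully covered by the cited sentence, while most cited aspects have no counterpart in the citing sentence. A mean-pooled cosine cannot express that distinction because it is symmetric by construction.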

Award ID(s): 1922090

PAR ID: 10392130

Author(s) / Creator(s): Ricci, Kathryn; Chang, Haw-Shiuan; Goyal, Purujit; McCallum, Andrew

Date Published: 2022-10-01

Journal Name: ACL Proceedings of the Third Workshop on Scholarly Document Processing

Sponsoring Org: National Science Foundation
