
Title: Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings
Learning representations of words in a continuous space is perhaps the most fundamental task in NLP; however, words interact in ways much richer than vector dot-product similarity can capture. Many relationships between words can be expressed set-theoretically: for example, adjective-noun compounds (e.g., “red cars” ⊆ “cars”) and homographs (e.g., “tongue” ∩ “body” should be similar to “mouth”, while “tongue” ∩ “language” should be similar to “dialect”) have natural set-theoretic interpretations. Box embeddings are a novel region-based representation that provides the capability to perform these set-theoretic operations. In this work, we provide a fuzzy-set interpretation of box embeddings and learn box representations of words using a set-theoretic training objective. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a quantitative and qualitative analysis exploring the additional unique expressivity provided by Word2Box.
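As a concrete illustration of the set-theoretic operations the abstract describes, here is a minimal sketch assuming each word box is an axis-aligned hyper-rectangle stored as min/max corner vectors. It uses hard min/max volumes for readability, whereas the trained model uses smoothed fuzzy-set volumes; the coordinates below are invented for illustration.

```python
import numpy as np

def box_volume(lo, hi):
    """Volume of an axis-aligned box; zero if any side collapses."""
    return float(np.prod(np.maximum(hi - lo, 0.0)))

def box_intersection(lo1, hi1, lo2, hi2):
    """The intersection of two boxes is itself a box (possibly empty)."""
    return np.maximum(lo1, lo2), np.minimum(hi1, hi2)

def containment(lo1, hi1, lo2, hi2):
    """Fraction of box 1's volume that lies inside box 2."""
    ilo, ihi = box_intersection(lo1, hi1, lo2, hi2)
    return box_volume(ilo, ihi) / box_volume(lo1, hi1)

# Invented 2-d coordinates: "red cars" should sit inside "cars".
red_cars = (np.array([0.2, 0.2]), np.array([0.4, 0.5]))
cars     = (np.array([0.1, 0.1]), np.array([0.8, 0.9]))
print(containment(*red_cars, *cars))  # 1.0: full containment
```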
Award ID(s):
2106391
PAR ID:
10392238
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
Volume:
Volume 1: Long Papers
Page Range / eLocation ID:
2263 to 2276
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Personalized item recommendation typically suffers from data sparsity, which is most often addressed by learning vector representations of users and items via low-rank matrix factorization. While this effectively densifies the matrix by assuming users and movies can be represented by linearly dependent latent features, it does not capture more complicated interactions. For example, vector representations struggle with set-theoretic relationships, such as negation and intersection, e.g., recommending a movie that is “comedy and action, but not romance”. In this work, we formulate the problem of personalized item recommendation as matrix completion where rows are set-theoretically dependent. To capture this set-theoretic dependence we represent each user and attribute by a hyper-rectangle or box (i.e., a Cartesian product of intervals). Box embeddings can intuitively be understood as trainable Venn diagrams, and thus not only inherently represent similarity (via the Jaccard index), but also naturally and faithfully support arbitrary set-theoretic relationships. Queries involving set-theoretic constraints can be efficiently computed directly on the embedding space by performing geometric operations on the representations. We empirically demonstrate the superiority of box embeddings over vector-based neural methods on both simple and complex item recommendation queries by up to 30% overall.
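A toy sketch of how such queries could be answered geometrically, assuming trained user and attribute boxes given as min/max corners (the coordinates below are invented). Jaccard similarity falls out of intersection volume; negation is approximated here by subtracting overlap with the negated attribute, since the true set difference of two boxes is not itself a box.

```python
import numpy as np

def vol(lo, hi):
    return float(np.prod(np.maximum(hi - lo, 0.0)))

def inter(b1, b2):
    return np.maximum(b1[0], b2[0]), np.minimum(b1[1], b2[1])

def jaccard(b1, b2):
    i = vol(*inter(b1, b2))
    return i / (vol(*b1) + vol(*b2) - i)

# Invented 2-d boxes; in the model these would be trained.
comedy  = (np.array([0.0, 0.0]),   np.array([0.6, 0.6]))
action  = (np.array([0.3, 0.2]),   np.array([0.9, 0.7]))
romance = (np.array([0.0, 0.5]),   np.array([0.5, 1.0]))
user    = (np.array([0.35, 0.25]), np.array([0.55, 0.5]))

# "comedy and action" is a box intersection; "but not romance" is
# approximated by penalizing the user's overlap with romance.
target = inter(comedy, action)
score = vol(*inter(user, target)) - vol(*inter(user, romance))
print(jaccard(user, target), score)
```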
  2. Computational models of verbal analogy and relational similarity judgments can employ different types of vector representations of word meanings (embeddings) generated by machine-learning algorithms. An important question is whether human-like relational processing depends on explicit representations of relations (i.e., representations separable from those of the concepts being related), or whether implicit relation representations suffice. Earlier machine-learning models produced static embeddings for individual words, identical across all contexts. However, more recent Large Language Models (LLMs), which use transformer architectures applied to much larger training corpora, are able to produce contextualized embeddings that have the potential to capture implicit knowledge of semantic relations. Here we compare multiple models based on different types of embeddings to human data concerning judgments of relational similarity and solutions of verbal analogy problems. For two datasets, a model that learns explicit representations of relations, Bayesian Analogy with Relational Transformations (BART), captured human performance more successfully than either a model using static embeddings (Word2vec) or models using contextualized embeddings created by LLMs (BERT, RoBERTa, and GPT-2). These findings support the proposal that human thinking depends on representations that separate relations from the concepts they relate. 
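For contrast with BART's explicit relations, the implicit relation representation used with static embeddings is simply the vector offset between a word pair. A minimal sketch with random stand-in vectors (real Word2vec embeddings would be needed for meaningful output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for trained static embeddings such as Word2vec.
vocab = ["king", "queen", "man", "woman", "dog", "puppy"]
emb = {w: rng.normal(size=50) for w in vocab}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def relation(a, b):
    """Implicit relation representation: the offset from a to b."""
    return emb[b] - emb[a]

# Relational similarity of two word pairs: compare their offsets.
print(cosine(relation("man", "king"), relation("woman", "queen")))

# Verbal analogy a:b :: c:? by nearest neighbour to b - a + c.
def solve(a, b, c):
    target = emb[b] - emb[a] + emb[c]
    return max((w for w in vocab if w not in (a, b, c)),
               key=lambda w: cosine(emb[w], target))

print(solve("man", "king", "woman"))  # arbitrary with random vectors
```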
  3. Deep-learning vision models have shown intriguing similarities and differences with respect to human vision. We investigate how to bring machine visual representations into better alignment with human representations. Human representations are often inferred from behavioral evidence such as the selection of an image most similar to a query image. We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choice on a data set of bird images from 72% at baseline to 89%. We hypothesized that deep embeddings have redundant, high-dimensional (4096) representations; however, reducing the rank of these representations results in a loss of explanatory power. We hypothesized that the dilation transformation of representations explored in past research is too restrictive, and indeed we found that model explanatory power can be significantly improved with a more expressive linear transform. Most surprising and exciting, we found that, consistent with classic psychological literature, human similarity judgments are asymmetric: the similarity of X to Y is not necessarily equal to the similarity of Y to X, and allowing models to express this asymmetry improves explanatory power.
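A minimal sketch of the two modeling ideas mentioned, assuming generic embedding vectors rather than the authors' bird-image embeddings: a full-matrix linear transform (versus a restrictive diagonal "dilation"), and asymmetric similarity obtained by transforming query and target with different matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
x, y = rng.normal(size=d), rng.normal(size=d)  # stand-in embeddings

# A dilation restricts the transform to a diagonal matrix; a full
# matrix is the more expressive linear transform the passage favors.
W_dilate = np.diag(rng.normal(size=d))
W_full   = rng.normal(size=(d, d))

def sym_sim(W, a, b):
    """Symmetric similarity under a shared transform W."""
    return float((W @ a) @ (W @ b))

print(sym_sim(W_dilate, x, y), sym_sim(W_full, x, y))

# Asymmetric similarity: transform query and target differently,
# so sim(x, y) need not equal sim(y, x).
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def sim(a, b):
    return float((Wq @ a) @ (Wk @ b))

print(sim(x, y), sim(y, x))  # generally unequal
```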
  4. Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While the cross-lingual transfer techniques are powerful, they carry gender bias from the source to target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces and that the alignment direction can also have an influence on the bias in transfer learning. We further provide recommendations for using the multilingual word representations for downstream tasks. 
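One common intrinsic bias measure, not necessarily among the paper's proposed metrics, projects profession words onto a gender direction in the shared embedding space. A minimal sketch with random stand-in vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
# Random stand-ins for multilingual embeddings in a shared space.
words = ["he", "she", "doctor", "nurse", "engineer", "teacher"]
emb = {w: rng.normal(size=50) for w in words}

def unit(v):
    return v / np.linalg.norm(v)

# Project each profession onto the he-she direction; the signed
# magnitude of the projection is a simple intrinsic bias score.
gender_dir = unit(emb["he"] - emb["she"])
for w in words[2:]:
    print(w, float(unit(emb[w]) @ gender_dir))
```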
  5. When searching for mathematical content, accurate measures of formula similarity can help with tasks such as document ranking, query recommendation, and result set clustering. While there have been many attempts at embedding words and graphs, formula embedding is in its early stages. We introduce a new formula embedding model that we use with two hierarchical representations, (1) Symbol Layout Trees (SLTs) for appearance, and (2) Operator Trees (OPTs) for mathematical content. Following the approach of graph embeddings such as DeepWalk, we generate tuples representing paths between pairs of symbols depth-first, embed tuples using the fastText n-gram embedding model, and then represent an SLT or OPT by its average tuple embedding vector. We then combine SLT and OPT embeddings, leading to state-of-the-art results for the NTCIR-12 formula retrieval task. Our fine-grained holistic vector representations allow us to retrieve many more partially similar formulas than methods using structural matching in trees. Combining our embedding model with structural matching in the Approach0 formula search engine produces state-of-the-art results for both fully and partially relevant results on the NTCIR-12 benchmark. Source code for our system is publicly available.
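A simplified sketch of the pipeline described, with invented helper names and a toy tuple format: enumerate depth-first paths between symbol pairs in a small operator tree, embed each path tuple with hashed character n-grams standing in for fastText, and average the tuple vectors.

```python
import zlib
import numpy as np
from itertools import combinations

# Toy operator tree (OPT) for "a + b*c": (label, children).
tree = ("+", [("a", []), ("*", [("b", []), ("c", [])])])

def root_paths(node, prefix=()):
    """Depth-first enumeration of root-to-node label sequences."""
    label, children = node
    path = prefix + (label,)
    yield path
    for child in children:
        yield from root_paths(child, path)

def pair_tuples(tree):
    """One string per node pair, describing the path between them."""
    paths = list(root_paths(tree))
    for p, q in combinations(paths, 2):
        k = 0  # length of the shared prefix (common ancestors)
        while k < min(len(p), len(q)) and p[k] == q[k]:
            k += 1
        yield " ".join(reversed(p[k:])) + " ^ " + " ".join(q[k:])

def embed(text, dim=64, n=3):
    """Hashed character n-grams as a stand-in for fastText."""
    v = np.zeros(dim)
    for i in range(len(text) - n + 1):
        v[zlib.crc32(text[i:i + n].encode()) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# The formula embedding is the average of its tuple embeddings.
vec = np.mean([embed(t) for t in pair_tuples(tree)], axis=0)
print(vec.shape)  # (64,)
```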