Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact many KGs contain two fundamentally different
views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between two views and lacks probabilistic semantics towards concepts’ granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model
concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjoint among them. Box volumes can be interpreted as concepts’ granularity. Different from concepts, we model
entities as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG
and a newly-created industrial KG showed the effectiveness of Concept2Box.
more »
« less
A Single Vector Is Not Enough: Taxonomy Expansion via Box Embeddings
Taxonomies, which organize knowledge hierarchically, support
various practical web applications such as product navigation in
online shopping and user profle tagging on social platforms. Given
the continued and rapid emergence of new entities, maintaining a
comprehensive taxonomy in a timely manner through human annotation
is prohibitively expensive. Therefore, expanding a taxonomy
automatically with new entities is essential. Most existing methods
for expanding taxonomies encode entities into vector embeddings
(i.e., single points). However, we argue that vectors are insufcient
to model the “is-a” hierarchy in taxonomy (asymmetrical relation),
because two points can only represent pairwise similarity (symmetrical
relation). To this end, we propose to project taxonomy
entities into boxes (i.e., hyperrectangles). Two boxes can be "contained",
"disjoint" and "intersecting", thus naturally representing an
asymmetrical taxonomic hierarchy. Upon box embeddings, we propose
a novel model BoxTaxo for taxonomy expansion. The core of
BoxTaxo is to learn boxes for entities to capture their child-parent
hierarchies. To achieve this, BoxTaxo optimizes the box embeddings
from a joint view of geometry and probability. BoxTaxo also
ofers an easy and natural way for inference: examine whether the
box of a given new entity is fully enclosed inside the box of a candidate
parent from the existing taxonomy. Extensive experiments on
two benchmarks demonstrate the efectiveness of BoxTaxo compared
to vector based models.
more »
« less
- PAR ID:
- 10464388
- Date Published:
- Journal Name:
- Proceedings of the ACM Web Conference 2023 (WWW’23)
- Page Range / eLocation ID:
- 2467 to 2476
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Taxonomies are of great value to many knowledge-rich applications. As the manual taxonomy curation costs enormous human effects, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the “is-a” relation. Such a restriction limits their applicability to more diverse real-world tasks where the parent-child may carry different relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus, and allow users to input a “seed” taxonomy, serving as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly expanded node and adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies.more » « less
-
Automatic construction of a taxonomy supports many applications in e-commerce, web search, and question answering. Existing taxonomy expansion or completion methods assume that new concepts have been accurately extracted and their embedding vectors learned from the text corpus. However, one critical and fundamental challenge in fixing the incompleteness of taxonomies is the incompleteness of the extracted concepts, especially for those whose names have multiple words and consequently low frequency in the corpus. To resolve the limitations of extraction-based methods, we propose GenTaxo to enhance taxonomy completion by identifying positions in existing taxonomies that need new concepts and then generating appropriate concept names. Instead of relying on the corpus for concept embeddings, GenTaxo learns the contextual embeddings from their surrounding graph-based and language-based relational information, and leverages the corpus for pre-training a concept name generator. Experimental results demonstrate that GenTaxo improves the completeness of taxonomies over existing methods.more » « less
-
Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions (^) and existential quantifiers (9). Handling queries with logical disjunctions (_) remains an open problem. Here we propose QUERY2BOX, an embedding-based framework for reasoning over arbitrary queries with ^, _, and 9 operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, QUERY2BOX is capable of handling arbitrary logical queries with ^, _, 9 in a scalable manner. We demonstrate the effectiveness of QUERY2BOX on three large KGs and show that QUERY2BOX achieves up to 25% relative improvement over the state of the art.more » « less
-
Most existing methods handle cell instance segmentation problems directly without relying on additional detection boxes. These methods generally fails to separate touching cells due to the lack of global understanding of the objects. In contrast, box-based instance segmentation solves this problem by combining object detection with segmentation. However, existing methods typically utilize anchor box-based detectors, which would lead to inferior instance segmentation performance due to the class imbalance issue. In this paper, we propose a new box-based cell instance segmentation method. In particular, we first detect the five pre-defined points of a cell via keypoints detection. Then we group these points according to a keypoint graph and subsequently extract the bounding box for each cell. Finally, cell segmentation is performed on feature maps within the bounding boxes. We validate our method on two cell datasets with distinct object shapes, and empirically demonstrate the superiority of our method compared to other instance segmentation techniques.more » « less