One of the fundamental problems in Artificial Intelligence is to perform complex multi-hop logical reasoning over the facts captured by a knowledge graph (KG). This problem is challenging, because KGs can be massive and incomplete. Recent approaches embed KG entities in a low dimensional space and then use these embeddings to find the answer entities. However, it has been an outstanding challenge of how to handle arbitrary first-order logic (FOL) queries as present methods are limited to only a subset of FOL operators. In particular, the negation operator is not supported. An additional limitation of present methods is also that they cannot naturally model uncertainty. Here, we present BETAE, a probabilistic embedding framework for answering arbitrary FOL queries over KGs. BETAE is the first method that can handle a complete set of first-order logical operations: conjunction (∧), disjunction (∨), and negation (¬). A key insight of BETAE is to use probabilistic distributions with bounded support, specifically the Beta distribution, and embed queries/entities as distributions, which as a consequence allows us to also faithfully model uncertainty. Logical operations are performed in the embedding space by neural operators over the probabilistic embeddings. We demonstrate the performance of BETAE on answering arbitrary FOLmore »
Neural-Answering Logical Queries on Knowledge Graphs
Logical queries constitute an important subset of questions posed in knowledge graph question answering systems. Yet, effectively answering logical queries on large knowledge graphs remains a highly challenging problem. Traditional subgraph matching based methods might suffer from the noise and incompleteness of the underlying knowledge graph, often with a prolonged online response time. Recently, an alternative type of method has emerged whose key idea is to embed knowledge graph entities and the query in an embedding space so that the embedding of answer entities is close to that of the query. Compared with subgraph matching based methods, it can better handle the noisy or missing information in knowledge graph, with a faster online response. Promising as it might be, several fundamental limitations still exist, including the linear transformation assumption for modeling relations and the inability to answer complex queries with multiple variable nodes. In this paper, we propose an embedding based method (NewLook) to address these limitations. Our proposed method offers three major advantages. First (Applicability), it supports four types of logical operations and can answer queries with multiple variable nodes. Second (Effectiveness), the proposed NewLook goes beyond the linear transformation assumption, and thus consistently outperforms the existing methods. Third more »
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Page Range or eLocation-ID:
- 1087 to 1097
- Sponsoring Org:
- National Science Foundation
More Like this
Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions (^) and existential quantifiers (9). Handling queries with logical disjunctions (_) remains an open problem. Here we propose QUERY2BOX, an embedding-based framework for reasoning over arbitrary queries with ^, _, and 9 operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportionalmore »
Answering complex First-Order Logical (FOL) queries on large-scale incomplete knowledge graphs (KGs) is an important yet challenging task. Recent advances embed logical queries and KG entities in the same space and conduct query answering via dense similarity search. However, most logical operators designed in previous studies do not satisfy the axiomatic system of classical logic, limiting their performance. Moreover, these logical operators are parameterized and thus require many complex FOL queries as training data, which are often arduous to collect or even inaccessible in most real-world KGs. We thus present FuzzQE, a fuzzy logic based logical query embedding framework for answering FOL queries over KGs. FuzzQE follows fuzzy logic to define logical operators in a principled and learning-free manner, where only entity and relation embeddings require learning. FuzzQE can further benefit from labeled complex logical queries for training. Extensive experiments on two benchmark datasets demonstrate that FuzzQE provides significantly better performance in answering FOL queries compared to state-of-the-art methods. In addition, FuzzQE trained with only KG link prediction can achieve comparable performance to those trained with extra complex query data.
Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know what exactly s/he is looking for, and thus require the user to constantly revise the initial query graph based on what s/he finds from the current matching results. A major bottleneck in such an interactive matching scenario is the efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method can achieve (1) up to 16x speed-up when applying on networks with 6M$+$ nodes; (2) preserving moremore »
We introduce delft, a factoid question answering system which combines the nuance and depth of knowledge graph question answering approaches with the broader coverage of free-text. delft builds a free-text knowledge graph from Wikipedia, with entities as nodes and sentences in which entities co-occur as edges. For each question, delft finds the subgraph linking question entity nodes to candidates using text sentences as edges, creating a dense and high coverage semantic graph. A novel graph neural network reasons over the free-text graph—combining evidence on the nodes via information along edge sentences—to select a final answer. Experiments on three question answering datasets show delft can answer entity-rich questions better than machine reading based models, bert-based answer ranking and memory networks. delft’s advantage comes from both the high coverage of its free-text knowledge graph—more than double that of dbpedia relations—and the novel graph neural network which reasons on the rich but noisy free-text evidence.