skip to main content


Title: Diversifying Content Generation for Commonsense Reasoning with Mixture of Knowledge Graph Experts
Generative commonsense reasoning (GCR) in natural language is to reason about the commonsense while generating coherent text. Recent years have seen a surge of interest in improving the generation quality of commonsense reasoning tasks. Nevertheless, these approaches have seldom investigated diversity in the GCR tasks, which aims to generate alternative explanations for a real-world situation or predict all possible outcomes. Diversifying GCR is challenging as it expects to generate multiple outputs that are not only semantically different but also grounded in commonsense knowledge. In this paper, we propose MoKGE, a novel method that diversifies the generative reasoning by a mixture of expert (MoE) strategy on commonsense knowledge graphs (KG). A set of knowledge experts seek diverse reasoning on KG to encourage various generation outputs. Empirical experiments demonstrated that MoKGE can significantly improve the diversity while achieving on par performance on accuracy on two GCR benchmarks, based on both automatic and human evaluations.  more » « less
Award ID(s):
1849816 1901059 2119531 2142827
NSF-PAR ID:
10334384
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Findings of the Association for Computational Linguistics: ACL 2022
Page Range / eLocation ID:
1896 to 1906
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks. 
    more » « less
  2. Understanding narratives requires reasoning about implicit world knowledge related to the causes, effects, and states of situations described in text. At the core of this challenge is how to access contextually relevant knowledge on demand and reason over it. In this paper, we present initial studies toward zero-shot commonsense question answering by formulating the task as inference over dynamically generated commonsense knowledge graphs. In contrast to previous studies for knowledge integration that rely on retrieval of existing knowledge from static knowledge graphs, our study requires commonsense knowledge integration where contextually relevant knowledge is often not present in existing knowledge bases. Therefore, we present a novel approach that generates contextually-relevant symbolic knowledge structures on demand using generative neural commonsense knowledge models. Empirical results on two datasets demonstrate the efficacy of our neuro-symbolic approach for dynamically constructing knowledge graphs for reasoning. Our approach achieves significant performance boosts over pretrained language models and vanilla knowledge models, all while providing interpretable reasoning paths for its predictions. 
    more » « less
  3. Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it. However, pretrained language models (LM), the foundation of most modern QA systems, do not robustly represent latent relationships between concepts, which is necessary for reasoning. While knowledge graphs (KG) are often used to augment LMs with structured representations of world knowledge, it remains an open question how to effectively fuse and reason over the KG representations and the language context, which provides situational constraints and nuances. In this work, we propose GreaseLM, a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations. Information from both modalities propagates to the other, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances (e.g., negation, hedging) in the context to inform the graph representations of knowledge. Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8x larger. 
    more » « less
  4. Abstract To be responsive to dynamically changing real-world environments, an intelligent agent needs to perform complex sequential decision-making tasks that are often guided by commonsense knowledge. The previous work on this line of research led to the framework called interleaved commonsense reasoning and probabilistic planning (i corpp ), which used P-log for representing commmonsense knowledge and Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs) for planning under uncertainty. A main limitation of i corpp is that its implementation requires non-trivial engineering efforts to bridge the commonsense reasoning and probabilistic planning formalisms. In this paper, we present a unified framework to integrate i corpp ’s reasoning and planning components. In particular, we extend probabilistic action language pBC + to express utility, belief states, and observation as in POMDP models. Inheriting the advantages of action languages, the new action language provides an elaboration tolerant representation of POMDP that reflects commonsense knowledge. The idea led to the design of the system pbcplus2pomdp , which compiles a pBC + action description into a POMDP model that can be directly processed by off-the-shelf POMDP solvers to compute an optimal policy of the pBC + action description. Our experiments show that it retains the advantages of i corpp while avoiding the manual efforts in bridging the commonsense reasoner and the probabilistic planner. 
    more » « less
  5. A hallmark of human intelligence is the ability to understand and influence other minds. Humans engage in inferential social learning (ISL) by using commonsense psychology to learn from others and help others learn. Recent advances in artificial intelligence (AI) are raising new questions about the feasibility of human–machine interactions that support such powerful modes of social learning. Here, we envision what it means to develop socially intelligent machines that can learn, teach, and communicate in ways that are characteristic of ISL. Rather than machines that simply predict human behaviours or recapitulate superficial aspects of human sociality (e.g. smiling, imitating), we should aim to build machines that can learn from human inputs and generate outputs for humans by proactively considering human values, intentions and beliefs. While such machines can inspire next-generation AI systems that learn more effectively from humans (as learners) and even help humans acquire new knowledge (as teachers), achieving these goals will also require scientific studies of its counterpart: how humans reason about machine minds and behaviours. We close by discussing the need for closer collaborations between the AI/ML and cognitive science communities to advance a science of both natural and artificial intelligence. This article is part of a discussion meeting issue ‘Cognitive artificial intelligence’. 
    more » « less