<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Extending Answer Set Programs with Neural Networks</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date when="2020-09-19">09/19/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10295441</idno>
					<idno type="doi">10.4204/EPTCS.325.41</idno>
					<title level='j'>Electronic Proceedings in Theoretical Computer Science</title>
<idno type="issn">2075-2180</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Zhun Yang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The integration of low-level perception with high-level reasoning is one of the oldest problems in Artificial Intelligence. Recently, several proposals were made to implement the reasoning process in complex neural network architectures. While these works aim at extending neural networks with the capability of reasoning, a natural question that we consider is: can we extend answer set programs with neural networks to allow complex and high-level reasoning on neural network outputs? As a preliminary result, we propose NeurASP – a simple extension of answer set programs by embracing neural networks where neural network outputs are treated as probability distributions over atomic facts in answer set programs. We show that NeurASP can not only improve the perception accuracy of a pre-trained neural network, but also help to train a neural network better by giving restrictions through logic rules. However, training with NeurASP would take much more time than pure neural network training due to the internal use of a symbolic reasoning engine. For future work, we plan to investigate the potential ways to solve the scalability issue of NeurASP. One potential way is to embed logic programs directly in neural networks. On this route, we plan to first design a SAT solver using neural networks, then extend such a solver to allow logic programs.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction and Problem Description</head><p>The integration of low-level perception with high-level reasoning is one of the oldest problems in Artificial Intelligence. This topic is revisited with the recent rise of deep neural networks. Several proposals were made to implement the reasoning process in complex neural network architectures, e.g., <ref type="bibr">[5,</ref><ref type="bibr">22,</ref><ref type="bibr">6,</ref><ref type="bibr">8,</ref><ref type="bibr">24,</ref><ref type="bibr">19,</ref><ref type="bibr">16]</ref>. However, it is still not clear how complex and high-level reasoning, such as default reasoning <ref type="bibr">[21]</ref>, ontology reasoning <ref type="bibr">[2]</ref>, and causal reasoning <ref type="bibr">[20]</ref>, can be successfully computed by these approaches. The latter subject has been well-studied in the area of knowledge representation (KR), but many KR formalisms, including answer set programming (ASP) <ref type="bibr">[15,</ref><ref type="bibr">3]</ref>, are logic-oriented and do not incorporate high-dimensional vector space and pre-trained models for perception tasks as handled in deep learning, which limits the applicability of KR in many practical applications involving data and uncertainty. A natural research question we consider is: can we extend answer set programs with neural networks to allow complex and high-level reasoning on information provided in vector space?</p><p>Our research tries to answer this question. To start with, we extended LP MLN , a probabilistic extension of ASP, with neural networks by turning neural network outputs into weighted rules in LP MLN . However, there is a technical challenge: the existing parameter learning method of LP MLN is too slow to be coupled with typical neural network training. 
This motivates us to consider a simpler probabilistic extension of ASP, for which we design and implement an efficient parameter learning method.</p><p>In this research summary, we present our preliminary work, NeurASP, a simple and effective way to integrate sub-symbolic and symbolic computation under the stable model semantics, where neural network outputs are treated as probability distributions over atomic facts in answer set programs. We demonstrate how NeurASP can be useful for tasks where both perception and reasoning are required, and show that NeurASP can not only improve the perception accuracy of a pre-trained neural network, but also help to train a neural network better by imposing restrictions through ASP rules.</p><p>The biggest issue we are encountering now is that training with the NeurASP implementation still takes much more time than pure neural network training due to the internal use of a symbolic reasoning engine (i.e., CLINGO in our case). We plan to investigate two directions to resolve this issue. One is to compute NeurASP in a circuit. For example, we can turn a NeurASP program into a Probabilistic Sentential Decision Diagram (PSDD) <ref type="bibr">[10]</ref> so that the probability and gradient computation would take linear time in every iteration. The challenge for this route is how to construct such a circuit efficiently. The other direction is to embed the logical reasoning part completely in neural networks without referring to any symbolic reasoning engine. The challenge there is how to embed logic rules in neural networks while maintaining their expressivity.</p><p>Besides, we plan to apply NeurASP to domains that require both perception and reasoning. The first domain we are going to investigate is visual question answering, where we limit our attention to problems whose reasoning can potentially be represented in ASP. 
Some well-known datasets in this domain include NLVR2 <ref type="bibr">[25]</ref>, CLEVR <ref type="bibr">[7]</ref>, and CLEVRER <ref type="bibr">[28]</ref>. We plan to start by replacing the reasoning part in existing works with NeurASP and analyzing the pros and cons of applying NeurASP to those visual reasoning domains. We also plan to apply NeurASP to predict whether a vulnerability will be exploited, so that knowledge from domain experts can help neural network training.</p><p>This paper gives a summary of my research, including background knowledge and a review of the existing literature (Section 2), the goal of my research (Section 3), the current status of my research (Section 4), the preliminary results accomplished (Section 5), and open issues and expected achievements (Section 6). The implementation of our preliminary work NeurASP, as well as the code used for the experiments, is publicly available online at <ref type="url">https://github.com/azreasoners/NeurASP</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background and Overview of the Existing Literature</head><p>Recent years have seen rising interest in combining perception and reasoning.</p><p>DeepProbLog <ref type="bibr">[17]</ref> extends ProbLog with neural networks by means of neural predicates. We follow a similar idea to design NeurASP, extending answer set programs with neural networks. Some differences are: (i) the computation of DeepProbLog relies on constructing an SDD, whereas we use an ASP solver internally; (ii) NeurASP employs expressive reasoning originating from answer set programming, such as defaults, aggregates, and optimization rules, which not only gives more expressive reasoning but also allows richer semantic constructs to guide learning; (iii) DeepProbLog requires each training example to be a single atom, while NeurASP allows each training example to be an arbitrary propositional formula.</p><p>Xu et al. <ref type="bibr">[26]</ref> used semantic constraints to train neural networks better, but the constraints used in that work are simple propositional formulas, whereas we are interested in the answer set programming language, in which it is more convenient to encode complex KR constraints. Logic Tensor Network <ref type="bibr">[6]</ref> is also related in that it uses neural networks to provide fuzzy values to atoms.</p><p>Another approach is to embed logic rules in neural networks by representing logical connectives with mathematical operations and allowing the value of an atom to be a real number. For example, Neural Theorem Prover (NTP) <ref type="bibr">[22]</ref> adopts the idea of dynamic neural module networks <ref type="bibr">[1]</ref> to embed logical conjunction and disjunction in and/or-module networks. A proof-tree-like, end-to-end differentiable neural network is then constructed using Prolog's backward chaining algorithm with these modules. 
Another method that constructs a proof-tree-like neural network is TensorLog <ref type="bibr">[5]</ref>, which uses matrix multiplication to simulate belief propagation; this is tractable under the restriction that each rule is negation-free and can be transformed into a polytree.</p><p>Graph neural networks (GNNs) <ref type="bibr">[9]</ref> are a neural network model that has been gaining attention recently. Since a graph can encode objects and the relations between them, by learning message functions between the nodes one can perform certain relational reasoning over the objects. For example, <ref type="bibr">[19]</ref> shows that a GNN can do well on Sudoku, but the input there is not an image but a textual representation. Another work, NeuroSAT <ref type="bibr">[23]</ref>, shows that a GNN can solve small general SAT problems after being trained only as a classifier to predict satisfiability. However, this is still restrictive compared to the more complex reasoning that KR formalisms provide.</p><p>Neuro-Symbolic Concept Learner <ref type="bibr">[18]</ref> separates visual perception from symbolic reasoning. It demonstrates data efficiency by using only 10% of the training data while achieving the state-of-the-art 98% accuracy on the CLEVR dataset. Our preliminary result NeurASP is similar in the sense that, by using symbolic reasoning, we can use less data to achieve high accuracy.</p><p>LP MLN <ref type="bibr">[11]</ref> serves as the foundation of our research. We started by extending LP MLN with neural networks, since existing LP MLN parameter learning is already based on the gradient descent method <ref type="bibr">[12]</ref> used in neural network training. However, there is a technical challenge: the previous parameter learning method does not scale well enough to be coupled with typical neural network training. This motivates us to consider a fragment of LP MLN first, for which we design and implement an efficient parameter learning method. 
It turns out that this fragment is general enough to cover the ProbLog language while enjoying the expressiveness of the full answer set programming language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Goal of the Research</head><p>The goal of our research is to develop methods to integrate low-level perception with high-level reasoning so that, in inference tasks, reasoning can help identify perception mistakes that violate semantic constraints, while, in learning tasks, a neural network learns not only from implicit correlations in the data but also from the explicit, complex semantic constraints expressed by the rules.</p><p>To achieve this goal, we investigate the integration of answer set programs with neural networks. We require that such an integration be not only expressive in representation but also efficient in computation. Besides, such an integration should be able to apply reasoning in both inference and learning. For example, it should be able to reason about relations among the perceived objects in Figure <ref type="figure">1</ref> and tell that the cars in (i 1 ) are toy cars since they are smaller than a person, while the cars in (i 2 ) are real cars (by default) since there is no evidence that they are smaller than the persons. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Current Status of the Research</head><p>To achieve our goal, we investigate and answer the following three research questions. This research is at an intermediate stage, and the questions below are only partially answered.</p><p>1. How do we design a formalism that allows complex and high-level reasoning on information provided in vector space?</p><p>We have one such design, NeurASP <ref type="bibr">[27]</ref>, which extends answer set programs with neural networks, using ideas from DeepProbLog <ref type="bibr">[17]</ref> to interface neural networks and ASP (i.e., treating the neural network output as the probability distribution over atomic facts in answer set programs), and using ideas from LP MLN <ref type="bibr">[12]</ref> to design the probability and gradient computation under the stable model semantics.</p><p>We plan to design a new formalism that is less expressive but more scalable than NeurASP by doing all the reasoning within neural networks. To start with, we followed the ideas in <ref type="bibr">[26]</ref> to define a semantic loss using logic rules. We represented implication rules by neural network regularizers for small domains including the N-queens problem, Sudoku, and Einstein's puzzle. We plan to design a general way to represent a CNF by a regularizer. We also plan to investigate the rule forms that can be represented by a regularizer and then use these rule forms to design the new formalism.</p><p>2. How do we implement such a formalism to make it as scalable as possible?</p><p>The current implementation of NeurASP uses CLINGO as its internal reasoning engine and PYTORCH to back-propagate the gradients from the logic layer to the neural networks, which yields the most scalable prototype among all of our trials so far while preserving the expressivity of ASP. 
Initially, we implemented a prototype of NeurASP where LP MLN <ref type="bibr">[12]</ref> is used to compute the probability and gradient in NeurASP, since LP MLN is a probabilistic extension of ASP with well-defined probability and gradient computation under the stable model semantics. However, the parameter learning method in LP MLN does not scale well enough to be coupled with typical neural network training, even for the MNIST addition example proposed in <ref type="bibr">[17]</ref>. To resolve this issue, we tried two approaches. First, we tested the idea of turning an answer set program into a Sentential Decision Diagram (SDD) but found that constructing such an SDD through the route "ASP to CNF to SDD" would take time exponential in the number of atoms. More research is needed on SDDs, and possibly on variations of them that are more suitable for encoding answer set programs. Second, we tried to simplify LP MLN to a fragment that is simple enough to allow efficient computation and also expressive enough to capture all the ASP constructs. This led to the latest version of NeurASP to date. We are working on a prototype of the new formalism in which logic rules are encoded in neural networks. We embedded implication rules in neural networks and successfully solved the N-queens problem, Sudoku, and Einstein's puzzle using neural networks only. We plan to embed CNFs in neural networks and implement a SAT solver using neural networks only. Ultimately, we plan to embed answer set programs directly in neural networks. One challenge along this route is how to embed different constructs, such as negation, choice rules, and aggregates, in a neural network.</p></div>
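The regularizer idea above can be illustrated with the semantic loss of Xu et al. [26], which turns a propositional constraint into a differentiable penalty on the network's output probabilities. The sketch below is our own hypothetical example, not the actual implementation: it computes the loss for an "exactly one of n outputs is true" constraint as the negative log of the constraint's weighted model count.

```python
import math

def semantic_loss_exactly_one(probs):
    """Semantic loss for the constraint 'exactly one output is true':
    the negative log of the weighted model count of the constraint,
    i.e., the sum over i of p_i * prod_{j != i} (1 - p_j)."""
    wmc = 0.0
    for i, p in enumerate(probs):
        term = p
        for j, q in enumerate(probs):
            if j != i:
                term *= 1.0 - q
        wmc += term
    return -math.log(wmc)

# Predictions consistent with the constraint incur a low loss ...
low = semantic_loss_exactly_one([0.97, 0.01, 0.01, 0.01])
# ... while predictions that violate it are penalized more heavily.
high = semantic_loss_exactly_one([0.9, 0.9, 0.05, 0.05])
assert low < high
```

Minimizing such a loss alongside the usual data loss pushes the network toward outputs that satisfy the constraint; the same weighted-model-count construction extends to implication rules and, in principle, to CNFs.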
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">How do we evaluate and improve such a formalism?</head><p>We evaluated NeurASP in <ref type="bibr">[27]</ref> w.r.t. the following domains: commonsense reasoning about images as in Figure <ref type="figure">1</ref>, MNIST digit addition in <ref type="bibr">[17]</ref>, solving Sudoku in images in <ref type="bibr">[19]</ref>, and the shortest path problem in <ref type="bibr">[26]</ref>. We showed that NeurASP is very expressive and is able to help both neural network inference and training. We also showed that training with NeurASP still takes much more time than pure neural network training due to the internal use of a symbolic reasoning engine (i.e., CLINGO).</p><p>We plan to analyze the effect of NeurASP on other domains that require both perception and reasoning. The first domain on our agenda is visual question answering, where we limit our attention to those problems whose reasoning can potentially be represented in ASP. Some well-known datasets in this domain include NLVR2 <ref type="bibr">[25]</ref>, CLEVR <ref type="bibr">[7]</ref>, and CLEVRER <ref type="bibr">[28]</ref>. We plan to start by replacing the reasoning part in existing work with NeurASP and analyzing the pros and cons of applying NeurASP to those visual reasoning domains. We also plan to apply NeurASP to predict whether a vulnerability will be exploited, so that knowledge from domain experts can help neural network training. Moreover, since our preliminary results show that a neural network is not always trained better with more constraints, we also plan to systematically analyze the effects of different constraints in order to understand how to add constraints in real-world problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Preliminary Results Accomplished</head><p>We designed the syntax and semantics of NeurASP, and showed how NeurASP can be useful for tasks where both perception and the high-level reasoning provided by answer set programs are required.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">NeurASP</head><p>In <ref type="bibr">[27]</ref>, we present a simple extension of answer set programs by embracing neural networks. Following the idea of DeepProbLog <ref type="bibr">[17]</ref>, by treating the neural network output as the probability distribution over atomic facts in answer set programs, the proposed NeurASP provides a simple and effective way to integrate sub-symbolic and symbolic computation. In NeurASP, a neural network M is represented by a neural atom of the form</p><p>nn(m(e, t), [v 1 , . . . , v n ]) (1)</p><p>where (i) nn is a reserved keyword to denote a neural atom; (ii) m is an identifier (symbolic name) of the neural network M; (iii) t is a list of terms that serves as a "pointer" to an input tensor; related to it, there is a mapping D (implemented by an external Python code) that turns t into an input tensor; (iv) v 1 , . . . , v n represent all n possible outcomes of each of the e random events. Each neural atom (1) introduces propositional atoms of the form c = v, where c &#8712; {m 1 (t), . . . , m e (t)} and v &#8712; {v 1 , . . . , v n }. The output of the neural network provides the probabilities of the introduced atoms.</p><p>Example 1 Let M digit be a neural network that classifies an MNIST digit image. The input of M digit is (a tensor representation of) an image and the output is a matrix in R 1&#215;10 . The neural network can be represented by the neural atom nn(digit(1, d), [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), where the term d serves as a pointer to the input image tensor.</p><p>A NeurASP program &#928; is the union of &#928; asp and &#928; nn , where &#928; asp is a set of propositional rules and &#928; nn is a set of neural atoms. Let &#963; nn be the set of all atoms m i (t) = v j that are obtained from the neural atoms in &#928; nn as described above. 
We require that, in each rule Head &#8592; Body in &#928; asp , no atoms in &#963; nn appear in Head.</p><p>The semantics of NeurASP defines a stable model and its associated probability originating from the neural network output. For any NeurASP program &#928;, we first obtain its ASP counterpart &#928;&#8242;, where each neural atom (1) is replaced with the set of choice rules</p><p>{m i (t) = v 1 ; . . . ; m i (t) = v n } = 1 for i &#8712; {1, . . . , e}.</p><p>We define the stable models of &#928; as the stable models of &#928;&#8242;. Example 1 Continued The ASP counterpart of the neural atom in Example 1 is the following rule.</p><p>{digit(1, d) = 0; digit(1, d) = 1; . . . ; digit(1, d) = 9} = 1.</p><p>To define the probability of a stable model, we first define the probability of an atom m i (t) = v j in &#963; nn . Recall that there is an external mapping D that turns t into a specific input tensor D(t) of M. The probability of each atom m i (t) = v j is defined as M(D(t))[i, j]:</p><p>P(m i (t) = v j ) = M(D(t))[i, j].</p><p>The probability of a stable model I of &#928; is defined as the product of the probabilities of the atoms c = v in I| &#963; nn , divided by the number of stable models of &#928; that agree with I| &#963; nn on &#963; nn . That is, for any interpretation I that is a stable model of &#928;,</p><p>P &#928; (I) = ( &#8719; c=v &#8712; I| &#963; nn P(c = v) ) / Num(I| &#963; nn , &#928;),</p><p>and P &#928; (I) = 0 for any other interpretation I, where I| &#963; nn denotes the projection of I onto &#963; nn and Num(I| &#963; nn , &#928;) denotes the number of stable models of &#928; that agree with I| &#963; nn on &#963; nn .</p></div>
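The probability definition above can be made concrete with a small sketch. Everything below is hypothetical for illustration: the atom names and probabilities are invented, and the stable models are given as plain dictionaries rather than computed by CLINGO.

```python
from math import prod

def stable_model_probability(model, stable_models, nn_probs):
    """Probability of a stable model under the NeurASP semantics:
    the product of the neural probabilities of the atoms c = v in the
    model's projection onto sigma_nn, divided by the number of stable
    models that agree with that projection on sigma_nn."""
    proj = {c: v for c, v in model.items() if c in nn_probs}
    num = sum(
        1 for m in stable_models
        if {c: v for c, v in m.items() if c in nn_probs} == proj
    )
    return prod(nn_probs[c][v] for c, v in proj.items()) / num

# Hypothetical neural atom digit(1,d) with P(=0)=0.2 and P(=1)=0.8,
# and three stable models, two of which agree on digit(1,d)=1.
nn_probs = {"digit(1,d)": {0: 0.2, 1: 0.8}}
sms = [
    {"digit(1,d)": 1, "odd": True},
    {"digit(1,d)": 1, "big": True},
    {"digit(1,d)": 0, "even": True},
]
# Each model with digit(1,d)=1 gets probability 0.8 / 2 = 0.4.
assert abs(stable_model_probability(sms[0], sms, nn_probs) - 0.4) < 1e-9
```

Note that the probabilities of all stable models sum to 1, since each neural outcome's probability mass is split evenly among the stable models that share it.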
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Examples of NeurASP</head><p>We show how NeurASP can be applied to the following four examples where both perception and the high-level reasoning provided by answer set programs are required.</p><p>&#8226; MNIST Digit Addition This is a simple example used in <ref type="bibr">[17]</ref> to illustrate DeepProbLog's ability to combine logical reasoning and deep learning. The task is, given a pair of digit images (MNIST) and their sum as the label, to let a neural network learn the digit classification of the input images. The NeurASP program is as follows.</p><p>img(d 1 ). img(d 2 ).</p><p>Figure <ref type="figure">2</ref> shows the accuracy on the test data after each training iteration. The method CNN denotes the baseline used in <ref type="bibr">[17]</ref>, where a convolutional neural network (with more parameters) is trained to classify the concatenation of the two images into the 19 possible sums. As we can see, the neural networks trained by NeurASP and DeepProbLog converge much faster than CNN and have almost the same accuracy at each iteration. However, NeurASP spends much less time on training than DeepProbLog. The time reported is for one epoch (30,000 iterations of gradient descent). This is because DeepProbLog constructs an SDD at each iteration for each training instance (i.e., each pair of images). This example illustrates that generating many SDDs can be more time-consuming than enumerating stable models in the NeurASP computation. In general, there is a trade-off between the two methods, and other examples may show the opposite behavior. &#8226; Commonsense Reasoning about Images We show how expressive reasoning originating from answer set programming, such as recursive definitions and defaults, can be used in NeurASP inference. We also show that reasoning in NeurASP can help identify perception mistakes that violate semantic constraints, which in turn can make perception more robust. 
Take the problem in Figure <ref type="figure">1</ref> as an example. A neural network for object detection may return a bounding box and its classification "car," but it may not be clear whether it is a real car or a toy car. The distinction can be made by reasoning about the relations to the surrounding objects using commonsense knowledge. To reason about the size relationships among objects, we need a recursive definition of the "smaller than" relation.</p><p>smaller(cat, person). smaller(person, car). smaller(person, truck). smaller(X,Y) &#8592; smaller(X,Z), smaller(Z,Y).</p><p>We also need default reasoning to assert that, by default, the same size relationships as above hold between the objects in bounding boxes B 1 and B 2 .</p><p>NeurASP allows the use of such expressive reasoning originating from answer set programming.</p><p>&#8226; Solving Sudoku in Image We show that NeurASP alleviates the burden on neural networks when the constraints/knowledge are already given. Instead of building a large end-to-end neural network that learns to solve a Sudoku puzzle given as an image, we can let a neural network do only digit recognition and use ASP to find the solution for the recognized board. This makes the design of the neural network simpler and the required training dataset much smaller. Also, the neural network may get confused about whether a digit next to a 1 in the same row is a 1 or a 2, but the reasoner can conclude that it cannot be a 1 by applying the Sudoku constraints. Moreover, when we need to solve a variant of Sudoku, such as Anti-knight Sudoku, the modification is much simpler than training another large neural network from scratch to solve the new puzzle. Indeed, one can use the same pre-trained neural network and only needs to add a rule to the NeurASP program saying that "no number repeats at a knight move".</p><p>&#8226; Learning Shortest Path In the last example, from <ref type="bibr">[26]</ref>, a neural network is trained to predict, for each of the 24 edges of a graph g, whether that edge belongs to the shortest path, which is expressed by the following neural atom. nn(sp(24, g), [true, false]). 
% if edge 1 in graph g is selected in the shortest path,
% then there is an edge between node 0 and node 1
sp(0,1) :- sp(1,g,true). ... sp(X,Y) :- sp(Y,X).
% [nr] no removed edges should be predicted
:- sp(X,g,true), removed(X).
% [p] (aggregates) the prediction must form a simple path, i.e.,
% the degree of each node must be either 0 or 2
:- X=0..15, #count{Y: sp(X,Y)} = 1.
:- X=0..15, #count{Y: sp(X,Y)} &gt;= 3.
% [r] (recursive definition) every two nodes in the prediction must be reachable
reachable(X,Y) :- sp(X,Y).
reachable(X,Y) :- reachable(X,Z), sp(Z,Y).
:- sp(X,A), sp(Y,B), not reachable(X,Y).
% [o] (weak constraint) the predicted path should contain the fewest edges
:&#8764; sp(X,g,true). [1, X]</p><p>In this experiment, we trained the same neural network model M sp as in <ref type="bibr">[26]</ref>, a 5-layer Multi-Layer Perceptron (MLP), with 4 different settings: (i) MLP only; (ii) together with NeurASP with the simple-path constraint (p), the only constraint used in <ref type="bibr">[26]</ref>; (iii) together with NeurASP with the simple-path, reachability, and optimization constraints (p-r-o); and (iv) together with NeurASP with all 4 constraints (p-r-o-nr). Table <ref type="table">1</ref> shows, after 500 epochs of training, the percentage of the predictions on the test data that satisfy each of the constraints p, r, and nr, the path constraint (i.e., p-r), the shortest path constraint (i.e., p-r-o-nr), and the accuracy w.r.t. the ground truth. As we can see, NeurASP helps to train the same neural network so that it is more likely to satisfy the constraints. Besides, the last column shows that a neural network is not always trained better with more constraints.</p></div>
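To make the training signal in the MNIST digit addition example above concrete, the sketch below computes the probability that the observed sum holds, i.e., the total probability of the stable models whose two digit atoms sum to the label. This is a simplified illustration with hypothetical output distributions, not the NeurASP implementation.

```python
def prob_sum(p1, p2, n):
    """Probability that two independently classified MNIST digits sum to n,
    given the two networks' 10-way output distributions p1 and p2.
    The gradient of -log of this value w.r.t. p1 and p2 is what lets the
    digit networks learn from the sum label alone."""
    return sum(p1[i] * p2[n - i] for i in range(10) if 0 <= n - i <= 9)

# Hypothetical outputs: the first network is sure the digit is 3,
# the second is split between 4 and 5.
p1 = [0.0] * 10; p1[3] = 1.0
p2 = [0.0] * 10; p2[4] = 0.6; p2[5] = 0.4
assert abs(prob_sum(p1, p2, 7) - 0.6) < 1e-9
assert abs(prob_sum(p1, p2, 8) - 0.4) < 1e-9
```

Note the contrast with the CNN baseline, which must learn this addition structure from data; here the structure is supplied by the rules, and only the digit classifiers are learned.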
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Open Issues and Expected Achievements</head><p>The biggest issue we are encountering now is that training with the NeurASP implementation still takes much more time than pure neural network training due to the internal use of a symbolic reasoning engine (i.e., CLINGO in our case). With our recent experience in encoding logic directly in neural networks, we expect that the scalability issue will be resolved by embedding ASP (or possibly a fragment of ASP) directly in neural networks. We expect to design a new formalism whose rules can be turned into neural network regularizers. We also expect to implement a prototype for the new formalism and apply it to the domains that NeurASP was applied to, so that we can compare and analyze the pros and cons of both approaches.</p></div></body>
		</text>
</TEI>
