<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Learning Coalition-Based Interactions in Networked Social Systems</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2020 Spring</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10125916</idno>
					<idno type="doi"></idno>
					<title level='j'>Association for the Advancement of Artificial Intelligence Conference 2020</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Abhijin Adiga</author><author>Chris J. Kuhlman</author><author>Madhav V. Marathe</author><author>S. S. Ravi</author><author>Daniel J. Rosenkrantz</author><author>Richard E. Stearns</author><author>Anil Vullikanti</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Using a discrete dynamical system model for a networkedsocial system, we consider the problem of learning a classof local interaction functions in such networks. Our focus ison learning local functions which are based on pairwise disjointcoalitions formed from the neighborhood of each node.Our work considers both active query and PAC learning models.We establish bounds on the number of queries needed tolearn the local functions under both models.We also establisha complexity result regarding efficient consistent learners forsuch functions. Our experimental results on synthetic and realsocial networks demonstrate how the number of queries dependson the structure of the underlying network and numberof coalitions.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Motivation. Learning the nature of interactions in networked physical and social systems is a challenging problem (see e.g., <ref type="bibr">(Laubenbacher and Stigler 2004;</ref><ref type="bibr">Romero, Meeder, and Kleinberg 2011;</ref><ref type="bibr">Gonz&#225;lez-Bail&#243;n et al. 2011)</ref>). We use a graphical dynamical systems model, called a synchronous dynamical system (SyDS) (see e.g., <ref type="bibr">(Barrett et al. 2006</ref>)) to represent these networked systems. Such a system consists of an undirected graph G(V, E), where the nodes represent entities (agents) and the edges represent pairwise interactions. (Formal definitions are provided in Section 2.) Each node v has a time varying state value (assumed to be Boolean) and a local function f v which determines the next state of the node using the current states of v and its neighbors. The SyDS model assumes that nodes compute and update their state values synchronously. The graph and the local functions determine the dynamics of the system.</p><p>The problem of understanding the nature of interactions in a networked system can be formulated as that of inferring the local functions in a SyDS model of the system. We consider inference through interactions with the system where a user may specify each query in the form of a configuration (i.e., the current state values of nodes) and the system provides the successor configuration, i.e., states of the nodes at the next Copyright c 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. time instant <ref type="bibr">(Adiga et al. 2018;</ref><ref type="bibr">He et al. 2016)</ref>. We also consider inference under the Probably Approximately Correct (PAC) learning framework where configuration-successor pairs are independently drawn from an unknown distribution (see similar work in <ref type="bibr">(Narasimhan, Parkes, and Singer 2015;</ref><ref type="bibr">He et al. 2016)</ref>). Under both models, we assume that the network is known.</p><p>The great majority of prior work focuses on each agent's individual behavior, where an agent treats each neighbor as an autonomous influencer. However, in several situations, an agent is influenced by groups formed by its neighbors. Here, our focus is on learning a form of interaction based on pairwise disjoint coalitions formed by the neighbors of an agent. The motivation for this model comes from the work reported in <ref type="bibr">(Ugander et al. 2012;</ref><ref type="bibr">Laubenbacher and Stigler 2004;</ref><ref type="bibr">Col&#243;n-Reyes et al. 2006</ref>). The model studied in <ref type="bibr">(Ugander et al. 2012</ref>) uses a social network and considers the connected components formed by the one-hop neighbors of a node v. Since the connected components are node disjoint, so are the coalitions. The experimental evidence presented in <ref type="bibr">(Ugander et al. 2012)</ref> shows that coalitions are indeed operative in social networked systems. In particular, the results in this reference point out that people consider coalitions of their neighbors in deciding whether to join Facebook. Thus, our model of non-overlapping coalitions has direct relevance to social systems. The model studied in <ref type="bibr">(Laubenbacher and Stigler 2004;</ref><ref type="bibr">Col&#243;n-Reyes et al. 2006</ref>) considers polynomial interaction functions, where each polynomial is a sum of monomials (products of variables where the degree of each variable is at most 1). The monomials can be thought of as coalitions. For Boolean functions, sums of monomials correspond to monotone functions in disjunctive normal form (DNF); such functions are in the sum of products form where no variable appears negated. Each product term in a DNF represents a coalition. Since coalitions considered in models of social systems generally do not overlap <ref type="bibr">(Branzei, Dimitrov, and Tijs 2005;</ref><ref type="bibr">Ugander et al. 2012)</ref>, we have the additional requirement that the coalitions must partition the set of inputs. We call such functions partitioned monotone DNF (PM-DNF) functions. As an example, the Boolean function of five variables defined by f (x 1 , x 2 , x 3 , x 4 , x 5 ) = x 1 x 5 + x 2 x 3 + x 4 consists of three pairwise disjoint coalitions. The interpretation is that the function takes on the value 1 iff least one of the coalitions is unanimous, i.e., all the variables in that coalition have the value 1. Thus, in a social system with PM-DNF functions, a node changes to 1 at time &#964; + 1 iff there is at least one unanimous coalition among its inputs at time &#964; .</p><p>Our work considers the problem of learning PM-DNF functions under the active query model of <ref type="bibr">(Adiga et al. 2018)</ref> and the PAC learning model <ref type="bibr">(Valiant 1984)</ref>. Two extreme cases of the PM-DNF model are well studied: (i) the Boolean OR function where every coalition has exactly one neighbor (which corresponds to the simple contagion model of <ref type="bibr">(Granovetter 1978)</ref>) and (ii) Boolean AND function where all the neighbors form a single coalition (which corresponds to a particular type of complex contagion model of <ref type="bibr">(Centola and Macy 2007)</ref>). Our model is a generalization of these two extreme cases. Also, to the best of our knowledge, this is the first work that addresses learning functions that depend on groups of neighbors, rather than individual neighbors. Such a dynamical system can also be viewed as a model for diffusion on hypergraphs <ref type="bibr">(Zhu et al. 2018</ref>). Summary of results. 1. Bounds under the active query model. We present an algorithm that can learn any PM-DNF function with q inputs using O(q log q) membership queries<ref type="foot">foot_0</ref> under the adaptive mode (where a query may depend on the answers to the previous queries). We also show that in the worstcase, &#8486;(q log q) queries are required under the adaptive mode to infer such a function. In addition, we show that O(&#967;&#8710; log &#8710;) adaptive queries are sufficient to infer all the local functions of a SyDS where &#967; and &#8710; represent the number of colors needed to color G<ref type="foot">foot_1</ref> (the square graph 2 of G) and the maximum node degree of G respectively. 2. PAC model upper bound. For any fixed values of the parameters and &#948;, we show that for learning the PM-DNF functions at all the nodes of a SyDS, an upper bound on the sample complexity is O (2m + n) log(&#8710; + 1) , where m = |E| and &#8710; is the maximum node degree. 3. Complexity of efficient PAC learning. We show that the class of PM-DNF functions with two or more product terms is not efficiently PAC learnable unless NP = RP. (The corresponding problem for one product term is efficiently solvable since a PM-DNF function with one product term is just the AND function.) 4. An algorithm for learning under the PAC model. To cope with the intractability result mentioned in Item 3 above, we present an integer linear programming (ILP)-based algorithm for determining whether there is a PM-DNF function that is consistent with all the given examples (definition in Section 4). This algorithm can be used to construct a PAC learning algorithm for PM-DNF functions in practice. 5. Experimental results. We present experimental results for generating query sets under the adaptive model for both synthetic and real social networks. The number of queries re-quired depends on the structure of the graph and number of blocks (i.e., coalitions). For example, in the case of scalefree networks, the number of queries required is much less than the theoretical upper bound established in this paper. Under the PAC model, we analyze a single local function with regard to sample distribution, size of the input and number of blocks. Interestingly, the ILP-based algorithm exhibits better performance when the number of blocks is large.</p><p>For space reasons, proofs for many of the results are omitted; they appear in <ref type="bibr">(Adiga et al. 2019a</ref>). Related work. Many researchers have addressed the problem of learning components of physical and social systems (see e.g., <ref type="bibr">(Adiga et al. 2018;</ref><ref type="bibr">He et al. 2016;</ref><ref type="bibr">Laubenbacher and Stigler 2004;</ref><ref type="bibr">Romero, Meeder, and Kleinberg 2011;</ref><ref type="bibr">Gonz&#225;lez-Bail&#243;n et al. 2011)</ref>). As mentioned earlier, the coalition-based interaction model was motivated by the work in <ref type="bibr">(Ugander et al. 2012;</ref><ref type="bibr">Col&#243;n-Reyes et al. 2006)</ref>. The problem of learning Boolean DNF functions has received attention in the learning theory literature under membership query and PAC learning models. For example, bounds on the number of membership queries for learning monotone DNFs are proven in <ref type="bibr">(Abasi, Bshouty, and Mazzawi 2014)</ref>. In their work, the product terms may not partition the set of variables. The problem of learning discrete distributions over {0, 1} n is considered in <ref type="bibr">(Kearns et al. 1994)</ref>; some of their results use circuits that compute monotone DNF (but not PM-DNF) functions. Other learning problems for DNF functions have been considered in several papers (e.g., <ref type="bibr">(Angluin and Slonim 1994;</ref><ref type="bibr">Li&#347;kiewicz, Lutter, and Reischuk 2017;</ref><ref type="bibr">Servedio 2004)</ref>). The topic of active learning has also been explored in the context of sensor networks (e.g., (Castro and Nowak 2007)).</p><p>To our knowledge, the problem of learning PM-DNF functions for networked systems has not been addressed in the literature. In particular, the adaptive query techniques presented in <ref type="bibr">(Adiga et al. 2018)</ref> for learning symmetric functions (and threshold functions which are a subclass of symmetric functions) cannot be applied to PM-DNF functions since the latter is not a subclass of the former. For example, the PM-DNF function f (x 1 , x 2 , x 3 , x 4 ) = x 1 x 2 + x 3 x 4 is not a symmetric function since f (1, 0, 1, 0) = 0 = f (1, 1, 0, 0); hence, it is also not a threshold function. Learning threshold functions under the PAC model is considered in <ref type="bibr">(Adiga et al. 2019b)</ref>; they present a complexity result for efficient consistent learners for threshold functions similar to our result for PM-DNF functions. However, our complexity result is not implied by the one in <ref type="bibr">(Adiga et al. 2019b</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Definitions and Problem Formulations</head><p>Model for networked social systems. Following <ref type="bibr">(Barrett et al. 2006)</ref>, we use a formalism called a synchronous dynamical system (SyDS), to model a networked social system. Let B denote the Boolean domain {0,1}. A SyDS S over B is a pair S = (G, F ), where (i) G(V, E), an undirected graph with n = |V | nodes, represents the underlying graph of the SyDS, and (ii) of the nodes in the closed neighborhood of v i (i.e., node v i and the neighbors of v i in G). For each input, the output of function f i gives the next state of v i . In a SyDS, all nodes compute and update their next state synchronously (i.e., in parallel). At any time &#964; , if</p><p>The system evolves in discrete time steps by repeated application of F . If C and C denote two successive configurations of a SyDS, then C is the successor of C.</p><p>Partitioned monotone DNF functions. In this paper, each local function f i is based on coalitions formed by the closed neighborhood of node v i , 1 &#8804; i &#8804; n. Our focus is on one class of such Boolean functions, called partitioned monotone DNF (PM-DNF) functions.</p><p>Definition 1. A Boolean function f is a PM-DNF iff it has a disjunctive normal form (DNF) (i.e., sum of products) representation satisfying the following two properties: (i) all the variables appear unnegated in f (i.e., f is monotone) and (ii) the collection of product terms (also referred to as blocks or coalitions) partitions the set of inputs to f ; i.e., each input appears in exactly one block.</p><p>Example 1. Suppose we have five Boolean variables, denoted by x 1 , x 2 , x 3 , x 4 and x 5 . One example of a PM-DNF function is f 1 (x 1 , x 2 , x 3 , x 4 , x 5 ) = x 1 x 3 x 5 + x 2 x 4 , which has two product terms (coalitions). Note that the OR function f 3 (x 1 , x 2 , x 3 , x 4 , x 5 ) = x 1 +x 2 +x 3 +x 4 +x 5 (five coalitions) and the AND function</p><p>For simplicity, we use the abbreviation PM-DNF-SyDS to denote a SyDS in which every local function is a PM-DNF function. We now present an example of such a SyDS. Example 2. The graph of a PM-DNF-SyDS is shown in Figure 1. Suppose the initial configuration is (1, 0, 0, 0); that is, node v 1 is in state 1 and nodes v 2 , v 3 and v 4 are in state 0. It can be seen that the system goes through the following sequence of configurations during the next two time steps:</p><p>(1, 0, 0, 0) -&#8594; (1, 1, 0, 0) -&#8594; (1, 1, 1, 1). From the configuration (1, 1, 1, 1), no further state changes occur. Such a configuration is a fixed point for this system.</p><p>Active query model. This query model for SyDSs was proposed in <ref type="bibr">(Adiga et al. 2018)</ref>. Under this model, each query, which we call a successor query, specifies a configuration C; the response to the query is the configuration C , the successor of C. One can think of C as specifying an input to each local function and the response C as specifying the value of each local function for the input specified by C.</p><p>For expository purposes, we consider learning each local function separately. Thus, to learn an unknown PM-DNF function f , a query specifies an assignment &#945; of values to the inputs of f ; the response to the query is the Boolean value f (&#945;). In the learning theory literature, such queries are called membership queries (see e.g., <ref type="bibr">(Angluin and Slonim 1994)</ref>). Since our goal is to use as few membership queries as possible, we will use the adaptive query mode considered in <ref type="bibr">(Adiga et al. 2018)</ref>. In this mode, membership queries are generated one at a time; a query may depend on the responses for previous queries. We also consider learning PM-DNF functions under the PAC model; we refer the reader to <ref type="bibr">(Antony and Biggs 1992;</ref><ref type="bibr">Kearns and Vazirani 1994)</ref> for the relevant definitions. Positive and negative examples. For an unknown PM-DNF f , each example &#951; given to a PAC learner is a pair (&#945;, &#946;), where &#945; is an assignment of {0,1} values to the inputs of f and &#946; &#8712; {0, 1} is the value f (&#945;) of the function. These are positive examples. We need not consider negative examples here since a negative example of the form (&#945;, &#946;), that is, "&#946; is not the output of f for input &#945;", is equivalent to the positive example (&#945;, &#946;).</p><p>The concept class of PM-DNF functions is PAC learnable by a learner L using the hypothesis space H if for any target concept c, values and &#948; such that 0 &lt; , &#948; &lt; 1/2, and distribution D over the instance space, L outputs with a probability of at least 1 -&#948;, a hypothesis h &#8712; H such that error D (h) &#8804; . The sample complexity of a learner, denoted by M( , &#948;), is the number of examples needed by the learner to output an appropriate hypothesis h. We will use the following well-known upper bound <ref type="bibr">(Haussler 1988</ref>) on M( , &#948;) based on the size of the hypothesis space H:</p><p>3 Bounds Under the Active Query Model Lower bound. We establish the lower bound by pointing out that any algorithm that uses membership queries under the adaptive mode can be viewed as a decision tree, like the one used to establish a lower bound on comparison-based sorting algorithms <ref type="bibr">(Cormen et al. 2009)</ref>. A proof of the following theorem appears in <ref type="bibr">(Adiga et al. 2019a)</ref>.</p><p>Theorem 1. Every algorithm that uses membership queries under the adaptive mode to learn a PM-DNF function with q inputs must use &#8486;(q log (q)) queries in the worst-case.</p><p>Upper bound: A query generation algorithm to learn a PM-DNF function. We now discuss our algorithm for generating membership queries under the adaptive mode to learn an unknown PM-DNF function f with q inputs, denoted by x 1 , x 2 , . . ., x q . For each block of f , the variable with the smallest index will be referred to as the key variable for that block. For example, if one of the blocks is x 2 x 7 x 9 , then the key variable for that block is x 2 . For a set of blocks, the key block for that set is the block with the largest key variable. For a set of blocks, we define the superkey variable for that set of blocks to be the key variable of the key block in the set of blocks.</p><p>The algorithm consists of a loop that identifies the blocks of f , one block at a time. For each iteration of this loop, we refer to the already discovered blocks as the known blocks, and the remaining blocks as the unknown blocks. We refer to the variables in the known blocks as allocated variables, and the variables in the unknown blocks as unallocated variables. The unallocated variables are sorted by their index, lowest index first. The list of unallocated variables can be considered to be divided into two parts; a left part consisting of primary unallocated variables, and a right part (possibly empty) consisting of secondary unallocated variables. As the algorithm proceeds, the primary unallocated variables are unallocated variables that are potentially the key variable of some unknown block, whereas secondary unallocated variables are unallocated variables that the responses to previously issued queries have shown are not the key variable of any block.</p><p>Initially all the blocks are unknown, and all the variables are primary unallocated variables. At each iteration of the loop, the algorithm finds the key block among the currently unknown blocks. The algorithm does this by first finding the superkey variable for the set of unknown blocks, thereby identifying the key variable of the key unknown block. The current iteration of the loop then proceeds by finding all the remaining variables in the key unknown block, one variable at a time. Once all the variables of the key unknown block have been found, that block is now known, and the status of its variables changes to allocated. After changing the status of this block, if all the variables are allocated (i.e., belong to known blocks), then all the blocks of f have been identified, so the algorithm is finished. Otherwise, the algorithm reiterates the loop, to discover the new key unknown block.</p><p>Each iteration of the loop consists of two major substeps; the details of these substeps are provided below. Recall that each membership query specifies an assignment &#945; of {0,1} values to the variables, and the response to the query is the value f (&#945;). The algorithm uses two types of membership queries: superkey queries and block queries. Substep 1 of each loop iteration uses superkey queries to find the superkey variable of the unknown blocks. Once this superkey variable is identified, Substep 2 uses block queries to find the other members of the key block. Substep 1: Finding the superkey variable of the unknown blocks. The algorithm uses a binary search over the primary unallocated variables, using superkey queries to guide the search. The binary search maintains a list L of candidate variables, each of which is a primary unallocated variable, and might potentially be the superkey. List L initially consists of all the primary unallocated variables, since the superkey is one of these variables.</p><p>If list L of candidate variables contains only one variable, say variable x k , then the binary search is over, and variable</p><p>x k is the superkey. Otherwise, the binary search to find the superkey proceeds as follows. Let x j be the |L|/2 th variable on list L. Let &#945; j be the assignment to the q variables where a given variable x i is 1 iff x i is unallocated and i &gt; j.</p><p>The algorithm issues &#945; j as a query, which we refer to as a superkey query.</p><p>Suppose f (&#945; j ) = 0. Then the unallocated variables to the right of x j do not contain a complete block, so none of these variables can be the key of any unknown block. So, the status of each primary unallocated variable x i such that i &gt; j is changed to secondary. Also, each candidate variable x i on list L such that i &gt; j is deleted from L.</p><p>Suppose f (&#945; j ) = 1. Then the unallocated variables to the right of x j contain a complete block, so the superkey is a candidate variable x i such that i &gt; j. So, each candidate variable x i on list L such that i &#8804; j is deleted from L.</p><p>In this manner, the search for the superkey variable is recursively continued on the left or right half of list L, depending on the value of f (&#945; j ), until L contains just one variable. Note that each query reduces the size of L by a factor of 2. (More precisely, if L and L denote respectively the list before and after the list shortening, then |L | = |L|/2 ). Thus, the number of queries used to find the superkey variable is at most log (q) . Substep 2: Finding the other variables in the key block. Substep 2 uses a loop that searches for the other variables in the key block, one variable at a time. At the beginning of each iteration of this loop, a nonempty set of key block members (including the superkey) have already been found. We refer to this set of variables as identified key block members. The iteration begins by issuing a block query &#945; wherein a given variable is 1 iff it is an identified key block member. If f (&#945;) = 1, then the identified key block members form the complete key block. The key block is now known, so the status of its members is changed to allocated, and Substep 2 is complete.</p><p>If f (&#945;) = 0, then the key block contains at least one additional member, and a binary search is used to find the additional member with the lowest index. The binary search maintains a list L of candidate variables, each of which is a secondary unallocated variable, and which might potentially be the next member of the key block. List L initially consists of all the secondary unallocated variables to the right of the last member added to the key block, since the next member of the key block is one of these variables. If list L of candidate variables contains only one variable, say variable x j , then the binary search is over, and variable x j is the next member of the key block. Variable x j is now the newest identified key block member, and another iteration of the main loop for Substep 2 begins.</p><p>Otherwise, if list L of candidate variables contains more than one variable, the binary search to find the next member of the key block proceeds as follows. Let x j be the |L|/2 th variable on list L. Let &#945; j be the assignment to the q variables where a given variable x i is 1 iff either x i is an identified key block member or x i is unallocated and i &gt; j. The algorithm issues block query &#945; j .</p><p>Suppose f (&#945; j ) = 0. Then the key block has a variable x i such that i &#8804; j and x i is on list L. Thus, each candidate variable x i on list L such that i &gt; j is deleted from L.</p><p>Suppose f (&#945; j ) = 1. Then the next member of the key block is a variable x i such that i &gt; j and x i is on list L. Thus, each candidate variable x i on list L such that i &#8804; j is deleted from L.</p><p>In this manner, the search for the next member of the key block is recursively continued on the left or right half of list L, depending on the value of f (&#945; j ), until L contains just one variable. Since each query reduces the size of L by a factor of 2, the number of queries used to find the next member of the key block is at most log (q) . Overall, the algorithm uses at most 1 + log (q) queries per variable. Thus, an upper bound on the number of queries is q(1 + log (q) ) = O(q log (q)). Thus, we have: Theorem 2. A PM-DNF function f with q inputs can be learned using at most q(1 + log (q) ) = O(q log (q)) adaptive membership queries. Inferring all local functions. The following theorem provides an upper bound on the number of queries needed to learn all local functions of a PM-DNF-SyDS. A proof of the theorem appears in <ref type="bibr">(Adiga et al. 2019a</ref>). Theorem 3. For a PM-DNF-SyDS with underlying graph G, O(&#967;(G 2 ) &#8710; log (&#8710;)) successor queries are sufficient to infer all the local functions. Here, &#8710; is the maximum node degree in G and &#967;(G 2 ) is the minimum number of colors needed for a valid node coloring of G 2 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results Under the PAC Learning Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Upper bound on the number of queries</head><p>We begin with an upper bound on the sample complexity to learn a PM-DNF function under the PAC model. A proof of the following result appears in <ref type="bibr">(Adiga et al. 2019a)</ref>. Proposition 1. Let , &#948; &gt; 0 be fixed. The asymptotic sample complexity M( , &#948;) for PAC learning all the PM-DNF local functions for a given graph G(V, E) is M( , &#948;) = O (2m+ n) log(&#8710; + 1) , where m = |E| and &#8710; is the maximum degree of G.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">A complexity result for efficient PAC learning</head><p>We will show that a class of PM-DNF functions is not efficiently PAC learnable unless NP = RP. To do this, we need to introduce the notion of consistency of a PM-DNF function with respect to a set of examples. Consistent hypothesis. Given a set E of examples, we say that a hypothesis (i.e., a PM-DNF function)</p><p>As is well known in the learning theory literature (see e.g., <ref type="bibr">(Kearns and Vazirani 1994)</ref>), algorithms for obtaining consistent hypotheses are useful in constructing PAC learning algorithms.</p><p>We now present our complexity result for the class of PM-DNF functions with two or more product terms. (As stated in Section 1, the case of a PM-DNF function with one product term is trivial.) To prove the result, we use the following problem which is known to be NP-complete <ref type="bibr">(Garey and Johnson 1979)</ref>.</p><p>Hypergraph 2-Colorability (H2C): Given a set U = {u 1 , u 2 , . . . , u q } and a collection Y = {Y 1 , Y 2 , . . . , Y k } of subsets of U (i.e., the hyperedges) with |Y j | &#8805; 2, 1 &#8804; j &#8804; k, can the elements in U be colored with two colors so that no hyperedge in Y is monochromatic (i.e., each subset in Y contains at least one element of each color)? Theorem 4. If NP = RP, the class of PM-DNF functions with two or more product terms is not efficiently PAC learnable. Proof (idea). We use a reduction from H2C to show that if there is an efficient PAC learning algorithm for PM-DNF functions with two or more blocks, then there is an RP-time algorithm for H2C, contradicting the assumption that NP = RP. Details appear in <ref type="bibr">(Adiga et al. 2019a</ref>).</p><p>We note that Theorem 4 holds for the case of proper learning where the hypothesis class and the concept class are the same, namely the class of PM-DNF functions. Whether the result can be extended to the representation-independent setting (see, e.g., <ref type="bibr">(Warmuth 1989</ref>)) is left for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">An ILP-based PAC learning algorithm</head><p>As is well known, if a hypothesis h (which in this case is a PM-DNF function) that is consistent with all the given examples can be constructed, then the number of examples used to learn h is within a constant factor of the minimum sample complexity needed to learn the hypothesis class <ref type="bibr">(Blumer et al. 1989)</ref>. Therefore, we focus on developing an algorithm for a consistent learner. We consider the following problem which we call Consistent Learning of Partitioned Monotone DNF functions (CL-PMDNF). Given: A set E of examples for an unknown PM-DNF function f with q inputs given by X = {x 1 , x 2 , . . . , x q }; E is partitioned into E 0 and E 1 , where E 0 (E 1 ) is the set of examples in which the function value is 0 (1); integer k &#8804; q. Requirement: Determine whether the variable set X can be partitioned into exactly k blocks, with each block forming a product term of the function, so that the resulting function is consistent with E. If so, find one such partition.</p><p>The above formulation assumes that we know the number of blocks. This can be done without loss of generality since we can try the values 1, 2, . . ., q for the number of blocks.</p><p>Let B 1 , B 2 , . . ., B k denote the blocks (product terms) of the unknown PM-DNF function. To develop our ILP formulation for CL-PMDNF, let z ij be an indicator variable which has the value 1 if variable x i is in Block B j and 0 otherwise, 1 &#8804; i &#8804; q and 1 &#8804; j &#8804; k. We now explain the constraints in our ILP.</p><p>The following two sets of constraints enforce the following requirements: (i) each variable appears in exactly one block and (ii) each block is nonempty (since we must have exactly k blocks).</p><p>Consider any example &#951; p = (&#945; p , 0) in E 0 . Let S p &#8838; X be the set of variables that have the value 0 in the input assignment &#945; p . Since the value of the function is 0, each block must have at least one of the variables from S p . This gives rise to the following set of constraints:</p><p>Consider any example &#951; r = (&#945; r , 1) of E 1 . Let S r &#8838; X be the set of variables that have the value 0 in the input assignment &#945; r . Since the value of the function is 1, there is at least one block which does not have any of the variables from S r . To capture this constraint, we introduce k auxiliary {0,1} variables, denoted by b r,1 , b r,2 , . . ., b r,k , and the following constraints. (Note that each example in E 1 gives rise to a distinct set of auxiliary variables.)</p><p>It can be verified that the last two sets of constraints together imply that there is a block B j that does not contain any of the variables in S r .</p><p>Thus, the CL-PMDNF problem is represented by the set of constraints given above along with the following constraints on the variables: (i) z ij &#8712; {0, 1}, for 1 &#8804; i &#8804; q, 1 &#8804; j &#8804; k and (ii) b r,j &#8712; {0, 1} for each example &#951; r in E 1 and 1 &#8804; j &#8804; k. There is a PM-DNF function with k blocks that is consistent with E iff there is a feasible solution to the above set of constraints. A PAC learning algorithm from the ILP formulation. Our PAC learning algorithm for PM-DNF functions constructs the ILP from the given set E of examples and each possible value of k (the number of blocks) and outputs one such function when there is a feasible solution to the ILP.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experimental Results</head><p>In this section, we evaluate the algorithms developed in the previous sections and compare the results to the derived bounds. In the case of active query, key aspects that we address are how network and local function structures affect the number of queries. We consider both the size (number of nodes as well as edge density) and structure (regular, scalefree, etc.) of the network. For local functions, we consider different numbers of blocks. For PAC learning, our focus is on the difference between the structures of the inferred partition and the true partition with respect to the number of examples sampled, example distribution, and the number of blocks. We used both synthetic and mined networks from the web for the experiments. As shown in Table <ref type="table">1</ref>, five mined networks and three classes of synthetic networks were used. There are five graph instances for each of the random regular (RR) and Barab&#225;si-Albert (BA) synthetic (scale-free) networks for a specified edge density.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Active query</head><p>We applied the greedy coloring algorithm in <ref type="bibr">(Kosowski and Manuszewski 2004)</ref>   number of blocks possible. The local function was generated using the following iterative process. Blocks were indexed {1, 2, . . . , b}. In each iteration, the block index was cyclically incremented. We chose a node uniformly at random without replacement and assigned the block index corresponding to that iteration. For example, suppose q = 5 and b = 3. Then, there are 5 iterations and the block ID assignment happens in the following order: [1, 2, 3, 1, 2]. In the 4th iteration, for example, there are 2 inputs without block ID (since we are sampling without replacement). One of them is chosen randomly and assigned block ID 1. Given this, the algorithm was evaluated using five different values of b (namely 2, 5, 10, 20, and 50) for each network. Each experiment was repeated 100 times.</p><p>Effect of network structure. In Figure <ref type="figure">2</ref>, we note that for a majority of the networks, the number of queries (#queries) required is &lt; 30% of the upper bound. This is mainly due to the skewed degree distribution of most networks except for random regular networks. Consider the maximum degree of a node in each color class. For scale-free networks, Each curve in the first and third figures corresponds to (q; k): the number of inputs and true size of the block. In both cases, the results are shown for uniform distribution with p = 0.5. The Rand index is averaged over 100 repetitions. In the second figure, each curve corresponds to a distinct (q, k, p). In all cases, the block size parameter for the ILP was set to k.</p><p>most color classes have a small maximum degree; therefore, few queries are needed to determine the local functions of vertices in these classes. Interestingly, there is no clear correlation between edge density and #queries. For example, the NRV (New River Valley Friendship) network is much smaller in size, average degree and maximum degree when compared to CoAstro (Coauthorship Astrophysics) network. Yet, when compared with the upper bound, the #queries required is much higher for the NRV network.</p><p>Effect of the number of blocks b. As the number of blocks increases, the #queries required also increases. This is because most queries are required to discover the beginning of a block. Also, when b &gt; &#8710; (the maximum degree), we observe a plateau due to saturation, as all the partitions contain only blocks of size one. For RR10, RR50 and NRV, there is a sharp increase in #queries between b = 2 and 10 as in every color class, b log 2 &#8710; additional queries are required. However, for other networks, because of the skewed degree distribution which leads to many nodes with small degrees, saturation occurs at much lower values of b.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">PAC Model</head><p>In the PAC model we restricted our attention to a single local function. The objective is to evaluate the ILP-based algorithm with regard to sample distribution, number of inputs and true block size. The true PM-DNF and the inferred PM-DNF were compared using Rand index <ref type="bibr">(Rand 1971)</ref>. Rand index for two partitions X and Y of a set is a+b a+b+c+d where a, b, c, and d are respectively the number of pairs of elements (x, y) from the set such that x and y are in (i) the same subset in X and in Y , (ii) different subsets in X and in Y , (iii) same subset in X but different subsets in Y , and (iv) different subsets in X but same subset in Y . The examples were sampled from a uniform distribution over configurations. Each element is set to state 1 independently with the same probability p. The values of p considered were 0.25, 0.5 and 0.75. Also, we considered input sizes q = 10 and 20 and block sizes k = 2, 5 and 10. Each experiment was repeated 10 times. We assumed that the inference algorithm has knowledge of the number of blocks in the true partition. Given the number of inputs and number of blocks, we used a similar method as in the active query case to con-struct the PM-DNF function.</p><p>The results in Figure <ref type="figure">3</ref> show a rapid increase in the quality of inference with relatively small increments in the number of queries for the case of uniform distribution (Figures <ref type="figure">3(a</ref>) and (b)). As expected, the greater the number of inputs, the greater is the number of queries required. Interestingly, unlike the active query case, fewer samples are required to infer the local function as the number of blocks increases. This can be explained as follows. Under the uniform distribution, the probability that all elements of a block of size are 1 is p ; this gives a greater chance of a block being discovered. This is also the reason why as p increases, the chance of discovering a block is higher. However, when p is too high (as in p = 0.75), there is a higher chance that a block is hidden in a bigger set of nodes in every example, thus leading to a lower Rand index. Also, we note that as the number of examples is increased, there is an increase in the variance of the Rand index before it is 1 for all repetitions. This variance is higher when the number of blocks is much less than the number of inputs. The running time (Figure <ref type="figure">3</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Future Work</head><p>It is of interest to extend our results to other types of coalition-based functions; for example, we may require that for the function to have the value 1, at least k &#8805; 2 coalitions must be unanimous. Our complexity result for learning PM-DNF functions is for the case of proper learning where the hypothesis class and the concept class are the same. It is of interest to consider the learning problem under the representation-independent setting. Developing other learning algorithms that can scale to large networks is another direction for future work.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>A membership query specifies an input to a Boolean function and the response is the value of the function.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>The square G 2 (V, E 2 ) of a graph G(V, E) has the edge {u, v} whenever there is a path of length &#8804; 2 between u and v in G.</p></note>
		</body>
		</text>
</TEI>
