The sum-product network (SPN) has been extended to model sequence data with the recurrent SPN (RSPN), and to decision-making problems with the sum-product-max network (SPMN). In this paper, we build on the concepts introduced by these extensions and present state-based recurrent SPMNs (S-RSPMNs), a generalization of SPMNs to sequential decision-making problems in which the state may not be perfectly observed. As with recurrent SPNs, S-RSPMNs utilize a repeatable template network to model sequences of arbitrary length. We present an algorithm for learning compact template structures that identifies unique belief states and the transitions between them through a state-matching process over augmented data. To our knowledge, this is the first data-driven approach that learns efficiently solvable graphical models for planning under partial observability. S-RSPMNs retain the linear solution complexity of SPMNs, and we demonstrate significant improvements in compactness of representation and in the run time of structure learning and inference on sequential domains.
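The template-unrolling idea above is easy to picture in code. The following is a minimal sketch under an interface we assume for illustration (the function names and the functional form of the template are ours, not the authors' implementation): one template copy is applied per time step, threading the belief-state interface values forward, and a top network scores the final interface.

```python
def unroll_and_evaluate(template_step, initial_state, top_value, sequence):
    """Evaluate a recurrent template on a sequence of any length.

    template_step: maps (state, observation, decision) -> next interface state
    top_value:     maps the final interface state -> a value (e.g., MEU)
    """
    state = initial_state
    for observation, decision in sequence:   # one template copy per time step
        state = template_step(state, observation, decision)
    return top_value(state)

# Toy check with stand-in functions (a real template would be a learned SPMN):
print(unroll_and_evaluate(lambda s, o, d: s + o, 0.0, lambda s: s,
                          [(1.0, None), (2.0, None)]))   # 3.0
```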
Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks
Sum-product networks (SPNs) have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. These properties follow from structural conditions (completeness and decomposability) that the network must respect. As a result, it is not easy to specify a valid sum-product network by hand, so structure learning techniques are typically used in practice. This paper describes a new online structure learning technique for feed-forward and recurrent SPNs. The algorithm is demonstrated on real-world datasets with continuous features, for which it is not clear what network architecture might be best, including sequence datasets of varying length.
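The validity conditions mentioned above are exactly what make inference a single bottom-up pass. The sketch below is our illustration of standard SPN evaluation semantics on a toy valid network, not the paper's online learning algorithm: sum nodes mix children that share a scope (completeness) and product nodes combine children with disjoint scopes (decomposability).

```python
import numpy as np
from scipy.stats import norm

def eval_spn(node, evidence):
    """One bottom-up pass computes the density of the evidence."""
    if node["type"] == "leaf":                      # univariate density
        return node["dist"](evidence[node["var"]])
    vals = [eval_spn(c, evidence) for c in node["children"]]
    if node["type"] == "sum":                       # complete: children share scope
        return float(np.dot(node["weights"], vals))
    return float(np.prod(vals))                     # decomposable: disjoint scopes

# A mixture over x1 (complete sum) times a density over x2 (decomposable product).
spn = {"type": "product", "children": [
    {"type": "sum", "weights": [0.3, 0.7], "children": [
        {"type": "leaf", "var": "x1", "dist": norm(0, 1).pdf},
        {"type": "leaf", "var": "x1", "dist": norm(3, 1).pdf}]},
    {"type": "leaf", "var": "x2", "dist": norm(1, 2).pdf}]}

print(eval_spn(spn, {"x1": 0.5, "x2": 1.0}))
```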
- Award ID(s): 1815598
- PAR ID: 10098333
- Date Published:
- Journal Name: Advances in Neural Information Processing Systems (NeurIPS)
- Volume: 31
- Page Range / eLocation ID: 6944-6954
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In this paper we propose a special type of tree tensor network that has the geometry of a comb: a one-dimensional (1D) backbone with finite 1D teeth projecting out from it. This tensor network is designed to provide an effective description of higher-dimensional objects with special limited interactions or, alternatively, of one-dimensional systems composed of complicated zero-dimensional objects. We provide details on the best numerical procedures for the proposed network, including an algorithm for variational optimization of the wave function as a comb tensor network and the transformation of the comb into a matrix product state. We compare the complexity of using a comb versus alternative matrix product state representations using density matrix renormalization group algorithms. As an application, we study a spin-1 Heisenberg model on a comb geometry. In the case where the ends of the teeth are terminated by spin-1/2 spins, we find that Haldane edge states of the teeth along the backbone form a critical spin-1/2 chain, whose properties can be tuned by the coupling constant along the backbone. By adding next-nearest-neighbor interactions along the backbone, the comb can be brought into a gapped phase with long-range dimerization along the backbone. The critical and dimerized phases are separated by a Kosterlitz-Thouless phase transition, whose presence we confirm numerically. Finally, we show that when the teeth contain an odd number of spins and are not terminated by spin-1/2's, a special type of comb edge state emerges.
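For concreteness, the comb geometry reduces to a simple list of nearest-neighbor bonds. The sketch below is our own construction (the site indexing and function name are assumptions, not the paper's code): Lx backbone sites, each carrying a tooth of Ly sites.

```python
def comb_bonds(Lx, Ly):
    """Nearest-neighbor bonds of a comb lattice as (site_a, site_b) pairs."""
    def site(x, y):                    # y = 0 lies on the backbone
        return x * (Ly + 1) + y
    bonds = []
    for x in range(Lx):
        if x + 1 < Lx:                 # coupling along the backbone
            bonds.append((site(x, 0), site(x + 1, 0)))
        for y in range(Ly):            # couplings along each tooth
            bonds.append((site(x, y), site(x, y + 1)))
    return bonds

print(comb_bonds(3, 2))   # 2 backbone bonds + 6 tooth bonds
```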
-
We present a non-Euclidean vector product for artificial neural networks. The vector product operator does not require any multiplications while still providing correlation information between two vectors. Ordinary neurons require the inner product of two vectors. We propose a class of neural networks, based on the proposed non-Euclidean vector product, with the universal approximation property over the space of Lebesgue integrable functions. In this new network, the "product" of two real numbers is defined as the sum of their absolute values, with the sign determined by the sign of the product of the numbers. This "product" is used to construct a vector product in R^N. The vector product induces the l1 norm. The additive neural network successfully solves the XOR problem. Experiments on the MNIST and CIFAR datasets show that the classification performance of the proposed additive neural network is comparable to that of the corresponding multi-layer perceptron and convolutional neural networks.
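The operator described above is straightforward to implement without any multiplications. The sketch below (function names are ours) computes the scalar "product" with comparisons and additions only, and checks that the induced vector product recovers the l1 norm.

```python
import numpy as np

def mf_product(a, b):
    """Multiplication-free 'product': magnitude |a| + |b|, sign of a*b."""
    same_sign = (a >= 0) == (b >= 0)     # sign via comparison, not multiplication
    magnitude = np.abs(a) + np.abs(b)
    return np.where(same_sign, magnitude, -magnitude)

def mf_dot(x, w):
    """Vector 'product' in R^N under the operator above."""
    return mf_product(x, w).sum()

x = np.array([1.0, -2.0, 3.0])
print(mf_dot(x, x))   # 2 * ||x||_1 = 12.0, i.e. the operator induces the l1 norm
```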
-
Sum-product networks (SPNs) are knowledge compilation models and are related to other graphical models for efficient probabilistic inference such as arithmetic circuits and AND/OR graphs. Recent investigations into generalizing SPNs have yielded sum-product-max networks (SPMNs), which offer a data-driven alternative for decision making that has predominantly relied on handcrafted models. However, SPMNs are not suited for decision-theoretic planning, which involves sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMNs) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPNs in being data driven and mostly tractable, they are also well suited for planning problems. We establish soundness conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template. RSPMNs learned on a testbed of data sets, some generated using RDDLSim, yield maximum expected utility (MEU) values and policies that are close to optimal on perfectly observed domains and easily improve on a recent batch-constrained RL method, which is important because RSPMNs offer a new model-based approach to offline RL.
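The max-node semantics that distinguishes SPMNs from SPNs can be sketched in a few lines. The following is our illustration of a bottom-up evaluation pass, not the authors' implementation: sum and product nodes behave as in an SPN, while a max node selects the decision branch with the highest value.

```python
import math

def eval_spmn(node, evidence):
    kind = node["type"]
    if kind == "utility":                    # utility leaf
        return node["value"]
    if kind == "leaf":                       # probability leaf
        return node["dist"](evidence[node["var"]])
    vals = [eval_spmn(c, evidence) for c in node["children"]]
    if kind == "sum":
        return sum(w * v for w, v in zip(node["weights"], vals))
    if kind == "product":
        return math.prod(vals)
    return max(vals)                         # max node: best decision branch
```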
-
We consider the problem of matrix approximation and denoising induced by the Kronecker product decomposition. Specifically, we propose to approximate a given matrix by the sum of a few Kronecker products of matrices, which we refer to as the Kronecker product approximation (KoPA). Because the Kronecker product is an extension of the outer product from vectors to matrices, KoPA extends the low-rank matrix approximation and includes it as a special case. Compared with the latter, KoPA also offers greater flexibility, since it allows the user to choose the configuration, i.e., the dimensions of the two smaller matrices forming the Kronecker product. On the other hand, the configuration to be used is usually unknown and needs to be determined from the data in order to achieve the optimal balance between accuracy and parsimony. We propose to use extended information criteria to select the configuration. Under the paradigm of high-dimensional analysis, we show that the proposed procedure is able to select the true configuration with probability tending to one, under suitable conditions on the signal-to-noise ratio. We demonstrate the superiority of KoPA over low-rank approximations through numerical studies and several benchmark image examples.
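The core approximation step admits a compact sketch via the classical Van Loan-Pitsianis rearrangement, under which the best sum of K Kronecker products becomes a rank-K SVD truncation. This is our illustration for a given configuration (p, q, r, s) with p*r = m and q*s = n; selecting the configuration by extended information criteria, the paper's contribution, is not shown.

```python
import numpy as np

def kopa(M, p, q, r, s, K=1):
    """Approximate M (p*r x q*s) by a sum of K Kronecker products A_k ⊗ B_k."""
    m, n = M.shape
    assert m == p * r and n == q * s
    # Rearrange so row (i, j) holds the (r x s) block of M; the best sum of
    # K Kronecker products is then the rank-K truncated SVD of R.
    R = M.reshape(p, r, q, s).transpose(0, 2, 1, 3).reshape(p * q, r * s)
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    approx = np.zeros_like(M)
    for k in range(K):
        A = (np.sqrt(sv[k]) * U[:, k]).reshape(p, q)
        B = (np.sqrt(sv[k]) * Vt[k]).reshape(r, s)
        approx += np.kron(A, B)
    return approx

# Sanity check: a single Kronecker product is recovered exactly with K = 1.
M = np.kron(np.random.randn(2, 3), np.random.randn(4, 5))
print(np.allclose(kopa(M, 2, 3, 4, 5, K=1), M))   # True
```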