Title: Augmenting Neural Networks with First-order Logic
Today, the dominant paradigm for training neural networks involves minimizing task loss on a large dataset. Using world knowledge to inform a model while retaining the ability to perform end-to-end training remains an open question. In this paper, we present a novel framework for introducing declarative knowledge to neural network architectures in order to guide training and prediction. Our framework systematically compiles logical statements into computation graphs that augment a neural network without extra learnable parameters or manual redesign. We evaluate our modeling strategy on three tasks: machine comprehension, natural language inference, and text chunking. Our experiments show that knowledge-augmented networks can strongly improve over baselines, especially in low-data regimes.
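To make the compilation idea concrete, here is a minimal sketch (not the paper's actual compilation procedure) of one common way to turn a first-order rule into a differentiable term that augments a classifier's loss, using a product t-norm relaxation of implication. The names soft_implication_penalty and rule_weight, and the choice of constraining two output classes, are illustrative assumptions.

```python
# A minimal sketch (not the paper's compilation procedure) of turning a
# first-order rule "antecedent -> consequent" over a classifier's output
# classes into a differentiable penalty via a product t-norm relaxation.
# Names such as soft_implication_penalty and rule_weight are illustrative.

import torch
import torch.nn.functional as F

def soft_implication_penalty(p_antecedent: torch.Tensor,
                             p_consequent: torch.Tensor) -> torch.Tensor:
    """Soft violation of 'antecedent -> consequent' (product t-norm: 1 - a(1 - b))."""
    truth = 1.0 - p_antecedent * (1.0 - p_consequent)  # soft truth value in (0, 1]
    return -torch.log(truth.clamp_min(1e-6)).mean()    # ~0 when the rule is satisfied

def augmented_loss(logits: torch.Tensor, labels: torch.Tensor,
                   antecedent_class: int, consequent_class: int,
                   rule_weight: float = 0.1) -> torch.Tensor:
    """Task loss plus a knowledge term; adds no learnable parameters."""
    probs = F.softmax(logits, dim=-1)
    task_loss = F.cross_entropy(logits, labels)
    constraint = soft_implication_penalty(probs[:, antecedent_class],
                                          probs[:, consequent_class])
    return task_loss + rule_weight * constraint

# Hypothetical usage inside a training loop: loss = augmented_loss(model(x), y, 2, 5)
```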
Award ID(s):
1801446
PAR ID:
10175282
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tensor decomposition is an effective approach to compress over-parameterized neural networks and to enable their deployment on resource-constrained hardware platforms. However, directly applying tensor compression in the training process is challenging due to the difficulty of choosing a proper tensor rank. To address this challenge, this paper proposes a low-rank Bayesian tensorized neural network. Our Bayesian method performs automatic model compression via adaptive tensor rank determination. We also present approaches for posterior density calculation and maximum a posteriori (MAP) estimation for the end-to-end training of our tensorized neural network. We provide experimental validation on a two-layer fully connected neural network, a 6-layer CNN, and a 110-layer residual neural network, where our method produces 7.4X to 137X more compact neural networks directly from training while achieving high prediction accuracy.
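As a rough illustration of the compression mechanism described above, the sketch below replaces a dense linear layer with a fixed low-rank factorization in PyTorch. The paper's Bayesian tensor decomposition and automatic rank determination are not reproduced here; the class name LowRankLinear and the chosen rank are assumptions.

```python
# A minimal sketch of the core compression idea (a low-rank factored linear
# layer); the Bayesian rank determination and MAP training are not shown.

import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Replaces a dense (out x in) weight with two factors of a chosen rank."""
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.u = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.v = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Parameter count: rank * (in + out) instead of in * out.
        return x @ self.v.t() @ self.u.t() + self.bias

# e.g. compress a 1024 x 1024 layer (~1.05M params) to rank 32 (~65K params)
layer = LowRankLinear(1024, 1024, rank=32)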
  2. Bernhardt, Boris C (Ed.)
    We propose a novel neural network architecture, SZTrack, to detect and track the spatio-temporal propagation of seizure activity in multichannel EEG. SZTrack combines a convolutional neural network encoder operating on individual EEG channels with recurrent neural networks to capture the evolution of seizure activity. Our unique training strategy aggregates individual electrode-level predictions for patient-level seizure detection and localization. We evaluate SZTrack on a clinical EEG dataset of 201 seizure recordings from 34 epilepsy patients acquired at the Johns Hopkins Hospital. Our network achieves similar seizure detection performance to state-of-the-art methods and provides valuable localization information that has not previously been demonstrated in the literature. We also show the cross-site generalization capabilities of SZTrack on a dataset of 53 seizure recordings from 14 epilepsy patients acquired at the University of Wisconsin Madison. SZTrack is able to determine the lobe and hemisphere of origin in nearly all of these new patients without retraining the network. To our knowledge, SZTrack is the first end-to-end seizure tracking network using scalp EEG.
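The following sketch illustrates the general architecture pattern described above: a small CNN encoder applied to each EEG channel, a recurrent layer over time, and aggregation of electrode-level predictions into a recording-level score. It is an illustration under stated assumptions, not the released SZTrack code; layer sizes, the GRU choice, and the max/mean aggregation are placeholders.

```python
# Per-channel CNN encoder + RNN over time, with electrode-level predictions
# aggregated to a recording-level score. Illustrative only; not SZTrack code.

import torch
import torch.nn as nn

class ChannelSeizureTracker(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        # 1-D CNN encoder applied to each EEG channel independently.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # GRU over time windows captures how activity evolves per channel.
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, channels, windows, samples_per_window)
        b, c, w, s = x.shape
        feats = self.encoder(x.reshape(b * c * w, 1, s)).reshape(b * c, w, 32)
        out, _ = self.rnn(feats)
        channel_logits = self.head(out).reshape(b, c, w)  # per-electrode, per-window
        # Simple placeholder aggregation: strongest electrode, averaged over time.
        recording_logit = channel_logits.max(dim=1).values.mean(dim=1)
        return channel_logits, recording_logit
```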
  3. In this paper, we present a blockwise optimization method for masking-based networks (BLOOM-Net) for training scalable speech enhancement networks. Here, we design our network with a residual learning scheme and train the internal separator blocks sequentially to obtain a scalable masking-based deep neural network for speech enhancement. Its scalability lets it dynamically adjust the run-time complexity depending on the test-time environment. To this end, we modularize our models so that they can flexibly accommodate varying demands on enhancement performance and resource constraints, incurring minimal memory or training overhead from the added scalability. Our experiments on speech enhancement demonstrate that the proposed blockwise optimization method achieves the desired scalability with only a slight performance degradation compared to corresponding models trained end-to-end.
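A compact sketch of the blockwise training idea follows: train one residual separator block at a time with earlier ones frozen, and choose how many blocks to run at inference. It is not the authors' BLOOM-Net implementation; module sizes, names, and the simple magnitude-masking setup are assumptions.

```python
# Blockwise (stage-wise) training of a scalable masking network. Illustrative
# sketch only; sizes, names, and the magnitude-masking setup are assumptions.

import torch
import torch.nn as nn

class SeparatorBlock(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h):
        return h + self.net(h)  # residual refinement of the mask features

class ScalableMasker(nn.Module):
    def __init__(self, n_bins: int = 257, dim: int = 256, n_blocks: int = 3):
        super().__init__()
        self.embed = nn.Linear(n_bins, dim)
        self.blocks = nn.ModuleList([SeparatorBlock(dim) for _ in range(n_blocks)])
        self.to_mask = nn.Linear(dim, n_bins)

    def forward(self, noisy_mag, n_active_blocks: int):
        h = self.embed(noisy_mag)
        for block in self.blocks[:n_active_blocks]:  # run-time scalability
            h = block(h)
        return torch.sigmoid(self.to_mask(h)) * noisy_mag  # masked magnitudes

def train_blockwise(model: ScalableMasker):
    for k in range(len(model.blocks)):
        # Freeze everything, then unfreeze only the parts trained in this stage.
        for p in model.parameters():
            p.requires_grad = False
        for p in model.blocks[k].parameters():
            p.requires_grad = True
        if k == 0:  # the first stage also trains the shared input/output layers
            for p in list(model.embed.parameters()) + list(model.to_mask.parameters()):
                p.requires_grad = True
        # ... run the usual enhancement-loss training loop here, calling
        #     model(noisy_mag, n_active_blocks=k + 1) ...
```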
  4. We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to some constant rather than zero. We prove that under such initialization, the neural network will have sparse activation throughout the entire training process, which enables fast training procedures via some sophisticated computational methods. With such initialization, we show that the neural networks possess a different limiting kernel which we call bias-generalized NTK, and we study various properties of the neural networks with this new kernel. We first characterize the gradient descent dynamics. In particular, we show that the network in this case can achieve as fast convergence as the dense network, as opposed to the previous work suggesting that the sparse networks converge slower. In addition, our result improves the previous required width to ensure convergence. Secondly, we study the networks' generalization: we show a width-sparsity dependence, which yields a sparsity-dependent Rademacher complexity and generalization bound. To our knowledge, this is the first sparsity-dependent generalization result via Rademacher complexity. Lastly, we study the smallest eigenvalue of this new kernel. We identify a data-dependent region where we can derive a much sharper lower bound on the NTK's smallest eigenvalue than the worst-case bound previously known. This can lead to improvement in the generalization bound. 
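As a quick, self-contained illustration of the setup (not a result from the paper), the snippet below initializes a one-hidden-layer ReLU network's biases to a negative constant and measures how many units are inactive at initialization; the width, input dimension, and the constant -0.5 are arbitrary choices.

```python
# Constant (non-zero) bias initialization and the resulting activation sparsity
# at initialization. Arbitrary sizes; illustration only.

import torch
import torch.nn as nn

d, width = 128, 4096
x = torch.randn(256, d)                       # inputs with unit-variance coordinates

layer = nn.Linear(d, width)
nn.init.normal_(layer.weight, std=d ** -0.5)  # pre-activations have roughly unit variance
nn.init.constant_(layer.bias, -0.5)           # constant bias instead of zero

activations = torch.relu(layer(x))
sparsity = (activations == 0).float().mean().item()
print(f"fraction of inactive ReLU units at initialization: {sparsity:.2f}")
```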
  5. Robust Mask R-CNN (Mask Regional Convolutional Neural Network) methods are proposed and tested for automatic detection of cracks on structures or their components that may be damaged during extreme events, such as earthquakes. We curated a new dataset with 2,021 labeled images for training and validation and aimed to find end-to-end deep neural networks for crack detection in the field. With data augmentation and parameter fine-tuning, Path Aggregation Network (PANet) with spatial attention mechanisms and High-Resolution Network (HRNet) are introduced into Mask R-CNNs. Tests on three public datasets with low- or high-resolution images demonstrate that the proposed methods achieve a substantial improvement over alternative networks, suggesting that the proposed method may be sufficient for crack detection across a variety of scales in real applications.
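For readers who want a starting point, the sketch below fine-tunes torchvision's stock Mask R-CNN (ResNet-50 FPN backbone) for a two-class crack segmentation task, following the standard TorchVision head-replacement recipe. It stands in for, and does not reproduce, the paper's PANet/HRNet variants with spatial attention, and the dataset wiring is omitted.

```python
# Hedged sketch: fine-tune torchvision's off-the-shelf Mask R-CNN for crack
# instance segmentation. The paper's PANet / HRNet backbones and spatial
# attention are NOT included; this only shows the generic head replacement.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + crack

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box classification head for the new number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask prediction head likewise.
in_channels_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels_mask, 256, num_classes)

# Training then follows the usual torchvision detection loop:
# losses = model(images, targets)  # dict of classification / box / mask losses
```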