To verify safety and robustness of neural networks, researchers have successfully applied abstract interpretation, primarily using the interval abstract domain. In this paper, we study the theoretical power and limits of the interval domain for neural-network verification. First, we introduce the interval universal approximation (IUA) theorem. IUA shows that neural networks not only can approximate any continuous function f (universal approximation), as we have known for decades, but also that we can find a neural network, using any well-behaved activation function, whose interval bounds are an arbitrarily close approximation of the set semantics of f (the result of applying f to a set of inputs). We call this notion of approximation interval approximation. Our theorem generalizes the recent result of Baader et al. from ReLUs to a rich class of activation functions that we call squashable functions. Additionally, the IUA theorem implies that we can always construct provably robust neural networks under the ℓ∞-norm using almost any practical activation function. Second, we study the computational complexity of constructing neural networks that are amenable to precise interval analysis. This is a crucial question, as our constructive proof of IUA is exponential in the size of the approximation domain. We boil this question down to the problem of approximating the range of a neural network with squashable activation functions. We show that the range approximation problem (RA) is Δ2-intermediate, i.e., strictly harder than NP-complete problems, assuming coNP ⊄ NP. As a result, IUA is an inherently hard problem: no matter what abstract domain or computational tools we consider to achieve interval approximation, there is no efficient construction of such a universal approximator. This implies that it is hard to construct a provably robust network, even if we already have a robust network to start with.
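To make the interval setting concrete, here is a minimal sketch of interval abstract interpretation (interval bound propagation) through a tiny fully connected network; the weights, biases, and sigmoid activation are hypothetical placeholders, not the construction used in the paper.

    # Interval bound propagation through a toy two-layer network (hypothetical
    # parameters): input intervals are pushed soundly through affine layers and
    # a monotone activation.
    import numpy as np

    def interval_affine(lo, hi, W, b):
        # Sound bounds for W @ x + b when each x[i] lies in [lo[i], hi[i]].
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

    def interval_sigmoid(lo, hi):
        # A monotone activation maps interval endpoints to interval endpoints.
        sig = lambda z: 1.0 / (1.0 + np.exp(-z))
        return sig(lo), sig(hi)

    # Hypothetical parameters; the input box is [0, 1]^2.
    W1, b1 = np.array([[1.0, -2.0], [0.5, 1.0]]), np.array([0.1, -0.3])
    W2, b2 = np.array([[1.0, 1.0]]), np.array([0.0])

    lo, hi = np.zeros(2), np.ones(2)
    lo, hi = interval_sigmoid(*interval_affine(lo, hi, W1, b1))
    lo, hi = interval_affine(lo, hi, W2, b2)
    print("output interval:", lo, hi)  # a sound over-approximation of the true range

The IUA theorem says that, for any continuous f and tolerance, there exists a network whose propagated interval bounds are that close to the true set semantics of f on every input box; the complexity result above says that no such network can be constructed efficiently.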
A unified and constructive framework for the universality of neural networks
Abstract: One of the reasons why many neural networks are capable of replicating complicated tasks or functions is their universal approximation property. Though the past few decades have seen tremendous advances in theories of neural networks, a single constructive and elementary framework for neural network universality remains unavailable. This paper is an effort to provide a unified and constructive framework for the universality of a large class of activation functions, including most of the existing ones. At the heart of the framework is the concept of neural network approximate identity (nAI). The main result is as follows: any nAI activation function is universal in the space of continuous functions on compacta. It turns out that most of the existing activation functions are nAI, and thus universal. The framework offers several advantages over contemporary counterparts. First, it is constructive, using elementary means from functional analysis, probability theory, and numerical analysis. Second, it is one of the first unified and constructive attempts that is valid for most of the existing activation functions. Third, it provides new proofs for most activation functions. Fourth, for a given activation and error tolerance, the framework provides precisely the architecture of the corresponding one-hidden-layer neural network with a predetermined number of neurons and the values of the weights and biases. Fifth, the framework allows us to abstractly present the first universal approximation with a favorable non-asymptotic rate. Sixth, the framework also provides insights into the developments of some of the existing approaches, and hence constructive derivations of them.
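As a rough illustration of what a constructive universality statement provides (not the nAI construction of the paper), the sketch below assembles a one-hidden-layer sigmoid network that approximates a given continuous function on [0, 1] from an explicit grid, with all weights and biases written down in advance; the grid size and steepness are illustrative choices.

    # One-hidden-layer sigmoid network built explicitly from a grid: each pair
    # of steep sigmoids forms an approximate indicator ("bump") of one grid
    # cell, and the target values at the cell centers are the output weights.
    import numpy as np

    def build_one_hidden_layer(f, n_bins=200, steepness=5000.0):
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        centers = 0.5 * (edges[:-1] + edges[1:])
        targets = f(centers)                              # output-layer weights
        left, right = edges[:-1].copy(), edges[1:].copy()
        left[0], right[-1] = -1e9, 1e9                    # saturate boundary sigmoids on [0, 1]
        sig = lambda z: 0.5 * (1.0 + np.tanh(0.5 * z))    # numerically stable logistic
        def net(x):
            x = np.atleast_1d(x)[:, None]
            # indicator of cell i  ~  sigma(k(x - left_i)) - sigma(k(x - right_i))
            bumps = sig(steepness * (x - left)) - sig(steepness * (x - right))
            return bumps @ targets
        return net

    f = lambda x: np.sin(2.0 * np.pi * x) + 0.3 * x
    net = build_one_hidden_layer(f)
    xs = np.linspace(0.0, 1.0, 1000)
    print("max error on test points:", np.max(np.abs(net(xs) - f(xs))))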
- PAR ID: 10517351
- Publisher / Repository: Oxford University Press
- Journal Name: IMA Journal of Applied Mathematics
- Volume: 89
- Issue: 1
- ISSN: 0272-4960
- Size(s): p. 197-230
- Sponsoring Org: National Science Foundation
More Like this
A new network with super-approximation power is introduced. This network is built with the Floor (⌊x⌋) or ReLU (max{0, x}) activation function in each neuron; hence, we call such networks Floor-ReLU networks. For any hyperparameters N ∈ ℕ⁺ and L ∈ ℕ⁺, we show that Floor-ReLU networks with width max{d, 5N+13} and depth 64dL+3 can uniformly approximate a Hölder function f on [0,1]^d with an approximation error 3λd^(α/2)N^(−αL), where α ∈ (0,1] and λ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function f on [0,1]^d with a modulus of continuity ω_f(·), the constructive approximation rate is ω_f(√d·N^(−L)) + 2ω_f(√d)·N^(−L). As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of ω_f(r) as r → 0 is moderate (e.g., ω_f(r) ≲ r^α for Hölder continuous functions), since the major term to be considered in our approximation rate is essentially √d times a function of N and L independent of d within the modulus of continuity.
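As a worked instance of the stated bounds (the numbers below are assumptions for illustration, not values from the paper), the snippet evaluates the width max{d, 5N+13}, depth 64dL+3, and Hölder error 3λd^(α/2)N^(−αL) for one choice of d, N, L, α, and λ.

    # Evaluate the stated Floor-ReLU bounds for one illustrative choice of
    # hyperparameters (all numbers are assumptions).
    d, N, L = 10, 4, 3            # input dimension and network hyperparameters
    alpha, lam = 0.5, 1.0         # Hoelder order and constant (assumed)

    width = max(d, 5 * N + 13)                               # max{d, 5N+13}
    depth = 64 * d * L + 3                                   # 64dL+3
    error = 3 * lam * d ** (alpha / 2) * N ** (-alpha * L)   # 3*lam*d^(alpha/2)*N^(-alpha*L)
    print(width, depth, round(error, 3))                     # 33 1923 0.667

Increasing L (depth) drives the error down exponentially in N^(−αL), which is the "super-approximation" effect the abstract describes.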
Large-scale deep neural networks are both memory- and computation-intensive, thereby posing stringent requirements on the computing platforms. Hardware accelerations of deep neural networks have been extensively investigated. Specific forms of binary neural networks (BNNs) and stochastic computing-based neural networks (SCNNs) are particularly appealing to hardware implementations since they can be implemented almost entirely with binary operations. Despite the obvious advantages in hardware implementation, these approximate computing techniques have been questioned by researchers in terms of accuracy and universal applicability. It is also important to understand the relative pros and cons of SCNNs and BNNs in theory and in actual hardware implementations. To address these concerns, in this paper we prove that the "ideal" SCNNs and BNNs satisfy the universal approximation property with probability 1 (due to their stochastic behavior), which is a new angle on the original approximation property. The proof is conducted by first establishing the property for SCNNs via the strong law of large numbers, and then using SCNNs as a "bridge" to prove it for BNNs. Besides the universal approximation property, we also derive an appropriate bound on the bit length M in order to provide insights for actual neural network implementations. Based on the universal approximation property, we further prove that SCNNs and BNNs exhibit the same energy complexity; in other words, they have the same asymptotic energy consumption as the network size grows. We also provide a detailed analysis of the pros and cons of SCNNs and BNNs for hardware implementations and conclude that SCNNs are more suitable.
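A minimal sketch of the stochastic-computing principle that the SCNN argument builds on: values in [0, 1] are encoded as Bernoulli bitstreams of length M, a bitwise AND of independent streams encodes their product, and by the strong law of large numbers the decoded value converges as M grows. The values and bit lengths below are illustrative; the paper's bound on M is a separate analysis.

    # Stochastic-computing multiplication: encode values as Bernoulli bitstreams,
    # AND the streams, and decode by taking the empirical mean.
    import numpy as np

    rng = np.random.default_rng(0)

    def encode(p, M):
        # Bitstream whose empirical mean encodes the value p in [0, 1].
        return rng.random(M) < p

    def decode(bits):
        return bits.mean()

    a, b = 0.6, 0.3
    for M in (64, 1024, 65536):
        prod = decode(encode(a, M) & encode(b, M))   # AND ~ multiplication
        print(M, prod, abs(prod - a * b))            # error shrinks as M grows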
We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of the k-Winners-Take-All (k-WTA) activation, a C⁰-discontinuous function that purposely invalidates the neural network model's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized. We analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search of adversarial examples and why they at the same time remain innocuous to network training. This understanding is also empirically backed. We test k-WTA activation on various network structures optimized by a training method, be it adversarial training or not. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
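For reference, a minimal NumPy sketch of the k-Winners-Take-All activation described above: keep the k largest entries of each activation vector and zero out the rest. The batch values and k below are illustrative; the paper applies the same operation inside standard architectures and training loops.

    # k-Winners-Take-All: keep the k largest entries of each row, zero the rest.
    import numpy as np

    def k_wta(x, k):
        # x: (batch, features) activations; returns the k-WTA output.
        winners = np.argpartition(x, -k, axis=1)[:, -k:]   # indices of the k largest per row
        out = np.zeros_like(x)
        np.put_along_axis(out, winners, np.take_along_axis(x, winners, axis=1), axis=1)
        return out

    x = np.array([[0.1, -0.5, 0.9, 0.3],
                  [0.2,  0.7, -0.1, 0.4]])
    print(k_wta(x, k=2))
    # row 1 keeps 0.9 and 0.3, row 2 keeps 0.7 and 0.4; all other entries become 0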
