Title: Overparameterized neural networks implement associative memory
Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks trained using standard optimization methods implement such a mechanism for real-valued data. We provide empirical evidence that 1) overparameterized autoencoders store training samples as attractors and thus iterating the learned map leads to sample recovery, and that 2) the same mechanism allows for encoding sequences of examples and serves as an even more efficient mechanism for memory than autoencoding. Theoretically, we prove that when trained on a single example, autoencoders store the example as an attractor. Lastly, by treating a sequence encoder as a composition of maps, we prove that sequence encoding provides a more efficient mechanism for memory than autoencoding.
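To make the retrieval mechanism concrete, here is a minimal sketch of the attractor behavior described in the abstract; it is not the authors' code, and the network width, optimizer, training length, and noise level are illustrative assumptions. An overparameterized autoencoder is trained to (near) interpolation on a few samples, and the learned map is then iterated from a corrupted input; if the samples are stored as attractors, iteration converges back to a training sample.

```python
# Minimal sketch (architecture, optimizer, and counts are assumptions):
# train an overparameterized autoencoder on a few samples, then iterate
# the learned map from a corrupted input.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_samples = 32, 4                      # input dimension, number of samples
X = torch.rand(n_samples, d)              # hypothetical real-valued data

f = nn.Sequential(                        # heavily overparameterized for 4 samples
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, d),
)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for _ in range(5000):                     # train to near-interpolation: f(x) ≈ x
    opt.zero_grad()
    loss = ((f(X) - X) ** 2).mean()
    loss.backward()
    opt.step()

z = X[0] + 0.1 * torch.randn(d)           # corrupted version of a stored sample
with torch.no_grad():
    for _ in range(100):                  # retrieval: iterate the learned map
        z = f(z)
print("distance to stored sample:", (z - X[0]).norm().item())
```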
Award ID(s):
1651995 2050360
NSF-PAR ID:
10232184
Author(s) / Creator(s):
Radhakrishnan, Adityanarayanan; Belkin, Mikhail; Uhler, Caroline
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
117
Issue:
44
ISSN:
0027-8424
Page Range / eLocation ID:
27162 to 27170
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Processing large amounts of data, especially in learning algorithms, poses a challenge for current embedded computing systems. Hyperdimensional computing (HDC) is a brain-inspired computing paradigm that works with high-dimensional vectors called hypervectors. HDC replaces several complex learning computations with bitwise and simpler arithmetic operations, at the expense of an increased amount of data due to mapping inputs into a high-dimensional space. These hypervectors often cannot fit in memory, resulting in long data transfers from storage. In this article, we propose Store-n-Learn, an in-storage computing solution that performs HDC classification and clustering by implementing encoding, training, retraining, and inference across the flash hierarchy. To hide the latency of training and enable efficient computation, we introduce the concept of batching in HDC. We also present on-chip acceleration for HDC encoding in flash planes. This enables us to exploit the high parallelism provided by the flash hierarchy and encode multiple data points in parallel, in both batched and non-batched fashion. Store-n-Learn also implements a single top-level FPGA accelerator with novel implementations of HDC classification training, retraining, inference, and clustering on the encoded data. Our evaluation over 10 popular datasets shows that Store-n-Learn is on average 222× (543×) faster than CPU and 10.6× (7.3×) faster than INSIDER, the state-of-the-art in-storage computing solution, for HDC classification (clustering).
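As a rough illustration of the HDC mechanism this abstract builds on (encoding into hypervectors, bundling class models, similarity-based inference), here is a minimal NumPy sketch. It is not the Store-n-Learn pipeline; the dimensionality, bipolar ID/level encoding, and cosine-similarity inference are common HDC choices assumed here for illustration.

```python
# Minimal HDC classification sketch (illustrative choices, not Store-n-Learn).
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                  # hypervector dimensionality
n_features, n_classes = 16, 3

# Random bipolar hypervectors: one ID per feature position, 4 quantization levels.
ids    = rng.choice([-1, 1], size=(n_features, D))
levels = rng.choice([-1, 1], size=(4, D))

def encode(x):
    """Bind each feature ID with its level hypervector, then bundle by summing."""
    q = np.clip((x * 4).astype(int), 0, 3)  # quantize features in [0, 1) to 4 levels
    return (ids * levels[q]).sum(axis=0)

def train(X, y):
    """Class hypervector = bundle of the encodings of its training samples."""
    C = np.zeros((n_classes, D))
    for xi, yi in zip(X, y):
        C[yi] += encode(xi)
    return C

def infer(C, x):
    """Predict the class whose hypervector is most cosine-similar to the query."""
    h = encode(x)
    sims = C @ h / (np.linalg.norm(C, axis=1) * np.linalg.norm(h) + 1e-12)
    return int(sims.argmax())

# Toy data: noisy copies of per-class prototypes.
protos = rng.random((n_classes, n_features))
y = rng.integers(0, n_classes, 60)
X = np.clip(protos[y] + 0.05 * rng.normal(size=(60, n_features)), 0, 0.999)
C = train(X, y)
print("prediction:", infer(C, X[0]), "true label:", y[0])
```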
  2. Abstract Traditionally, neuroscience and psychology have studied the human brain during periods of “online” attention to the environment, while participants actively engage in processing sensory stimuli. However, emerging evidence shows that the waking brain also intermittently enters an “offline” state, during which sensory processing is inhibited and our attention shifts inward. In fact, humans may spend up to half of their waking hours offline [Wamsley, E. J., & Summer, T. Spontaneous entry into an “offline” state during wakefulness: A mechanism of memory consolidation? Journal of Cognitive Neuroscience, 32, 1714–1734, 2020; Killingsworth, M. A., & Gilbert, D. T. A wandering mind is an unhappy mind. Science, 330, 932, 2010]. The function of alternating between online and offline forms of wakefulness remains unknown. We hypothesized that rapidly switching between online and offline states enables the brain to alternate between the competing demands of encoding new information and consolidating already-encoded information. A total of 46 participants (34 female) trained on a memory task just before a 30-min retention interval, during which they completed a simple attention task while undergoing simultaneous high-density EEG and pupillometry recording. We used a data-driven method to parse this retention interval into a sequence of discrete online and offline states, with a 5-sec temporal resolution. We found evidence for three distinct states, one of which was an offline state with features well-suited to support memory consolidation, including increased EEG slow oscillation power, reduced attention to the external environment, and increased pupil diameter (a proxy for increased norepinephrine). Participants who spent more time in this offline state following encoding showed improved memory at delayed test. These observations are consistent with the hypothesis that even brief, seconds-long entry into an offline state may support the early stages of memory consolidation. 
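The data-driven parsing of the retention interval into discrete states could, in spirit, be sketched as clustering per-epoch features. The stand-in below uses k-means with k = 3 on hypothetical features (EEG slow-oscillation power, an attention measure, pupil diameter); both the features and the clustering method are assumptions for illustration and may differ from the study's actual method.

```python
# Generic stand-in for data-driven state parsing: cluster 5-sec epochs into
# 3 states from simple per-epoch features (features and method are assumed).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_epochs = 360                      # 30-min retention interval / 5-sec epochs
# Hypothetical z-scored features per epoch: slow-oscillation power,
# attention-task performance, pupil diameter.
features = rng.normal(size=(n_epochs, 3))

states = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
for s in range(3):                  # fraction of the interval spent in each state
    print(f"state {s}: {(states == s).mean() * 100:.1f}% of epochs")
```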
  3. BACKGROUND Optical sensing devices measure the rich physical properties of an incident light beam, such as its power, polarization state, spectrum, and intensity distribution. Most conventional sensors, such as power meters, polarimeters, spectrometers, and cameras, are monofunctional and bulky. For example, classical Fourier-transform infrared spectrometers and polarimeters, which characterize the optical spectrum in the infrared and the polarization state of light, respectively, can occupy a considerable portion of an optical table. Over the past decade, the development of integrated sensing solutions by using miniaturized devices together with advanced machine-learning algorithms has accelerated rapidly, and optical sensing research has evolved into a highly interdisciplinary field that encompasses devices and materials engineering, condensed matter physics, and machine learning. To this end, future optical sensing technologies will benefit from innovations in device architecture, discoveries of new quantum materials, demonstrations of previously uncharacterized optical and optoelectronic phenomena, and rapid advances in the development of tailored machine-learning algorithms.

ADVANCES Recently, a number of sensing and imaging demonstrations have emerged that differ substantially from conventional sensing schemes in the way that optical information is detected. A typical example is computational spectroscopy. In this new paradigm, a compact spectrometer first collectively captures the comprehensive spectral information of an incident light beam using multiple elements or a single element under different operational states and generates a high-dimensional photoresponse vector. An advanced algorithm then interprets the vector to achieve reconstruction of the spectrum. This scheme shifts the physical complexity of conventional grating- or interference-based spectrometers to computation. Moreover, many of the recent developments go well beyond optical spectroscopy, and we discuss them within a common framework, dubbed "geometric deep optical sensing." The term "geometric" is intended to emphasize that in this sensing scheme, the physical properties of an unknown light beam and the corresponding photoresponses can be regarded as points in two respective high-dimensional vector spaces and that the sensing process can be considered to be a mapping from one vector space to the other. The mapping can be linear, nonlinear, or highly entangled; for the latter two cases, deep artificial neural networks represent a natural choice for the encoding and/or decoding processes, from which the term "deep" is derived. In addition to this classical geometric view, the quantum geometry of Bloch electrons in Hilbert space, such as Berry curvature and quantum metrics, is essential for the determination of the polarization-dependent photoresponses in some optical sensors. In this Review, we first present a general perspective of this sensing scheme from the viewpoint of information theory, in which the photoresponse measurement and the extraction of light properties are regarded as information-encoding and -decoding processes, respectively. We then discuss demonstrations in which a reconfigurable sensor (or an array thereof), enabled by device reconfigurability and the implementation of neural networks, can detect the power, polarization state, wavelength, and spatial features of an incident light beam.
OUTLOOK As increasingly more computing resources become available, optical sensing is becoming more computational, with device reconfigurability playing a key role. On the one hand, advanced algorithms, including deep neural networks, will enable effective decoding of high-dimensional photoresponse vectors, which reduces the physical complexity of sensors. Therefore, it will be important to integrate memory cells near or within sensors to enable efficient processing and interpretation of a large amount of photoresponse data. On the other hand, analog computation based on neural networks can be performed with an array of reconfigurable devices, which enables direct multiplexing of sensing and computing functions. We anticipate that these two directions will become the engineering frontier of future deep sensing research. On the scientific frontier, exploring quantum geometric and topological properties of new quantum materials in both linear and nonlinear light-matter interactions will enrich the information-encoding pathways for deep optical sensing. In addition, deep sensing schemes will continue to benefit from the latest developments in machine learning. Future highly compact, multifunctional, reconfigurable, and intelligent sensors and imagers will find applications in medical imaging, environmental monitoring, infrared astronomy, and many other areas of our daily lives, especially in the mobile domain and the internet of things.

Schematic of deep optical sensing (figure caption): The n-dimensional unknown information (w) is encoded into an m-dimensional photoresponse vector (x) by a reconfigurable sensor (or an array thereof), from which w′ is reconstructed by a trained neural network (n′ = n and w′ ≈ w). Alternatively, x may be directly deciphered to capture certain properties of w. Here, w, x, and w′ can be regarded as points in their respective high-dimensional vector spaces ℛ^n, ℛ^m, and ℛ^{n′}.
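For the linear case of the w → x → w′ scheme in the caption above, a toy decoding can be written in a few lines; the dimensions and the responsivity matrix below are illustrative assumptions, and a nonlinear or entangled mapping would replace the least-squares decoder with a trained neural network, as the Review describes.

```python
# Toy instance of the w -> x -> w' sensing scheme for a *linear* mapping:
# measurements under m device states give x = A w, and the decoder recovers
# w' by least squares. All dimensions and A are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 32                            # unknown light properties, photoresponses
A = rng.normal(size=(m, n))             # responsivity matrix (one row per state)

w = rng.random(n)                       # unknown: e.g., spectral bins of the beam
x = A @ w + 0.01 * rng.normal(size=m)   # measured photoresponse vector, with noise

w_rec, *_ = np.linalg.lstsq(A, x, rcond=None)   # decoding: w' ≈ w
print("reconstruction error:", np.linalg.norm(w_rec - w))
```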
  4. Today's high-performance computing (HPC) applications produce vast volumes of data, which are challenging to store and transfer efficiently during execution, so data compression is becoming a critical technique for mitigating the storage burden and data movement cost. Huffman coding is arguably the most efficient entropy coding algorithm in information theory and appears as a fundamental step in many modern compression algorithms such as DEFLATE. At the same time, today's HPC applications increasingly rely on accelerators such as GPUs on supercomputers, yet Huffman encoding suffers from low throughput on GPUs, creating a significant bottleneck in the overall data processing pipeline. In this paper, we propose and implement an efficient Huffman encoding approach for modern GPU architectures that addresses two key challenges: (1) how to parallelize the entire Huffman encoding algorithm, including codebook construction, and (2) how to fully utilize the high memory bandwidth of modern GPU architectures. The contribution is fourfold. (1) We develop an efficient parallel codebook construction on GPUs that scales effectively with the number of input symbols. (2) We propose a novel reduction-based encoding scheme that can efficiently merge the codewords on GPUs. (3) We optimize the overall GPU performance by leveraging state-of-the-art CUDA APIs such as Cooperative Groups. (4) We evaluate our Huffman encoder thoroughly using six real-world application datasets on two advanced GPUs and compare it with our multi-threaded Huffman encoder. Experiments show that our solution improves the encoding throughput by up to 5.0× and 6.8× on an NVIDIA RTX 5000 and a V100, respectively, over the state-of-the-art GPU Huffman encoder, and by up to 3.3× over the multi-threaded encoder on two 28-core Xeon Platinum 8280 CPUs.
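For reference, the sequential baseline that the paper parallelizes looks as follows; this is a generic textbook Huffman codebook construction in Python (not the paper's GPU implementation), sketched here only to make the codebook-construction step concrete.

```python
# Sequential Huffman codebook construction (generic baseline, not the GPU code).
import heapq
from collections import Counter

def huffman_codebook(data: bytes) -> dict[int, str]:
    """Build a prefix-free codebook mapping each symbol to a bit string."""
    # Heap entries: (frequency, unique tiebreaker, {symbol: code_so_far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merge the two rarest subtrees; prepend a bit to every code inside.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codebook = huffman_codebook(b"abracadabra")
encoded = "".join(codebook[s] for s in b"abracadabra")
print(codebook, len(encoded), "bits")
```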
  5. Abstract Motivation

    The size of a genome graph—the space required to store the nodes, node labels and edges—affects the efficiency of operations performed on it. For example, the time complexity to align a sequence to a graph without a graph index depends on the total number of characters in the node labels and the number of edges in the graph. This raises the need for approaches to construct space-efficient genome graphs.

    Results

    We point out similarities between the string encoding mechanisms of genome graphs and the external pointer macro (EPM) compression model. We present a pair of linear-time algorithms that transform between genome graphs and EPM-compressed forms. These algorithms yield an upper bound on the size of the constructed genome graph in terms of an optimal EPM compression. To further reduce the size of the genome graph, we propose the source assignment problem, which optimizes over the equivalent choices made during compression, and introduce an ILP formulation that solves this problem optimally. As a proof of concept, we introduce RLZ-Graph, a genome graph constructed based on the relative Lempel–Ziv algorithm. Using RLZ-Graph, across all human chromosomes, we reduce the disk space needed to store a genome graph by 40.7% on average compared to colored compacted de Bruijn graphs constructed by Bifrost under the default settings. RLZ-Graph scales well in running time and graph size as the number of human genome sequences increases, compared to Bifrost and to variation graphs produced by VGtoolkit.
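To make the underlying compression model concrete, here is a minimal, naive sketch of relative Lempel–Ziv factorization against a reference string. It illustrates the phrase structure that RLZ-Graph builds on but is not the RLZ-Graph implementation; practical versions would use a suffix array or FM-index over the reference rather than this quadratic scan.

```python
# Naive relative Lempel-Ziv (RLZ) factorization: greedily parse a target into
# phrases copied from a reference. O(|target| * |reference|) for clarity only.
def rlz_factorize(reference: str, target: str):
    """Return phrases as (start_in_reference, length) or ('literal', char)."""
    phrases, i = [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Find the longest prefix of target[i:] occurring in the reference.
        for j in range(len(reference)):
            l = 0
            while (i + l < len(target) and j + l < len(reference)
                   and reference[j + l] == target[i + l]):
                l += 1
            if l > best_len:
                best_pos, best_len = j, l
        if best_len == 0:                     # character absent from reference
            phrases.append(("literal", target[i]))
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases

ref = "ACGTACGTTGCA"
print(rlz_factorize(ref, "ACGTTGCAACGA"))    # e.g. [(4, 8), (0, 3), (0, 1)]
```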

    Availability

    The RLZ-Graph software is available at: https://github.com/Kingsford-Group/rlzgraph.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     