skip to main content


Title: Random Sketching, Clustering, and Short-Term Memory in Spiking Neural Networks
We study input compression in a biologically inspired model of neural computation. We demonstrate that a network consisting of a random projection step (implemented via random synaptic connectivity) followed by a sparsification step (implemented via winner-take-all competition) can reduce well-separated high-dimensional input vectors to well-separated low-dimensional vectors. By augmenting our network with a third module, we can efficiently map each input (along with any small perturbations of the input) to a unique representative neuron, solving a neural clustering problem. Both the size of our network and its processing time, i.e., the time it takes the network to compute the compressed output given a presented input, are independent of the (potentially large) dimension of the input patterns and depend only on the number of distinct inputs that the network must encode and the pairwise relative Hamming distance between these inputs. The first two steps of our construction mirror known biological networks, for example, in the fruit fly olfactory system [Caron et al., 2013; Lin et al., 2014; Dasgupta et al., 2017]. Our analysis helps provide a theoretical understanding of these networks and lay a foundation for how random compression and input memorization may be implemented in biological neural networks. Technically, a contribution in our network design is the implementation of a short-term memory. Our network can be given a desired memory time t_m as an input parameter and satisfies the following with high probability: any pattern presented several times within a time window of t_m rounds will be mapped to a single representative output neuron. However, a pattern not presented for c⋅t_m rounds for some constant c>1 will be "forgotten", and its representative output neuron will be released, to accommodate newly introduced patterns.  more » « less
Award ID(s):
1810758
NSF-PAR ID:
10161881
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
11th Innovations in Theoretical Computer Science (ITCS 2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the task of measuring time with probabilistic threshold gates implemented by bio-inspired spiking neurons. In the model of spiking neural networks, network evolves in discrete rounds, where in each round, neurons fire in pulses in response to a sufficiently high membrane potential. This potential is induced by spikes from neighboring neurons that fired in the previous round, which can have either an excitatory or inhibitory effect. Discovering the underlying mechanisms by which the brain perceives the duration of time is one of the largest open enigma in computational neuroscience. To gain a better algorithmic understanding onto these processes, we introduce the neural timer problem. In this problem, one is given a time parameter t, an input neuron x, and an output neuron y. It is then required to design a minimum sized neural network (measured by the number of auxiliary neurons) in which every spike from x in a given round i, makes the output y fire for the subsequent t consecutive rounds.We first consider a deterministic implementation of a neural timer and show that Θ(logt)(deterministic) threshold gates are both sufficient and necessary. This raised the question of whether randomness can be leveraged to reduce the number of neurons. We answer this question in the affirmative by considering neural timers with spiking neurons where the neuron y is required to fire for t consecutive rounds with probability at least 1−δ, and should stop firing after at most 2 t rounds with probability 1−δ for some input parameter δ∈(0,1). Our key result is a construction of a neural timer with O(log log 1/δ) spiking neurons. Interestingly, this construction uses only one spiking neuron, while the remaining neurons can be deterministic threshold gates. We complement this construction with a matching lower bound of Ω(min{log log 1/δ,logt}) neurons. This provides the first separation between deterministic and randomized constructions in the setting of spiking neural networks.Finally, we demonstrate the usefulness of compressed counting networks for synchronizing neural networks. In the spirit of distributed synchronizers [Awerbuch-Peleg, FOCS’90], we provide a general transformation (or simulation) that can take any synchronized network solution and simulate it in an asynchronous setting (where edges have arbitrary response latencies) while incurring a small overhead w.r.t the number of neurons and computation time. 
    more » « less
  2. null (Ed.)
    Variable binding is a cornerstone of symbolic reasoning and cognition. But how binding can be implemented in connectionist models has puzzled neuroscientists, cognitive psychologists, and neural network researchers for many decades. One type of connectionist model that naturally includes a binding operation is vector symbolic architectures (VSAs). In contrast to other proposals for variable binding, the binding operation in VSAs is dimensionality-preserving, which enables representing complex hierarchical data structures, such as trees, while avoiding a combinatoric expansion of dimensionality. Classical VSAs encode symbols by dense randomized vectors, in which information is distributed throughout the entire neuron population. By contrast, in the brain, features are encoded more locally, by the activity of single neurons or small groups of neurons, often forming sparse vectors of neural activation. Following Laiho et al. (2015), we explore symbolic reasoning with a special case of sparse distributed representations. Using techniques from compressed sensing, we first show that variable binding in classical VSAs is mathematically equivalent to tensor product binding between sparse feature vectors, another well-known binding operation which increases dimensionality. This theoretical result motivates us to study two dimensionality-preserving binding methods that include a reduction of the tensor matrix into a single sparse vector. One binding method for general sparse vectors uses random projections, the other, block-local circular convolution, is defined for sparse vectors with block structure, sparse block-codes. Our experiments reveal that block-local circular convolution binding has ideal properties, whereas random projection based binding also works, but is lossy. We demonstrate in example applications that a VSA with block-local circular convolution and sparse block-codes reaches similar performance as classical VSAs. Finally, we discuss our results in the context of neuroscience and neural networks. 
    more » « less
  3. Obeid, Iyad Selesnick (Ed.)
    Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy [1]. As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is a great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency [2]. However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours [3]. Some commercial tools recently claim to reach such performance levels, including the Olympic Brainz Monitor [4] and Persyst 14 [5]. In this abstract, we describe our efforts to transform a high-performance offline seizure detection system [3] into a low latency real-time or online seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online versus offline system is that an online system should always be causal and has minimum latency which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing [3]. The channel-based long short term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficients (LFCC) [6] features from each EEG channel separately. We use the hypotheses generated by the P1 model and create additional features that carry information about the detected events and their confidence. The P2 model uses these additional features and the LFCC features to learn the temporal and spatial aspects of the EEG signals using a hybrid convolutional neural network (CNN) and LSTM model. Finally, Phase 3 aggregates the results from both P1 and P2 before applying a final postprocessing step. The online system implements Phase 1 by taking advantage of the Linux piping mechanism, multithreading techniques, and multi-core processors. To convert Phase 1 into an online system, we divide the system into five major modules: signal preprocessor, feature extractor, event decoder, postprocessor, and visualizer. The system reads 0.1-second frames from each EEG channel and sends them to the feature extractor and the visualizer. The feature extractor generates LFCC features in real time from the streaming EEG signal. Next, the system computes seizure and background probabilities using a channel-based LSTM model and applies a postprocessor to aggregate the detected events across channels. The system then displays the EEG signal and the decisions simultaneously using a visualization module. The online system uses C++, Python, TensorFlow, and PyQtGraph in its implementation. The online system accepts streamed EEG data sampled at 250 Hz as input. The system begins processing the EEG signal by applying a TCP montage [8]. Depending on the type of the montage, the EEG signal can have either 22 or 20 channels. To enable the online operation, we send 0.1-second (25 samples) length frames from each channel of the streamed EEG signal to the feature extractor and the visualizer. Feature extraction is performed sequentially on each channel. The signal preprocessor writes the sample frames into two streams to facilitate these modules. In the first stream, the feature extractor receives the signals using stdin. In parallel, as a second stream, the visualizer shares a user-defined file with the signal preprocessor. This user-defined file holds raw signal information as a buffer for the visualizer. The signal preprocessor writes into the file while the visualizer reads from it. Reading and writing into the same file poses a challenge. The visualizer can start reading while the signal preprocessor is writing into it. To resolve this issue, we utilize a file locking mechanism in the signal preprocessor and visualizer. Each of the processes temporarily locks the file, performs its operation, releases the lock, and tries to obtain the lock after a waiting period. The file locking mechanism ensures that only one process can access the file by prohibiting other processes from reading or writing while one process is modifying the file [9]. The feature extractor uses circular buffers to save 0.3 seconds or 75 samples from each channel for extracting 0.2-second or 50-sample long center-aligned windows. The module generates 8 absolute LFCC features where the zeroth cepstral coefficient is replaced by a temporal domain energy term. For extracting the rest of the features, three pipelines are used. The differential energy feature is calculated in a 0.9-second absolute feature window with a frame size of 0.1 seconds. The difference between the maximum and minimum temporal energy terms is calculated in this range. Then, the first derivative or the delta features are calculated using another 0.9-second window. Finally, the second derivative or delta-delta features are calculated using a 0.3-second window [6]. The differential energy for the delta-delta features is not included. In total, we extract 26 features from the raw sample windows which add 1.1 seconds of delay to the system. We used the Temple University Hospital Seizure Database (TUSZ) v1.2.1 for developing the online system [10]. The statistics for this dataset are shown in Table 1. A channel-based LSTM model was trained using the features derived from the train set using the online feature extractor module. A window-based normalization technique was applied to those features. In the offline model, we scale features by normalizing using the maximum absolute value of a channel [11] before applying a sliding window approach. Since the online system has access to a limited amount of data, we normalize based on the observed window. The model uses the feature vectors with a frame size of 1 second and a window size of 7 seconds. We evaluated the model using the offline P1 postprocessor to determine the efficacy of the delayed features and the window-based normalization technique. As shown by the results of experiments 1 and 4 in Table 2, these changes give us a comparable performance to the offline model. The online event decoder module utilizes this trained model for computing probabilities for the seizure and background classes. These posteriors are then postprocessed to remove spurious detections. The online postprocessor receives and saves 8 seconds of class posteriors in a buffer for further processing. It applies multiple heuristic filters (e.g., probability threshold) to make an overall decision by combining events across the channels. These filters evaluate the average confidence, the duration of a seizure, and the channels where the seizures were observed. The postprocessor delivers the label and confidence to the visualizer. The visualizer starts to display the signal as soon as it gets access to the signal file, as shown in Figure 1 using the “Signal File” and “Visualizer” blocks. Once the visualizer receives the label and confidence for the latest epoch from the postprocessor, it overlays the decision and color codes that epoch. The visualizer uses red for seizure with the label SEIZ and green for the background class with the label BCKG. Once the streaming finishes, the system saves three files: a signal file in which the sample frames are saved in the order they were streamed, a time segmented event (TSE) file with the overall decisions and confidences, and a hypotheses (HYP) file that saves the label and confidence for each epoch. The user can plot the signal and decisions using the signal and HYP files with only the visualizer by enabling appropriate options. For comparing the performance of different stages of development, we used the test set of TUSZ v1.2.1 database. It contains 1015 EEG records of varying duration. The any-overlap performance [12] of the overall system shown in Figure 2 is 40.29% sensitivity with 5.77 FAs per 24 hours. For comparison, the previous state-of-the-art model developed on this database performed at 30.71% sensitivity with 6.77 FAs per 24 hours [3]. The individual performances of the deep learning phases are as follows: Phase 1’s (P1) performance is 39.46% sensitivity and 11.62 FAs per 24 hours, and Phase 2 detects seizures with 41.16% sensitivity and 11.69 FAs per 24 hours. We trained an LSTM model with the delayed features and the window-based normalization technique for developing the online system. Using the offline decoder and postprocessor, the model performed at 36.23% sensitivity with 9.52 FAs per 24 hours. The trained model was then evaluated with the online modules. The current performance of the overall online system is 45.80% sensitivity with 28.14 FAs per 24 hours. Table 2 summarizes the performances of these systems. The performance of the online system deviates from the offline P1 model because the online postprocessor fails to combine the events as the seizure probability fluctuates during an event. The modules in the online system add a total of 11.1 seconds of delay for processing each second of the data, as shown in Figure 3. In practice, we also count the time for loading the model and starting the visualizer block. When we consider these facts, the system consumes 15 seconds to display the first hypothesis. The system detects seizure onsets with an average latency of 15 seconds. Implementing an automatic seizure detection model in real time is not trivial. We used a variety of techniques such as the file locking mechanism, multithreading, circular buffers, real-time event decoding, and signal-decision plotting to realize the system. A video demonstrating the system is available at: https://www.isip.piconepress.com/projects/nsf_pfi_tt/resources/videos/realtime_eeg_analysis/v2.5.1/video_2.5.1.mp4. The final conference submission will include a more detailed analysis of the online performance of each module. ACKNOWLEDGMENTS Research reported in this publication was most recently supported by the National Science Foundation Partnership for Innovation award number IIP-1827565 and the Pennsylvania Commonwealth Universal Research Enhancement Program (PA CURE). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the official views of any of these organizations. REFERENCES [1] A. Craik, Y. He, and J. L. Contreras-Vidal, “Deep learning for electroencephalogram (EEG) classification tasks: a review,” J. Neural Eng., vol. 16, no. 3, p. 031001, 2019. https://doi.org/10.1088/1741-2552/ab0ab5. [2] A. C. Bridi, T. Q. Louro, and R. C. L. Da Silva, “Clinical Alarms in intensive care: implications of alarm fatigue for the safety of patients,” Rev. Lat. Am. Enfermagem, vol. 22, no. 6, p. 1034, 2014. https://doi.org/10.1590/0104-1169.3488.2513. [3] M. Golmohammadi, V. Shah, I. Obeid, and J. Picone, “Deep Learning Approaches for Automatic Seizure Detection from Scalp Electroencephalograms,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York, New York, USA: Springer, 2020, pp. 233–274. https://doi.org/10.1007/978-3-030-36844-9_8. [4] “CFM Olympic Brainz Monitor.” [Online]. Available: https://newborncare.natus.com/products-services/newborn-care-products/newborn-brain-injury/cfm-olympic-brainz-monitor. [Accessed: 17-Jul-2020]. [5] M. L. Scheuer, S. B. Wilson, A. Antony, G. Ghearing, A. Urban, and A. I. Bagic, “Seizure Detection: Interreader Agreement and Detection Algorithm Assessments Using a Large Dataset,” J. Clin. Neurophysiol., 2020. https://doi.org/10.1097/WNP.0000000000000709. [6] A. Harati, M. Golmohammadi, S. Lopez, I. Obeid, and J. Picone, “Improved EEG Event Classification Using Differential Energy,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium, 2015, pp. 1–4. https://doi.org/10.1109/SPMB.2015.7405421. [7] V. Shah, C. Campbell, I. Obeid, and J. Picone, “Improved Spatio-Temporal Modeling in Automated Seizure Detection using Channel-Dependent Posteriors,” Neurocomputing, 2021. [8] W. Tatum, A. Husain, S. Benbadis, and P. Kaplan, Handbook of EEG Interpretation. New York City, New York, USA: Demos Medical Publishing, 2007. [9] D. P. Bovet and C. Marco, Understanding the Linux Kernel, 3rd ed. O’Reilly Media, Inc., 2005. https://www.oreilly.com/library/view/understanding-the-linux/0596005652/. [10] V. Shah et al., “The Temple University Hospital Seizure Detection Corpus,” Front. Neuroinform., vol. 12, pp. 1–6, 2018. https://doi.org/10.3389/fninf.2018.00083. [11] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011. https://dl.acm.org/doi/10.5555/1953048.2078195. [12] J. Gotman, D. Flanagan, J. Zhang, and B. Rosenblatt, “Automatic seizure detection in the newborn: Methods and initial evaluation,” Electroencephalogr. Clin. Neurophysiol., vol. 103, no. 3, pp. 356–362, 1997. https://doi.org/10.1016/S0013-4694(97)00003-9. 
    more » « less
  4. Soltani, Alireza (Ed.)
    Feedforward network models performing classification tasks rely on highly convergent output units that collect the information passed on by preceding layers. Although convergent output-unit like neurons may exist in some biological neural circuits, notably the cerebellar cortex, neocortical circuits do not exhibit any obvious candidates for this role; instead they are highly recurrent. We investigate whether a sparsely connected recurrent neural network (RNN) can perform classification in a distributed manner without ever bringing all of the relevant information to a single convergence site. Our model is based on a sparse RNN that performs classification dynamically. Specifically, the interconnections of the RNN are trained to resonantly amplify the magnitude of responses to some external inputs but not others. The amplified and non-amplified responses then form the basis for binary classification. Furthermore, the network acts as an evidence accumulator and maintains its decision even after the input is turned off. Despite highly sparse connectivity, learned recurrent connections allow input information to flow to every neuron of the RNN, providing the basis for distributed computation. In this arrangement, the minimum number of synapses per neuron required to reach maximum memory capacity scales only logarithmically with network size. The model is robust to various types of noise, works with different activation and loss functions and with both backpropagation- and Hebbian-based learning rules. The RNN can also be constructed with a split excitation-inhibition architecture with little reduction in performance. 
    more » « less
  5. Dodis, Y. (Ed.)
    Memory-hard functions (MHFs) are a useful cryptographic primitive which can be used to design egalitarian proof of work puzzles and to protect low entropy secrets like passwords against brute-force attackers. Intuitively, a memory-hard function is a function whose evaluation costs are dominated by memory costs even if the attacker uses specialized hardware (FPGAs/ASICs), and several cost metrics have been proposed to quantify this intuition. For example, space-time cost looks at the product of running time and the maximum space usage over the entire execution of an algorithm. Alwen and Serbinenko (STOC 2015) observed that the space-time cost of evaluating a function multiple times may not scale linearly in the number of instances being evaluated and introduced the stricter requirement that a memory-hard function has high cumulative memory complexity (CMC) to ensure that an attacker’s amortized space-time costs remain large even if the attacker evaluates the function on multiple different inputs in parallel. Alwen et al. (EUROCRYPT 2018) observed that the notion of CMC still gives the attacker undesirable flexibility in selecting space-time tradeoffs e.g., while the MHF Scrypt has maximal CMC Ω(N^2), an attacker could evaluate the function with constant O(1) memory in time O(N^2). Alwen et al. introduced an even stricter notion of Sustained Space complexity and designed an MHF which has s=Ω(N/logN) sustained complexity t=Ω(N) i.e., any algorithm evaluating the function in the parallel random oracle model must have at least t=Ω(N) steps where the memory usage is at least Ω(N/logN). In this work, we use dynamic pebbling games and dynamic graphs to explore tradeoffs between sustained space complexity and cumulative memory complexity for data-dependent memory-hard functions such as Argon2id and Scrypt. We design our own dynamic graph (dMHF) with the property that any dynamic pebbling strategy either (1) has Ω(N) rounds with Ω(N) space, or (2) has CMC Ω(N^{3−ϵ})—substantially larger than N^2. For Argon2id we show that any dynamic pebbling strategy either(1) has Ω(N) rounds with Ω(N^{1−ϵ}) space, or (2) has CMC ω(N^2). We also present a dynamic version of DRSample (Alwen et al. 2017) for which any dynamic pebbling strategy either (1) has Ω(N) rounds with Ω(N/log N) space, or (2) has CMC Ω(N^3/log N). 
    more » « less