

Title: Learning low-dimensional generalizable natural features from retina using a U-net
Much of sensory neuroscience focuses on sensory features that are chosen by the experimenter because they are thought to be behaviorally relevant to the organism. However, it is not generally known what these features are in complex, natural scenes. This work uses the retinal encoding of natural movies to determine the presumably behaviorally relevant features that the brain represents. Because fully parameterizing a natural movie and its retinal encoding is prohibitive, we use time within a natural movie as a proxy for the whole suite of features evolving across the scene. We then use a task-agnostic deep architecture, an encoder-decoder, to model the retinal encoding process and characterize its representation of “time in the natural scene” in a compressed latent space. In our end-to-end training, an encoder learns a compressed latent representation from a large population of salamander retinal ganglion cells responding to natural movies, while a decoder samples from this compressed latent space to generate the appropriate movie frame. By comparing latent representations of retinal activity from three movies, we find that the retina performs transfer learning to encode time: the precise, low-dimensional representation of time learned from one movie can be used to represent time in a different movie, with up to 17 ms resolution. We then show that static textures and velocity features of a natural movie are synergistic. The retina encodes both simultaneously to establish a generalizable, low-dimensional representation of time in the natural scene.
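As a rough intuition for what it means to read out “time in the movie” from a compressed population code, the sketch below encodes toy population responses into a three-dimensional latent space and decodes the time bin by nearest-neighbor lookup against a reference latent trajectory. It is not the authors' U-net encoder-decoder: the smooth firing rates are synthetic, the "encoder" is a fixed random projection, the 17 ms bin width is only a hypothetical choice echoing the resolution quoted above, and no transfer between movies is modeled. All function names are hypothetical.

```python
import math
import random


def toy_population_responses(n_bins, n_cells, seed=0):
    """Hypothetical smooth firing rates: one row per time bin, one column per cell."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.5, 3.0) for _ in range(n_cells)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_cells)]
    return [
        [1.0 + math.sin(2.0 * math.pi * f * t / n_bins + p) for f, p in zip(freqs, phases)]
        for t in range(n_bins)
    ]


def random_projection(n_latent, n_cells, seed=1):
    """Stand-in 'encoder' (assumption): a fixed Gaussian projection to a low-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(n_cells) for _ in range(n_cells)]
            for _ in range(n_latent)]


def encode(weights, response):
    """Map one population response vector r to its latent vector z = W r."""
    return [sum(w * r for w, r in zip(row, response)) for row in weights]


def decode_time(latent_trajectory, z):
    """Return the index of the reference time bin whose latent vector is closest to z."""
    return min(
        range(len(latent_trajectory)),
        key=lambda t: sum((a - b) ** 2 for a, b in zip(latent_trajectory[t], z)),
    )


if __name__ == "__main__":
    bin_ms = 17  # hypothetical bin width, not taken from the paper's methods
    responses = toy_population_responses(n_bins=200, n_cells=60)
    W = random_projection(n_latent=3, n_cells=60)
    reference = [encode(W, r) for r in responses]

    # Perturb one time bin's response slightly and ask which bin it came from.
    rng = random.Random(2)
    noisy = [x + rng.gauss(0.0, 0.02) for x in responses[120]]
    t_hat = decode_time(reference, encode(W, noisy))
    print(f"true bin 120 ({120 * bin_ms} ms), decoded bin {t_hat} ({t_hat * bin_ms} ms)")
```

A nearest-neighbor readout is used here because it makes temporal resolution concrete: a decoded bin that is off by k bins corresponds to an error of k times the bin width.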
Award ID(s):
1734030
NSF-PAR ID:
10431460
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in neural information processing systems
Volume:
35
ISSN:
1049-5258
Page Range / eLocation ID:
11355-11368
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a Closed-loop Transcription between a multi-class, multi-dimensional data distribution and a Linear discriminative representation (CTRL) in a feature space that consists of multiple independent multi-dimensional linear subspaces. We argue that the optimal encoding and decoding mappings can be formulated as a two-player minimax game between the encoder and decoder for the learned representation. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure of distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback in control systems and avoids the expensive evaluation and minimization of approximate distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of auto-encoding and GANs and naturally extends them to learning a representation that is both discriminative and generative for multi-class, multi-dimensional real-world data. Our extensive experiments on many benchmark image datasets demonstrate the strong potential of this closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and arguably better than, existing methods based on GANs, VAEs, or a combination of both. Unlike existing generative models, the learned features of the multiple classes are structured instead of hidden: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space, and diverse visual attributes within each class are modeled by the independent principal components within each subspace.
  2. Decoding sensory stimuli from neural activity can provide insight into how the nervous system might interpret the physical environment, and can facilitate the development of brain-machine interfaces. Nevertheless, the neural decoding problem remains a significant open challenge. Here, we present an efficient nonlinear decoding approach for inferring natural scene stimuli from the spiking activities of retinal ganglion cells (RGCs). Our approach uses neural networks to improve on existing decoders in both accuracy and scalability. Trained and validated on real retinal spike data from more than 1000 simultaneously recorded macaque RGC units, the decoder demonstrates the necessity of nonlinear computations for accurate decoding of the fine structures of visual stimuli. Specifically, high-pass spatial features of natural images can only be decoded using nonlinear techniques, while low-pass features can be extracted equally well by linear and nonlinear methods. Together, these results advance the state of the art in decoding natural stimuli from large populations of neurons.
  3. The macaque middle temporal (MT) area is well known for its visual motion selectivity and relevance to motion perception, but the possibility of it also reflecting higher-level cognitive functions has largely been ignored. We tested for effects of task performance distinct from sensory encoding by manipulating subjects' temporal evidence-weighting strategy during a direction discrimination task while performing electrophysiological recordings from groups of MT neurons in rhesus macaques (one male, one female). This revealed multiple components of MT responses that were, surprisingly, not interpretable as behaviorally relevant modulations of motion encoding, or as bottom-up consequences of the readout of motion direction from MT. The time-varying motion-driven responses of MT were strongly affected by our strategic manipulation—but with time courses opposite the subjects' temporal weighting strategies. Furthermore, large choice-correlated signals were represented in population activity distinct from its motion responses, with multiple phases that lagged psychophysical readout and even continued after the stimulus (but which preceded motor responses). In summary, a novel experimental manipulation of strategy allowed us to control the time course of readout to challenge the correlation between sensory responses and choices, and population-level analyses of simultaneously recorded ensembles allowed us to identify strong signals that were so distinct from direction encoding that conventional, single-neuron-centric analyses could not have revealed or properly characterized them. Together, these approaches revealed multiple cognitive contributions to MT responses that are task related but not functionally relevant to encoding or decoding of motion for psychophysical direction discrimination, providing a new perspective on the assumed status of MT as a simple sensory area.

    SIGNIFICANCE STATEMENT
    This study extends understanding of the middle temporal (MT) area beyond its representation of visual motion. Combining multineuron recordings, population-level analyses, and controlled manipulation of task strategy, we exposed signals that depended on changes in temporal weighting strategy, but did not manifest as feedforward effects on behavior. This was demonstrated by (1) an inverse relationship between temporal dynamics of behavioral readout and sensory encoding, (2) a choice-correlated signal that always lagged the stimulus time points most correlated with decisions, and (3) a distinct choice-correlated signal after the stimulus. These findings invite re-evaluation of MT for functions outside of its established sensory role and highlight the power of experimenter-controlled changes in temporal strategy, coupled with recording and analysis approaches that transcend the single-neuron perspective.

  4. Sentiment analysis is a popular text classification task in natural language processing. It involves developing algorithms or machine learning models to determine the sentiment or opinion expressed in a piece of text. Business owners and product developers can use its results to understand how consumers perceive their products. Aside from customer feedback and product/service analysis, the task is useful for social media monitoring (Martin et al., 2021). A popular application of sentiment analysis is classifying the positive and negative sentiment of movie reviews. Movie reviews enable producers to monitor the performance of their movies (Abhishek et al., 2020) and help viewers decide whether a movie is good enough to be worth their time (Lakshmi Devi et al., 2020). However, the task has been under-explored for African languages compared to their Western counterparts, the “high-resource languages” that have received enormous attention due to the large amount of available textual data. African languages fall into the category of low-resource languages, which are disadvantaged by the limited availability of data and are consequently poorly represented (Nasim & Ghani, 2020). Recently, sentiment analysis of African languages has received attention in the Twitter domain for Nigerian languages (Muhammad et al., 2022) and Amharic (Yimam et al., 2020). However, no corpus is available in the movie domain. We tackle the unavailability of Yorùbá data for movie sentiment analysis by creating the first Yorùbá sentiment corpus for Nollywood movie reviews. We also develop sentiment classification models using state-of-the-art pre-trained language models such as mBERT (Devlin et al., 2019) and AfriBERTa (Ogueji et al., 2021).
  5. Learning-based image/video codecs typically use the well-known auto-encoder structure, in which the encoder transforms input data into a low-dimensional latent representation. Efficient latent encoding can reduce bandwidth needs during compression for transmission and storage. In this paper, we examine the effect of assigning high-level coarse grouping labels to each latent vector. Designing a coding profile for each latent group can achieve high compression. We show that such grouping can be learned via end-to-end optimization of the codec and the deep learning (DL) model to optimize rate-accuracy for a given data set. For cloud-based inference, the source encoder can select a coding profile based on its learned grouping and encode the data features accordingly. Our test results on image classification show that learned grouping achieves significant performance improvement over its non-grouping counterpart.
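Item 1 above uses the rate reduction as the utility of its encoder-decoder game. The sketch below only evaluates that measure for a handful of feature vectors, assuming the usual definition from the rate-reduction literature: for m features of dimension d and distortion ε, R(Z, ε) = ½ log det(I + d/(mε²) Z Zᵀ), and ΔR = R(Z) − Σ_j (m_j/m) R(Z_j) over classes j. It does not implement the CTRL minimax training loop, and the function names are hypothetical.

```python
import math


def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Gaussian elimination."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    total = 0.0
    for k in range(n):
        pivot = m[k][k]  # stays positive for SPD input, so no pivoting is needed
        total += math.log(pivot)
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return total


def coding_rate(features, d, eps):
    """R(Z, eps) = 1/2 logdet(I + d / (m eps^2) Z Z^T) for m feature vectors of length d."""
    m = len(features)
    scale = d / (m * eps ** 2)
    gram = [[(1.0 if i == j else 0.0) + scale * sum(z[i] * z[j] for z in features)
             for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(gram)


def rate_reduction(features, labels, eps=1.0):
    """Delta R: coding rate of the whole set minus the size-weighted class-wise rates."""
    d = len(features[0])
    m = len(features)
    total = coding_rate(features, d, eps)
    for c in sorted(set(labels)):
        members = [z for z, y in zip(features, labels) if y == c]
        total -= len(members) / m * coding_rate(members, d, eps)
    return total


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    same = [[1.0, 0.0]] * 4  # both classes share one direction
    orthogonal = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(rate_reduction(same, labels))        # ~0: the classes overlap completely
    print(rate_reduction(orthogonal, labels))  # ln 2 - 0.5 ln 3, about 0.144
```

The toy comparison shows the behavior the utility rewards: identical class features give ΔR of zero, while placing the two classes on orthogonal axes makes ΔR positive.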
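Item 5 above assigns a coarse group label to each latent vector and codes each group with its own profile. A minimal sketch of that idea, under loud assumptions: a "coding profile" is modeled here as nothing more than a uniform quantizer step size, and the group is chosen by a fixed, hypothetical energy threshold rather than learned end-to-end as in the paper. Function names and parameters are illustrative only.

```python
def assign_group(latent, thresholds):
    """Hypothetical grouping rule: bucket a latent vector by its mean absolute value."""
    energy = sum(abs(x) for x in latent) / len(latent)
    for group, limit in enumerate(thresholds):
        if energy < limit:
            return group
    return len(thresholds)


def quantize(latent, step):
    """Uniform scalar quantizer: integer symbols plus the reconstruction they imply."""
    symbols = [round(x / step) for x in latent]
    return symbols, [s * step for s in symbols]


def encode_grouped(latents, thresholds, steps):
    """Encode each latent vector with the coding profile (step size) of its group."""
    encoded = []
    for z in latents:
        group = assign_group(z, thresholds)
        symbols, recon = quantize(z, steps[group])
        encoded.append((group, symbols, recon))
    return encoded


if __name__ == "__main__":
    latents = [[0.1, -0.2, 0.05], [2.0, -3.1, 1.7]]
    # Group 0 (low energy) gets a coarse step; group 1 (high energy) a fine one.
    for group, symbols, recon in encode_grouped(latents, thresholds=[0.5], steps=[0.5, 0.1]):
        print(group, symbols, recon)
```

Giving the low-energy group a coarser step maps its latents to fewer distinct symbols (cheaper to entropy-code), while the high-energy group keeps finer detail; in either group the per-element reconstruction error is at most half of that group's step.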