Title: Similarity Learning and Generalization with Limited Data: A Reservoir Computing Approach
We investigate the ways in which a machine learning architecture known as Reservoir Computing learns concepts such as “similar” and “different,” as well as other relationships between image pairs, and generalizes these concepts to previously unseen classes of data. We present two Reservoir Computing architectures, which loosely resemble neural dynamics, and show that a Reservoir Computer (RC) trained to identify relationships between image pairs drawn from a subset of training classes generalizes the learned relationships to substantially different classes unseen during training. We demonstrate our results on the simple MNIST handwritten digit database as well as a database of depth maps of visual scenes in videos taken from a moving camera. We consider image pair relationships such as images from the same class; images from the same class with one image superposed with noise, rotated 90°, blurred, or scaled; and images from different classes. We observe that the reservoir acts as a nonlinear filter projecting the input into a higher-dimensional space in which the relationships are separable; i.e., the reservoir system state trajectories display different dynamical patterns that reflect the corresponding input pair relationships. Thus, as opposed to training in the entire high-dimensional reservoir space, the RC only needs to learn characteristic features of these dynamical patterns, allowing it to perform well with very few training examples compared with conventional feed-forward machine learning techniques such as deep learning. In generalization tasks, we observe that RCs perform significantly better than state-of-the-art, feed-forward, pair-based architectures such as convolutional and deep Siamese Neural Networks (SNNs). We also show that RCs can not only generalize relationships, but also generalize combinations of relationships, providing robust and effective image pair classification. 
Our work helps bridge the gap between explainable machine learning with small datasets and biologically inspired analogy-based learning, pointing to new directions in the investigation of learning processes.
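The idea of a fixed reservoir that projects an image pair into a higher-dimensional state trajectory, with only a linear readout being trained, can be illustrated with a generic echo state network. The sketch below is not the architecture from the paper: the column-by-column input scheme, the tanh update, the ridge-regression readout, and all names and parameter values (`make_reservoir`, `reservoir_features`, `n_nodes`, `spectral_radius`, `ridge`) are assumptions chosen for illustration.

```python
import numpy as np


def make_reservoir(n_inputs, n_nodes=200, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_nodes, n_inputs))
    w = rng.normal(size=(n_nodes, n_nodes))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def reservoir_features(img_a, img_b, w_in, w):
    """Drive the reservoir with an image pair, one column pair per time step.

    Returns the concatenated state trajectory as a feature vector.
    """
    x = np.zeros(w.shape[0])
    states = []
    for t in range(img_a.shape[1]):
        u = np.concatenate([img_a[:, t], img_b[:, t]])
        x = np.tanh(w @ x + w_in @ u)
        states.append(x)
    return np.concatenate(states)


def train_readout(features, labels, n_classes, ridge=1e-2):
    """Fit a linear readout by ridge regression onto one-hot targets."""
    targets = np.eye(n_classes)[labels]
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


def predict(features, w_out):
    """Pick the relationship class with the largest readout score."""
    return np.argmax(features @ w_out, axis=1)


if __name__ == "__main__":
    # Toy data (hypothetical): label 1 = noisy copy of the same image,
    # label 0 = two unrelated random images.
    rng = np.random.default_rng(1)
    w_in, w = make_reservoir(n_inputs=16, n_nodes=50)
    feats, labels = [], []
    for _ in range(20):
        a = rng.random((8, 8))
        feats.append(reservoir_features(a, a + 0.05 * rng.normal(size=(8, 8)), w_in, w))
        labels.append(1)
        feats.append(reservoir_features(a, rng.random((8, 8)), w_in, w))
        labels.append(0)
    feats, labels = np.stack(feats), np.array(labels)
    w_out = train_readout(feats, labels, n_classes=2)
    print("training accuracy:", np.mean(predict(feats, w_out) == labels))
```

Only `w_out` is learned here; the reservoir weights stay fixed, which is why a readout of this kind can in principle be fit from a small number of labeled pairs.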
Award ID(s):
1632976
PAR ID:
10287449
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Complexity
Volume:
2018
ISSN:
1076-2787
Page Range / eLocation ID:
1 to 15
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Puyol Anton, E ; Pop, M ; Sermesant, M ; Campello, V ; Lalande, A ; Lekadir, K ; Suinesiaputra, A ; Camara, O ; Young, A (Ed.)
    Cardiac cine magnetic resonance imaging (CMRI) is the reference standard for assessing cardiac structure as well as function. However, CMRI data presents large variations among different centers, vendors, and patients with various cardiovascular diseases. Since typical deep-learning-based segmentation methods are usually trained using a limited number of ground truth annotations, they may not generalize well to unseen MR images, due to the variations between the training and testing data. In this study, we proposed an approach towards building a generalizable deep-learning-based model for cardiac structure segmentation from multi-vendor, multi-center, and multi-disease CMRI data. We used a novel combination of image augmentation and a consistency loss function to improve model robustness to typical variations in CMRI data. The proposed image augmentation strategy leverages unlabeled data by a) using CycleGAN to generate images in different styles and b) exchanging the low-frequency features of images from different vendors. Our model architecture was based on an attention-gated U-Net model that learns to focus on cardiac structures of varying shapes and sizes while suppressing irrelevant regions. The proposed augmentation and consistency training method demonstrated improved performance on CMRI images from new vendors and centers. When evaluated using CMRI data from 4 vendors and 6 clinical centers, our method was generally able to produce accurate segmentations of cardiac structures. 
  2. Zero-shot learning (ZSL) for image classification focuses on recognizing novel categories that have no labeled data available for training. The learning is generally carried out with the help of mid-level semantic descriptors associated with each class. This semantic-descriptor space is generally shared by both seen and unseen categories. However, ZSL suffers from hubness, domain discrepancy, and bias towards seen classes. To tackle these problems, we propose a three-step approach to zero-shot learning. Firstly, a mapping is learned from the semantic-descriptor space to the image-feature space. This mapping learns to minimize both one-to-one and pairwise distances between semantic embeddings and the image features of the corresponding classes. Secondly, we propose test-time domain adaptation to adapt the semantic embedding of the unseen classes to the test data. This is achieved by finding correspondences between the semantic descriptors and the image features. Thirdly, we propose scaled calibration on the classification scores of the seen classes. This is necessary because the ZSL model is biased towards seen classes, as the unseen classes are not used in the training. Finally, to validate the proposed three-step approach, we performed experiments on four benchmark datasets where the proposed method outperformed previous results. We also studied and analyzed the performance of each component of our proposed ZSL framework. 
  3. We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but they still struggle in domains with diverse configurations of image goals and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distilling them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and in simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation and reinforcement learning setups and achieve state-of-the-art results. 
  4. We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our model automatically understands the underlying task and performs the same task on a new query image following the text guidance. To achieve this, we propose a vision-language prompt that can model a wide range of vision-language tasks and a diffusion model that takes it as input. The diffusion model is trained jointly on six different tasks using these prompts. The resulting Prompt Diffusion model becomes the first diffusion-based vision-language foundation model capable of in-context learning. It demonstrates high-quality in-context generation for the trained tasks and effectively generalizes to new, unseen vision tasks using their respective prompts. Our model also shows compelling text-guided image editing results. Our framework aims to facilitate research into in-context learning for computer vision. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Prompt-Diffusion. 
  5. Two elementary models of ocean circulation, the well-known double-gyre stream function model and a single-layer quasi-geostrophic (QG) basin model, are used to generate flow data that sample a range of possible dynamical behavior for particular flow parameters. A reservoir computing (RC) machine learning algorithm then learns these models from the stream function time series. In the case of the QG model, a system of partial differential equations with three physically relevant dimensionless parameters is solved, including Munk- and Stommel-type solutions. The effectiveness of an RC approach to learning these ocean circulation models is evident from its ability to capture their characteristics with limited data, including predictive forecasts. Further assessment of the accuracy and usefulness of the RC approach is conducted by evaluating the role of both physical and numerical parameters and by comparison with particle trajectories and with well-established quantitative assessments, including finite-time Lyapunov exponents and proper orthogonal decomposition. The results show that the methods outlined in this article can be applied to key research problems on ocean transport, such as predictive modeling or control. 