skip to main content

Title: Adaptive Geometric Multiscale Approximations for Intrinsically Low-dimensional Data
We consider the problem of efficiently approximating and encoding high-dimensional data sampled from a probability distribution ρ in ℝD, that is nearly supported on a d-dimensional set  - for example supported on a d-dimensional manifold. Geometric Multi-Resolution Analysis (GMRA) provides a robust and computationally efficient procedure to construct low-dimensional geometric approximations of  at varying resolutions. We introduce GMRA approximations that adapt to the unknown regularity of , by introducing a thresholding algorithm on the geometric wavelet coefficients. We show that these data-driven, empirical geometric approximations perform well, when the threshold is chosen as a suitable universal function of the number of samples n, on a large class of measures ρ, that are allowed to exhibit different regularity at different scales and locations, thereby efficiently encoding data from more complex measures than those supported on manifolds. These GMRA approximations are associated to a dictionary, together with a fast transform mapping data to d-dimensional coefficients, and an inverse of such a map, all of which are data-driven. The algorithms for both the dictionary construction and the transforms have complexity CDnlogn with the constant C exponential in d. Our work therefore establishes Adaptive GMRA as a fast dictionary learning algorithm, with approximation guarantees, for intrinsically low-dimensional data. We include several numerical experiments on both synthetic and real data, confirming our theoretical results and demonstrating the effectiveness of Adaptive GMRA.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of machine learning research
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)

    We consider the regression problem of estimating functions on $ \mathbb{R}^D $ but supported on a $ d $-dimensional manifold $ \mathcal{M} ~~\subset \mathbb{R}^D $ with $ d \ll D $. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $ \mathcal{M} $ at multiple scales, and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite sample bounds in high probability on rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function was defined on a known Euclidean domain of dimension $ d $, instead of an unknown manifold embedded in $ \mathbb{R}^D $. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $ D $ and exponential in $ d $. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.

    more » « less
  2. We study two-stage stochastic optimization problems with random recourse, where the coefficients of the adaptive decisions involve uncertain parameters. To deal with the infinite-dimensional recourse decisions, we propose a scalable approximation scheme via piecewise linear and piecewise quadratic decision rules. We develop a data-driven distributionally robust framework with two layers of robustness to address distributional uncertainty. We also establish out-of-sample performance guarantees for the proposed scheme. Applying known ideas, the resulting optimization problem can be reformulated as an exact copositive program that admits semidefinite programming approximations. We design an iterative decomposition algorithm, which converges under some regularity conditions, to reduce the runtime needed to solve this program. Through numerical examples for various known operations management applications, we demonstrate that our method produces significantly better solutions than the traditional sample-average approximation scheme especially when the data are limited. For the problem instances for which only the recourse cost coefficients are random, our method exhibits slightly inferior out-of-sample performance but shorter runtimes compared with a competing approach. 
    more » « less
  3. A new class of neuromorphic processors promises to provide fast and power-efficient execution of spiking neural networks with on-chip synaptic plasticity. This efficiency derives in part from the fine-grained parallelism as well as event-driven communication mediated by spatially and temporally sparse spike messages. Another source of efficiency arises from the close spatial proximity between synapses and the sites where their weights are applied and updated. This proximity of compute and memory elements drastically reduces expensive data movements but imposes the constraint that only local operations can be efficiently performed, similar to constraints present in biological neural circuits. Efficient weight update operations should therefore only depend on information available locally at each synapse as non-local operations that involve copying, taking a transpose, or normalizing an entire weight matrix are not efficiently supported by present neuromorphic architectures. Moreover, spikes are typically non-negative events, which imposes additional constraints on how local weight update operations can be performed. The Locally Competitive Algorithm (LCA) is a dynamical sparse solver that uses only local computations between non-spiking leaky integrator neurons, allowing for massively parallel implementations on compatible neuromorphic architectures such as Intel's Loihi research chip. It has been previously demonstrated that non-spiking LCA can be used to learn dictionaries of convolutional kernels in an unsupervised manner from raw, unlabeled input, although only by employing non-local computation and signed non-spiking outputs. Here, we show how unsupervised dictionary learning with spiking LCA (S-LCA) can be implemented using only local computation and unsigned spike events, providing a promising strategy for constructing self-organizing neuromorphic chips. 
    more » « less
  4. Deep generative models have experienced great empirical successes in distribution learning. Many existing experiments have demonstrated that deep generative networks can efficiently generate high-dimensional complex data from a low-dimensional easy-to-sample distribution. However, this phenomenon can not be justified by existing theories. The widely held manifold hypothesis speculates that real-world data sets, such as natural images and signals, exhibit low-dimensional geometric structures. In this paper, we take such low-dimensional data structures into consideration by assuming that data distributions are supported on a low-dimensional manifold. We prove approximation and estimation theories of deep generative networks for estimating distributions on a low-dimensional manifold under the Wasserstein-1 loss. We show that the Wasserstein-1 loss converges to zero at a fast rate depending on the intrinsic dimension instead of the ambient data dimension. Our theory leverages the low-dimensional geometric structures in data sets and justifies the practical power of deep generative models. We require no smoothness assumptions on the data distribution which is desirable in practice. 
    more » « less
  5. In bistable perception, observers experience alternations between two interpretations of an unchanging stimulus. Neurophysiological studies of bistable perception typically partition neural measurements into stimulus-based epochs and assess neuronal differences between epochs based on subjects' perceptual reports. Computational studies replicate statistical properties of percept durations with modeling principles like competitive attractors or Bayesian inference. However, bridging neuro-behavioral findings with modeling theory requires the analysis of single-trial dynamic data. Here, we propose an algorithm for extracting nonstationary timeseries features from single-trial electrocorticography (ECoG) data. We applied the proposed algorithm to 5-min ECoG recordings from human primary auditory cortex obtained during perceptual alternations in an auditory triplet streaming task (six subjects: four male, two female). We report two ensembles of emergent neuronal features in all trial blocks. One ensemble consists of periodic functions that encode a stereotypical response to the stimulus. The other comprises more transient features and encodes dynamics associated with bistable perception at multiple time scales: minutes (within-trial alternations), seconds (duration of individual percepts), and milliseconds (switches between percepts). Within the second ensemble, we identified a slowly drifting rhythm that correlates with the perceptual states and several oscillators with phase shifts near perceptual switches. Projections of single-trial ECoG data onto these features establish low-dimensional attractor-like geometric structures invariant across subjects and stimulus types. These findings provide supporting neural evidence for computational models with oscillatory-driven attractor-based principles. The feature extraction techniques described here generalize across recording modality and are appropriate when hypothesized low-dimensional dynamics characterize an underlying neural system.

    SIGNIFICANCE STATEMENTIrrespective of the sensory modality, neurophysiological studies of multistable perception have typically investigated events time-locked to the perceptual switching rather than the time course of the perceptual states per se. Here, we propose an algorithm that extracts neuronal features of bistable auditory perception from largescale single-trial data while remaining agnostic to the subject's perceptual reports. The algorithm captures the dynamics of perception at multiple timescales, minutes (within-trial alternations), seconds (durations of individual percepts), and milliseconds (timing of switches), and distinguishes attributes of neural encoding of the stimulus from those encoding the perceptual states. Finally, our analysis identifies a set of latent variables that exhibit alternating dynamics along a low-dimensional manifold, similar to trajectories in attractor-based models for perceptual bistability.

    more » « less