

Title: Motivating Bilevel Approaches To Filter Learning: A Case Study
The recent trend in regularization methods for inverse problems is to replace handcrafted sparsifying operators with data-driven approaches. Although such machine learning techniques often improve image reconstruction methods, the results can depend significantly on the learning methodology. This paper compares two supervised learning methods. First, it considers a transform learning approach and, to learn the transform, introduces a variant of the Procrustes method for wide matrices with orthogonal rows. Second, it considers a bilevel convolutional filter learning approach. Numerical experiments show that the learned transform performs worse for denoising than both the handcrafted finite difference transform and the learned filters, which perform similarly. Our results motivate the use of bilevel learning.
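As a rough illustration of the transform-learning alternation the abstract refers to, the sketch below alternates hard-thresholding sparse coding with an SVD-based orthogonal Procrustes update of a row-orthonormal transform. This is a minimal NumPy sketch under stated assumptions: the polar-factor update shown is the classical closed form used as a stand-in, not the paper's wide-matrix variant, and all names and parameters are illustrative.

```python
import numpy as np

def learn_transform(Y, k, threshold, iters=50):
    """Alternating transform learning (illustrative sketch).

    Y : (n, N) matrix of training patches, one patch per column.
    k : number of transform rows (k <= n gives a wide transform).
    threshold : hard-thresholding level for the sparse codes.
    """
    rng = np.random.default_rng(0)
    # Initialize with a random row-orthonormal transform: Om @ Om.T = I.
    Om = np.linalg.qr(rng.standard_normal((Y.shape[0], k)))[0].T  # (k, n)

    for _ in range(iters):
        # Sparse-coding step: hard-threshold the transform coefficients.
        Z = Om @ Y
        Z[np.abs(Z) < threshold] = 0.0

        # Transform update: minimize ||Om @ Y - Z||_F over row-orthonormal Om.
        # The SVD polar factor below is the classical Procrustes solution,
        # used here as a stand-in for the paper's wide-matrix variant.
        U, _, Vt = np.linalg.svd(Z @ Y.T, full_matrices=False)
        Om = U @ Vt  # (k, n) with orthonormal rows
    return Om
```

Note that for a wide transform the data-fit term is not invariant to the update, which is exactly the complication the paper's Procrustes variant addresses; the polar factor above only approximates that case.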
Award ID(s):
1838179
PAR ID:
10309915
Author(s) / Creator(s):
Date Published:
Journal Name:
2021 IEEE International Conference on Image Processing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Most existing vision-based displacement measurement methods require manual speckles or targets to improve measurement performance in non-stationary imagery environments. To minimize the use of manual speckles and targets, feature points regarded as virtual markers can be used for non-target measurement. In this study, an advanced feature matching strategy is presented that replaces handcrafted descriptors with learned descriptors from the Visual Geometry Group (VGG) of the University of Oxford to achieve better performance. The feasibility and performance of the proposed method are verified by comparative studies, first with a laboratory experiment on a two-span bridge model and then with a field application on a railway bridge. The proposed approach, which integrates the Scale Invariant Feature Transform (SIFT) with the VGG descriptor, improved measurement accuracy by about 24% compared with the commonly used feature matching-based displacement measurement method that uses the SIFT feature and descriptor.

     
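As a concrete sketch of the matching pipeline described above: OpenCV's contrib module ships an implementation of the learned VGG descriptor, so SIFT keypoint detection can be paired with VGG description in a few lines. This is a hedged illustration, not the study's exact pipeline; the median-shift displacement estimate and all parameter choices are assumptions.

```python
import cv2
import numpy as np

# Requires opencv-contrib-python for the cv2.xfeatures2d module.
sift = cv2.SIFT_create()
vgg = cv2.xfeatures2d.VGG_create()  # learned VGG descriptor

def displacement(ref_img, cur_img):
    """Estimate image-plane displacement from matched feature points."""
    kp_ref = sift.detect(ref_img, None)
    kp_cur = sift.detect(cur_img, None)
    # Describe the SIFT keypoints with the learned VGG descriptor.
    kp_ref, des_ref = vgg.compute(ref_img, kp_ref)
    kp_cur, des_cur = vgg.compute(cur_img, kp_cur)

    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(des_ref, des_cur)

    # Median shift of matched points as a robust displacement estimate.
    shifts = np.array([
        np.array(kp_cur[m.trainIdx].pt) - np.array(kp_ref[m.queryIdx].pt)
        for m in matches
    ])
    return np.median(shifts, axis=0)
```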
  2. Solving a bilevel optimization problem is at the core of several machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, and training-data poisoning. Different from simultaneous or multi-objective optimization, the steepest descent direction for minimizing the upper-level cost in a bilevel problem requires the inverse of the Hessian of the lower-level cost. In this work, we propose a novel algorithm for solving bilevel optimization problems based on the classical penalty function approach. Our method avoids computing the Hessian inverse and can handle constrained bilevel problems easily. We prove the convergence of the method under mild conditions and show that the exact hypergradient is obtained asymptotically. Our method's simplicity and small space and time complexities enable us to effectively solve large-scale bilevel problems involving deep neural networks. We present results on data denoising, few-shot learning, and training-data poisoning problems in a large-scale setting. Our results show that our approach outperforms or is comparable to previously proposed methods based on automatic differentiation and approximate inversion in terms of accuracy, run-time, and convergence speed. 
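To make the penalty idea concrete, here is a minimal NumPy sketch on a hypothetical ridge-hyperparameter problem: the lower-level optimality condition is replaced by a penalty on the norm of the lower-level gradient, the penalized objective is minimized by plain gradient descent in both variables, and the penalty weight grows over time. The toy problem, step sizes, and schedule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data, scaled for well-conditioned gradients.
A = rng.standard_normal((50, 10)) / np.sqrt(50)     # training design matrix
b = rng.standard_normal(50)
Av = rng.standard_normal((30, 10)) / np.sqrt(30)    # validation design matrix
bv = rng.standard_normal(30)

# Lower level: w*(lam) = argmin_w ||A w - b||^2 + lam ||w||^2
# Upper level: minimize validation loss ||Av w*(lam) - bv||^2 over lam >= 0.
w, lam = np.zeros(10), 1.0
sigma, lr = 1.0, 1e-4

for step in range(5000):
    # Lower-level gradient; the penalty drives its norm to zero,
    # enforcing lower-level optimality without any Hessian inverse.
    h = 2 * A.T @ (A @ w - b) + 2 * lam * w
    H = 2 * A.T @ A + 2 * lam * np.eye(10)          # lower-level Hessian in w
    grad_w = 2 * Av.T @ (Av @ w - bv) + sigma * H @ h
    grad_lam = sigma * 2.0 * h @ w                  # d/dlam of (sigma/2)||h||^2
    w -= lr * grad_w
    lam = max(lam - lr * grad_lam, 0.0)             # keep ridge weight nonnegative
    if step % 1000 == 0:
        sigma *= 2.0                                # tighten the penalty over time
```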
  3. Active learning is a promising paradigm for reducing labeling cost by strategically requesting labels to improve model performance. However, existing active learning methods often rely on acquisition functions that are expensive to compute, extensive model retraining, and multiple rounds of interaction with annotators. To address these limitations, we propose a novel approach for active learning that selects batches of unlabeled instances through a learned surrogate model for data acquisition. A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time. Our novel algorithmic contribution is a multi-task bilevel optimization framework that predicts the relative utility, measured by validation accuracy, of different training sets, and ensures the learned acquisition function generalizes effectively. For cases where validation accuracy is expensive to evaluate, we introduce efficient interpolation-based surrogate models to estimate the utility function, reducing the evaluation cost. We demonstrate the performance of our approach through extensive experiments on standard active classification benchmarks.
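As a sketch of the interpolation-based surrogate idea mentioned above: candidate training sets can be mapped to fixed-length feature summaries, and an interpolator fit on the (summary, validation accuracy) pairs already measured can then score fresh candidates without retraining. The sketch below uses SciPy's RBFInterpolator; the mean-feature summary and all sizes are hypothetical choices, not the paper's design.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def featurize(batch_indices, features):
    """Summarize a candidate training set as a fixed-length vector
    (here: the mean feature vector; purely illustrative)."""
    return features[batch_indices].mean(axis=0)

rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 16))          # per-example features
# Candidate sets whose validation accuracy has already been measured.
evaluated = [rng.choice(1000, 32, replace=False) for _ in range(20)]
val_acc = rng.uniform(0.6, 0.9, size=20)            # stand-in measured accuracies

X = np.stack([featurize(idx, features) for idx in evaluated])
surrogate = RBFInterpolator(X, val_acc, smoothing=1e-3)

# Score fresh candidate batches without retraining the model.
candidates = [rng.choice(1000, 32, replace=False) for _ in range(200)]
scores = surrogate(np.stack([featurize(c, features) for c in candidates]))
best = candidates[int(np.argmax(scores))]
```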
  4. A coreset is a small set that summarizes a large dataset, such that training solely on the small set achieves performance competitive with training on the full dataset. In rehearsal-based continual learning, the coreset is typically used in the memory replay buffer to stand for representative samples from previous tasks, and the coreset selection procedure is typically formulated as a bilevel problem. However, the typical bilevel formulation for coreset selection explicitly performs optimization over discrete decision variables with greedy search, which is computationally expensive. Several works consider other formulations to address this issue, but they ignore the nested nature of bilevel optimization problems and may not solve the bilevel coreset selection problem accurately. To address these issues, we propose a new bilevel formulation, where the inner problem tries to find a model that minimizes the expected training error sampled from a given probability distribution, and the outer problem aims to learn a probability distribution with approximately $K$ (coreset size) nonzero entries such that the learned model in the inner problem minimizes the training error over the whole data. To ensure the learned probability has approximately $K$ nonzero entries, we introduce a novel regularizer based on the smoothed top-$K$ loss in the upper problem. We design a new optimization algorithm that provably converges to an $\epsilon$-stationary point with $O(1/\epsilon^4)$ computational complexity. We conduct extensive experiments in various continual learning settings, including balanced data, imbalanced data, and label noise, to show that our proposed formulation and new algorithm significantly outperform competitive baselines. From a bilevel optimization point of view, our algorithm also significantly improves on the vanilla greedy coreset selection method in terms of running time on continual learning benchmark datasets. The code is available at \url{https://github.com/MingruiLiu-ML-Lab/Bilevel-Coreset-Selection-via-Regularization}.
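To illustrate the shape of the outer problem described above (without reproducing the paper's exact smoothed top-$K$ loss), the fragment below parameterizes the sampling distribution by logits and penalizes probability mass falling outside the $K$ largest entries. The hard top-K penalty and the constant stand-in losses are assumptions for illustration only.

```python
import torch

def topk_mass_penalty(p, K):
    # Probability mass outside the K largest entries of p; zero iff p is
    # supported on K entries. A simple stand-in for the smoothed top-K loss.
    return 1.0 - torch.topk(p, K).values.sum()

n, K, rho = 100, 10, 10.0
logits = torch.zeros(n, requires_grad=True)   # parameterize the distribution
per_example_loss = torch.rand(n)              # stand-in for inner-model losses

p = torch.softmax(logits, dim=0)
# Inner problem (schematic): the model would minimize the p-weighted loss.
inner_loss = (p * per_example_loss).sum()
# Outer problem: whole-data loss of the inner model plus the penalty on p.
outer_loss = per_example_loss.mean() + rho * topk_mass_penalty(p, K)
outer_loss.backward()                         # gradient flows into the logits
```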
  5. In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS whenever the number of filterbank channels is severely reduced. Reducing the number of channels may cause some drop in KWS performance, but it also brings a substantial reduction in energy consumption, which is key when deploying common always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics to provide greater robustness to noise, especially when dropout is integrated. Thus, switching from the typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while simultaneously achieving a 6.3× reduction in energy consumption.
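As a minimal PyTorch sketch of what "learned features" can mean here: a trainable nonnegative filterbank applied to an FFT power spectrogram, with log compression and dropout on the channel energies. The layer shape, initialization, and 8-channel default are assumptions for illustration, not the study's architecture.

```python
import torch
import torch.nn as nn

class LearnableFilterbank(nn.Module):
    """Trainable replacement for a fixed log-Mel filterbank (sketch)."""

    def __init__(self, n_bins=257, n_channels=8, p_drop=0.1):
        super().__init__()
        # One learnable filter per output channel over the FFT bins.
        self.weights = nn.Parameter(0.01 * torch.rand(n_bins, n_channels))
        self.dropout = nn.Dropout(p_drop)  # dropout aids robustness to noise

    def forward(self, power_spec):
        # power_spec: (batch, frames, n_bins) FFT power spectrogram.
        fb = self.weights.clamp(min=0.0)   # keep filters nonnegative, like Mel
        energies = power_spec @ fb         # (batch, frames, n_channels)
        return self.dropout(torch.log(energies + 1e-6))

feats = LearnableFilterbank()(torch.rand(4, 100, 257))  # -> (4, 100, 8)
```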