Title: Motivating Bilevel Approaches To Filter Learning: A Case Study
The recent trend in regularization methods for inverse problems is to replace handcrafted sparsifying operators with data-driven approaches. Although using such machine learning techniques often improves image reconstruction methods, the results can depend significantly on the learning methodology. This paper compares two supervised learning methods. First, the paper considers a transform learning approach and, to learn the transform, introduces a variant on the Procrustes method for wide matrices with orthogonal rows. Second, we consider a bilevel convolutional filter learning approach. Numerical experiments show the learned transform performs worse for denoising than both the handcrafted finite difference transform and the learned filters, which perform similarly. Our results motivate the use of bilevel learning.
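The abstract does not reproduce the wide-matrix Procrustes variant, but the classic square-case update it generalizes is easy to sketch. Below is a minimal Python illustration (not the paper's algorithm; the function names, threshold, and iteration count are hypothetical choices) that alternates hard-thresholded sparse coding with the closed-form orthogonal Procrustes transform update:

```python
import numpy as np

def procrustes_transform_update(X, Z):
    """Solve min ||Omega @ X - Z||_F over square Omega with orthonormal
    rows. Classic closed form: with X @ Z.T = U S V^T, the minimizer is
    Omega = V @ U.T."""
    U, _, Vt = np.linalg.svd(X @ Z.T)
    return Vt.T @ U.T

def learn_transform(X, n_iters=50, threshold=0.1):
    """Alternating minimization for transform learning on a patch matrix
    X (n x N): hard-thresholding sparse-code step, then Procrustes update.
    Hypothetical illustration, not the paper's method."""
    Omega = np.eye(X.shape[0])                   # identity initialization
    for _ in range(n_iters):
        Z = Omega @ X
        Z[np.abs(Z) < threshold] = 0.0           # hard-thresholding step
        Omega = procrustes_transform_update(X, Z)
    return Omega
```

For a wide Omega with orthonormal rows, ||Omega @ X||_F is no longer constant in Omega (Omega^T @ Omega is a projection rather than the identity), so this closed form does not apply directly; handling that case is what the paper's variant appears to address.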
Award ID(s): 1838179
NSF-PAR ID: 10309915
Journal Name: 2021 IEEE International Conference on Image Processing
Sponsoring Org: National Science Foundation
More Like this
  1. Neural Architecture Search (NAS) is a popular method for automatically designing optimized architectures for high-performance deep learning. In this approach, it is common to use bilevel optimization, where one optimizes the model weights over the training data (lower-level problem) and various hyperparameters, such as the configuration of the architecture, over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a priori it seems impossible to distinguish the right hyperparameters based on training loss alone, which motivates a better understanding of the role of the train-validation split. To this end, this work establishes the following results:
     • We show that refined properties of the validation loss, such as risk and hyper-gradients, are indicative of those of the true test loss. This reveals that the upper-level problem helps select the most generalizable model and prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces, which are highly relevant for popular differentiable search schemes.
     • We establish generalization bounds for NAS problems, with an emphasis on an activation search problem. When optimized with gradient descent, we show that the train-validation procedure returns the best (model, architecture) pair even if all architectures can perfectly fit the training data to achieve zero error.
     • Finally, we highlight rigorous connections between NAS, multiple kernel learning, and low-rank matrix learning. The latter leads to novel algorithmic insights, where the solution of the upper problem can be accurately learned via efficient spectral methods to achieve near-minimal risk.
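The train-validation bilevel structure described in the item above is easiest to see when the lower-level problem has a closed form. The following toy sketch (hypothetical, and far simpler than the NAS setting: a single ridge hyperparameter lambda stands in for the architecture) solves the lower-level problem on training data and descends the validation loss using the implicit-function-theorem hypergradient dw*/dlam = -(Xtr^T Xtr + lam I)^{-1} w*:

```python
import numpy as np

def lower_solve(Xtr, ytr, lam):
    """Closed-form lower-level solution: ridge regression weights."""
    A = Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1])
    return np.linalg.solve(A, Xtr.T @ ytr), A

def hypergradient(Xval, yval, w, A):
    """Hypergradient of the validation loss w.r.t. lambda via the
    implicit function theorem: dL/dlam = g_w^T dw/dlam, dw/dlam = -A^{-1} w."""
    g_w = Xval.T @ (Xval @ w - yval)   # gradient of val loss w.r.t. w
    dw_dlam = -np.linalg.solve(A, w)
    return g_w @ dw_dlam

# Toy train-validation split (synthetic data for illustration only).
rng = np.random.default_rng(0)
Xtr, Xval = rng.normal(size=(40, 5)), rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
ytr = Xtr @ w_true + 0.5 * rng.normal(size=40)
yval = Xval @ w_true + 0.5 * rng.normal(size=20)

lam, step = 1.0, 0.05
for _ in range(100):                    # upper-level gradient descent
    w, A = lower_solve(Xtr, ytr, lam)
    lam = max(lam - step * hypergradient(Xval, yval, w, A), 1e-6)
```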
  2. We introduce a new class of iterative image reconstruction algorithms for radio interferometry, at the interface of convex optimization and deep learning, inspired by plug-and-play methods. The approach consists in learning a prior image model by training a deep neural network (DNN) as a denoiser, and substituting it for the handcrafted proximal regularization operator of an optimization algorithm. The proposed AIRI (‘AI for Regularization in radio-interferometric Imaging’) framework, for imaging complex intensity structure with diffuse and faint emission from visibility data, inherits the robustness and interpretability of optimization, and the learning power and speed of networks. Our approach relies on three steps. First, we design a low-dynamic-range training database from optical intensity images. Second, we train a DNN denoiser at a noise level inferred from the signal-to-noise ratio of the data, using training losses enhanced with a non-expansiveness term ensuring algorithm convergence and including on-the-fly database dynamic-range enhancement via exponentiation. Third, we plug the learned denoiser into the forward–backward optimization algorithm, resulting in a simple iterative structure alternating a denoising step with a gradient-descent data-fidelity step. We have validated AIRI against clean, optimization algorithms of the SARA family, and a DNN trained to reconstruct the image directly from visibility data. Simulation results show that AIRI is competitive in imaging quality with SARA and its unconstrained forward–backward-based version uSARA, while providing significant acceleration. clean remains faster but offers lower quality. The end-to-end DNN offers further acceleration, but with far lower quality than AIRI.
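The plug-and-play forward–backward iteration at the heart of the item above alternates a data-fidelity gradient step with a denoiser applied where a proximal operator would normally go. A rough sketch follows (not the AIRI implementation; the toy measurement operator and the soft-thresholding stand-in for the DNN denoiser are hypothetical):

```python
import numpy as np

def pnp_forward_backward(y, A, denoiser, n_iters=200):
    """Plug-and-play forward-backward: gradient step on the data-fidelity
    term 0.5*||A x - y||^2, then a denoiser in place of the proximal
    operator. AIRI would plug a trained DNN in as `denoiser`."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # step below 1/Lipschitz
    x = A.T @ y                               # backprojection initialization
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)              # data-fidelity gradient
        x = denoiser(x - step * grad)         # denoiser replaces the prox
    return x

# Soft-thresholding stands in for the DNN denoiser in this toy run.
soft = lambda z, t=0.05: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 100)) / np.sqrt(30)  # toy measurement operator
x_true = np.zeros(100)
x_true[rng.choice(100, size=5, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=30)
x_hat = pnp_forward_backward(y, A, soft)
```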
  3. Learning to optimize (L2O) has gained increasing popularity, automating the design of optimizers through data-driven approaches. However, current L2O methods often suffer from poor generalization performance in at least two respects: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or “generalizable learning of optimizers”); and (ii) the test performance of an optimizee (itself a machine learning model), trained by the optimizer, in terms of accuracy over unseen data (optimizee generalization, or “learning to generalize”). While optimizer generalization has been studied recently, optimizee generalization (learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the flatness of loss landscapes. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during the L2O meta-training process and then transferred to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals, with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees.
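One of the flatness metrics mentioned in the item above, the Hessian, can be probed matrix-free: Rademacher probes combined with a central finite difference of the gradient give an unbiased estimate of tr(Hessian), which a flatness-aware regularizer could penalize. This is an illustrative stand-in, not the paper's local-entropy formulation; `grad_fn`, `eps`, and the probe count are hypothetical choices:

```python
import numpy as np

def hutchinson_hessian_trace(grad_fn, w, eps=1e-3, n_probes=10, rng=None):
    """Estimate tr(Hessian) of a loss at w, a simple flatness metric.
    Each Rademacher probe v gives v^T H v via a central finite
    difference of the gradient; averaging estimates the trace."""
    rng = rng or np.random.default_rng(0)
    est = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=w.shape)            # Rademacher probe
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
        est += v @ hv                                        # v^T H v
    return est / n_probes

# Toy quadratic loss L(w) = 0.5 * w^T Q w, so the Hessian is Q exactly.
Q = np.diag([1.0, 4.0, 9.0])
grad = lambda w: Q @ w
w = np.ones(3)
print(hutchinson_hessian_trace(grad, w))  # ~14.0 (= 1 + 4 + 9)
```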
  4. Many university engineering programs require their students to complete a senior capstone experience to equip them with the knowledge and skills they need to succeed after graduation. Such capstone experiences typically integrate knowledge and skills learned cumulatively in the degree program, often engaging students in projects outside of the classroom. As part of an initiative to completely transform the civil engineering undergraduate program at Clemson University, a capstone-like course sequence is being incorporated into the curriculum during the sophomore year. Funded by a grant from the National Science Foundation’s Revolutionizing Engineering Departments (RED) program, this departmental transformation (referred to as the Arch initiative) aims to develop a culture of adaptation and curricular support for inclusive excellence and innovation to address the complex challenges faced by our society. Just as springers serve as the foundation stones of an arch, the new courses are called “Springers” because they serve as the foundations of the transformed curriculum.

    The goal of the Springer course sequence is to expose students to the “big picture” of civil engineering while developing student skills in professionalism, communication, and teamwork through real-world projects and hands-on activities. The expectation is that the Springer course sequence will allow faculty to better engage students at the beginning of their studies and help them understand how future courses contribute to the overall learning outcomes of a degree in civil engineering. The Springer course sequence is team-taught by faculty from both civil engineering and communication, and exposes students to all of the civil engineering subdisciplines. Through a project-based learning approach, Springer courses mimic capstone in that students work on a practical application of civil engineering concepts throughout the semester in a way that challenges students to incorporate tools that they will build on and use during their junior and senior years.

    In the 2019 spring semester, a pilot of the first of the Springer courses (Springer 1; n=11) introduced students to three civil engineering subdisciplines: construction management, hydrology, and transportation. The remaining subdisciplines will be covered in a follow-on Springer 2 pilot. The project for Springer 1 involved designing a small parking lot for a church located adjacent to campus. Following initial instruction in civil engineering topics related to the project, students worked in teams to develop conceptual project designs. A design charrette allowed students to interact with different stakeholders to assess their conceptual designs and incorporate stakeholder input into their final designs.

    The purpose of this paper is to describe all aspects of the Springer 1 course, including course content, teaching methods, faculty resources, and the design and results of a Student Assessment of Learning Gains (SALG) survey to assess students’ learning outcomes. An overview of the Springer 2 course is also provided. The feedback from the SALG indicated positive attitudes towards course activities and content, and that students found interaction with project stakeholders during the design charrette especially beneficial. Challenges for full-scale implementation of the Springer course sequence as a requirement in the transformed curriculum are also discussed.