Title: An adaptive approach to machine learning for compact particle accelerators
Abstract

Machine learning (ML) tools are able to learn relationships between the inputs and outputs of large, complex systems directly from data. However, for time-varying systems, the predictive capabilities of ML tools degrade once the systems are no longer accurately represented by the data with which the ML models were trained. For complex systems, re-training is only possible if the changes are slow relative to the rate at which large numbers of new input-output training data can be non-invasively recorded. In this work, we present an approach to deep learning for time-varying systems that does not require re-training, but instead uses adaptive feedback in the architecture of deep convolutional neural networks (CNNs). The feedback is based only on available system output measurements and is applied in the encoded low-dimensional dense layers of the encoder-decoder CNNs. First, we develop an inverse model of a complex accelerator system that maps output beam measurements to input beam distributions, while both the accelerator components and the unknown input beam distribution vary rapidly with time. We then demonstrate our method on experimental measurements of the input and output beam distributions of the HiRES ultrafast electron diffraction (UED) beam line at Lawrence Berkeley National Laboratory, and showcase its ability to automatically track the time-varying photocathode quantum efficiency map. Our method can be used to aid both physics-based and ML-based online surrogate models in providing non-invasive beam diagnostics.
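The central mechanism is an adaptive correction applied to the low-dimensional latent (dense) layer of an encoder-decoder CNN, driven only by available measurements and without re-training the network weights. Below is a minimal PyTorch sketch of that idea, assuming 64x64 single-channel beam images, an 8-dimensional latent layer, and a simple gradient-based latent correction standing in for the paper's actual adaptive feedback law; all names and sizes are illustrative, not the paper's model.

```python
# Minimal sketch: encoder-decoder CNN with an adaptive offset applied to the
# low-dimensional latent layer (illustrative sizes; not the paper's exact model).
import torch
import torch.nn as nn

class EncoderDecoderCNN(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),        # dense bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x, latent_shift=None):
        z = self.encoder(x)
        if latent_shift is not None:        # adaptive correction in latent space
            z = z + latent_shift
        return self.decoder(z)

def adapt_latent(model, x_measured, reference, latent_dim=8, steps=50, lr=1e-2):
    """Tune only a latent offset so the frozen model's prediction matches an
    available reference measurement; the network weights are never retrained."""
    shift = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([shift], lr=lr)
    for _ in range(steps):
        pred = model(x_measured, latent_shift=shift)
        loss = torch.mean((pred - reference) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return shift.detach()
```

In the paper the correction is driven by feedback on output measurements alone; the gradient loop above is only a stand-in that shows where the adaptation enters the architecture.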

 
NSF-PAR ID: 10305427
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Nature Publishing Group
Date Published:
Journal Name: Scientific Reports
Volume: 11
Issue: 1
ISSN: 2045-2322
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. In recent years, as the pace of innovation in machine learning (ML) has accelerated, researchers in SysML have created algorithms and systems that parallelize ML training over multiple devices or computational nodes. As ML models become more structurally complex, many systems have struggled to provide all-round performance on a variety of models. In particular, the difficulty of scaling ML up is usually underestimated in terms of the knowledge and time required to map an appropriate distribution strategy to a model. Applying parallel training systems to complex models adds nontrivial development overhead on top of model prototyping, and often results in lower-than-expected performance. This tutorial identifies research and practical pain points in parallel ML training, and discusses the latest development of algorithms and systems for addressing these challenges in both usability and performance. In particular, it presents a new perspective that unifies seemingly different distributed ML training strategies and, based on it, introduces new techniques and system architectures to simplify and automate ML parallelization. The tutorial is built upon the authors' years of research and industry experience, a comprehensive literature survey, and several recent tutorials and papers published by the authors and peer researchers. It consists of four parts. The first part presents a landscape of distributed ML training techniques and systems, and highlights the major difficulties real users face when writing distributed ML code with big models or big data. The second part dives deep into the mainstream training strategies, guided by real use cases. By developing a new, unified formulation to represent the seemingly different data- and model-parallel strategies, we describe a set of techniques and algorithms to achieve ML auto-parallelization, and compiler system architectures for auto-generating and exercising parallelization strategies based on models and clusters. The third part exposes a hidden layer of practical pain points in distributed ML training, hyper-parameter tuning and resource allocation, and introduces techniques to improve these aspects. The fourth part is a hands-on coding session in which we walk the audience through writing distributed training programs in Python, using the various distributed ML tools and interfaces provided by the Ray ecosystem; a toy example in that spirit is sketched below.
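Since the hands-on portion above centers on Python and the Ray ecosystem, here is a minimal, self-contained sketch of data-parallel work with Ray remote tasks; the shard-and-average scheme and the tiny least-squares "model" are assumptions for illustration, not the tutorial's actual material.

```python
# Minimal sketch of data parallelism with Ray remote tasks (hypothetical
# train_shard function and toy data; not the tutorial's actual code).
import ray
import numpy as np

ray.init()

@ray.remote
def train_shard(x_shard, y_shard):
    """Fit a tiny linear model on one data shard (least squares)."""
    w, *_ = np.linalg.lstsq(x_shard, y_shard, rcond=None)
    return w

# Split a toy dataset across workers and average the resulting weights.
x = np.random.randn(1000, 8)
y = x @ np.ones(8) + 0.1 * np.random.randn(1000)
shards = zip(np.array_split(x, 4), np.array_split(y, 4))
weights = ray.get([train_shard.remote(xs, ys) for xs, ys in shards])
w_avg = np.mean(weights, axis=0)
print("averaged weights:", w_avg)
```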
  2. Training deep neural networks can generate non-descriptive error messages or produce unusual output without any explicit errors at all. While experts rely on tacit knowledge to apply debugging strategies, non-experts lack the experience required to interpret model output and correct Deep Learning (DL) programs. In this work, we identify DL debugging heuristics and strategies used by experts, categorize the types of errors novices run into when writing ML code, and map them onto opportunities where tools could help. We use these findings to guide the design of Umlaut. Umlaut checks DL program structure and model behavior against these heuristics, provides human-readable error messages to users, and annotates erroneous model output to facilitate error correction. Umlaut links code, model output, and tutorial-driven error messages in a single interface. We evaluated Umlaut in a study with 15 participants to determine its effectiveness in helping developers find and fix errors in their DL programs. Participants using Umlaut found and fixed significantly more bugs compared to a baseline condition.
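Umlaut's actual rule set lives in the tool, but a single heuristic check of the kind described, flagging classifier outputs that were never normalized into probabilities, can be sketched as follows; the check and its message are hypothetical, not Umlaut's code.

```python
# Hypothetical heuristic in the spirit of Umlaut's checks: warn if a
# classifier's outputs look like raw logits rather than probabilities.
import numpy as np

def check_output_normalized(outputs, tol=1e-3):
    """Return a human-readable warning if rows don't sum to ~1 in [0, 1]."""
    outputs = np.asarray(outputs)
    row_sums = outputs.sum(axis=1)
    if (outputs < 0).any() or np.abs(row_sums - 1.0).max() > tol:
        return ("Model outputs do not look like probabilities. "
                "Did you forget a softmax, or use a loss that "
                "expects raw logits instead?")
    return None

print(check_output_normalized([[2.3, -0.7, 0.1]]))  # triggers the warning
```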
  3. Modern machine learning (ML) and deep learning (DL) techniques using high-dimensional data representations have helped accelerate the materials discovery process by efficiently detecting hidden patterns in existing datasets and linking input representations to output properties for a better understanding of the scientific phenomena. While deep neural networks composed of fully connected layers have been widely used for materials property prediction, simply making a model deeper by stacking many layers often runs into the vanishing gradient problem, degrading performance and limiting usability. In this paper, we study and propose architectural principles that improve model training and inference performance under fixed parameter budgets. We present a general deep-learning framework based on branched residual learning (BRNet) with fully connected layers that can work with any numerical vector-based representation as input to build accurate models for predicting materials properties. We train models on numerical vectors representing different composition-based attributes of the respective materials and compare the performance of the proposed models against traditional ML and existing DL architectures. We find that the proposed models are significantly more accurate than the ML/DL baselines for all data sizes and for the different composition-based input attributes. Furthermore, branched learning requires fewer parameters and results in faster model training due to better convergence during the training phase than existing neural networks, thereby efficiently building accurate models for predicting materials properties. A minimal sketch of the branched residual idea appears below.
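The paper's exact BRNet architecture is not reproduced here; the following is an illustrative PyTorch sketch of branched residual learning with fully connected layers, where the layer widths, branch count, and the 86-dimensional composition vector are assumptions.

```python
# Illustrative sketch of branched residual learning with dense layers
# (layer widths and branch count are assumptions, not the paper's BRNet).
import torch
import torch.nn as nn

class BranchedResidualNet(nn.Module):
    def __init__(self, in_dim, hidden=128, n_blocks=3, n_branches=2):
        super().__init__()
        self.stem = nn.Linear(in_dim, hidden)
        # Each block holds parallel "branches" whose outputs are summed
        # with the block input (residual connection).
        self.blocks = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                              nn.Linear(hidden, hidden))
                for _ in range(n_branches)
            ])
            for _ in range(n_blocks)
        ])
        self.head = nn.Linear(hidden, 1)   # scalar materials property

    def forward(self, x):
        h = torch.relu(self.stem(x))
        for branches in self.blocks:
            h = torch.relu(h + sum(branch(h) for branch in branches))
        return self.head(h)

model = BranchedResidualNet(in_dim=86)     # e.g. 86 composition attributes
y = model(torch.randn(4, 86))              # -> predictions of shape (4, 1)
```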
  4. Partial differential equation (PDE) learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of three-dimensional uniformly elliptic PDEs from input-output data and achieves an exponential convergence rate of the error with respect to the size of the training dataset, with an exceptionally high probability of success. A toy illustration of recovering a discretized solution operator from input-output pairs is sketched below.
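As a rough, low-dimensional stand-in for the setting above (not the paper's randomized algorithm), the sketch below recovers the discretized solution operator of a 1D elliptic toy problem from random input-output pairs by least squares; the grid size and number of pairs are arbitrary choices.

```python
# Toy sketch: recover a discretized linear solution operator from random
# input-output pairs via least squares (a stand-in for the paper's
# randomized-numerical-linear-algebra algorithm, which is far more refined).
import numpy as np

n, n_pairs = 50, 80                       # grid size, number of training pairs
rng = np.random.default_rng(0)

# "True" solution operator: inverse of a 1D discrete Laplacian (elliptic toy).
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A_true = np.linalg.inv(L)

F = rng.standard_normal((n, n_pairs))     # random forcing inputs
U = A_true @ F                            # corresponding solutions (outputs)

# Least-squares estimate of the operator from the input-output data.
A_est = U @ np.linalg.pinv(F)
print("relative error:", np.linalg.norm(A_est - A_true) / np.linalg.norm(A_true))
```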

     
  5. The growing popularity of Machine Learning (ML) has led to its deployment in various sensitive domains, which has resulted in significant research focused on ML security and privacy. However, in some applications, such as Augmented/Virtual Reality, integrity verification of the outsourced ML tasks is more critical, a fact that has not received much attention. Existing solutions, such as multi-party computation and proof-based systems, impose significant computation overhead, which makes them unfit for real-time applications. We propose Fides, a novel framework for real-time integrity validation of ML-as-a-Service (MLaaS) inference. Fides features a novel and efficient distillation technique, Greedy Distillation Transfer Learning, that dynamically distills and fine-tunes a space- and compute-efficient verification model for verifying the corresponding service model while running inside a trusted execution environment. Fides also features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with high likelihood, whether the service model is under attack, and offers a re-classification functionality that predicts the original class whenever an attack is identified. We devised a generative adversarial network framework for training the attack detection and re-classification models. The evaluation shows that Fides achieves an accuracy of up to 98% for attack detection and 94% for re-classification. A generic sketch of the distill-and-compare idea follows.
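The distill-a-verifier-and-compare pattern can be sketched generically as follows; the plain KL-distillation loop and the fixed divergence threshold are illustrative assumptions, not Fides' Greedy Distillation Transfer Learning, its GAN-trained detector, or its trusted-execution setup.

```python
# Generic sketch: distill a small verification model from a service model and
# flag inputs where their output distributions diverge (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

service = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
verifier = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
opt = torch.optim.Adam(verifier.parameters(), lr=1e-3)

def distill_step(x, temperature=2.0):
    """Match the verifier's softened outputs to the (frozen) service model."""
    with torch.no_grad():
        teacher = F.softmax(service(x) / temperature, dim=1)
    student = F.log_softmax(verifier(x) / temperature, dim=1)
    loss = F.kl_div(student, teacher, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def looks_attacked(x, threshold=0.5):
    """Flag a response whose service/verifier divergence is unusually high."""
    with torch.no_grad():
        p = F.softmax(service(x), dim=1)
        q = F.log_softmax(verifier(x), dim=1)
        return F.kl_div(q, p, reduction="batchmean").item() > threshold

for _ in range(100):
    distill_step(torch.randn(64, 32))
print("attack suspected:", looks_attacked(torch.randn(8, 32)))
```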