Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters has remained a long-standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of an MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing their echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both synthetic and real data sets in a variety of settings, and show superior performance over standard baselines.
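As a rough illustration of the kind of object the abstract refers to, the sketch below forms an empirical third-order cross-moment tensor between inputs and output, E[y · x ⊗ x ⊗ x]. This is a simplified stand-in: the paper's actual tensor construction involves correction terms tailored to the non-linearity, and the function name here is hypothetical.

```python
import numpy as np

def cross_moment_tensor(X, y):
    """Empirical third-order cross-moment E[y * x (x) x (x) x].

    A minimal stand-in for the carefully designed moment tensor in the
    abstract; the paper's construction subtracts lower-order correction
    terms that depend on the expert non-linearity.
    """
    n, d = X.shape
    T = np.zeros((d, d, d))
    for i in range(n):
        x = X[i]
        # Outer product x (x) x (x) x, weighted by the scalar output y_i.
        T += y[i] * np.einsum('a,b,c->abc', x, x, x)
    return T / n
```

Decomposing such a tensor (e.g., via tensor power iteration) is what allows the expert parameters to be read off without first knowing the gating parameters.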
Adaptiveness and consistency of a class of online ensemble learning algorithms
Summary Expert-based ensemble learning algorithms often serve as online learning algorithms for an unknown, possibly time-varying, probability distribution. Their simplicity allows flexibility in design choices, leading to variations that balance adaptiveness and consistency. This article provides an analytical framework to quantify the adaptiveness and consistency of expert-based ensemble learning algorithms. With properly selected states, the algorithms are modeled as Markov chains. Quantitative metrics of adaptiveness and consistency can then be calculated through mathematical formulas, rather than by relying on numerical simulations. Results are derived for several popular ensemble learning algorithms. The success of the method is also demonstrated in both simulation and experimental results.
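A minimal sketch of the modeling step the summary describes: once an algorithm's behavior is encoded as a Markov chain over suitably chosen states, long-run quantities (such as a consistency metric like the steady-state probability of tracking the correct expert) follow from the stationary distribution rather than from simulation. The transition matrix and state labeling below are hypothetical, not taken from the article.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of a row-stochastic matrix P.

    Solves pi P = pi with sum(pi) = 1 via the eigenvector of P^T
    associated with eigenvalue 1. Closed-form metrics of adaptiveness
    and consistency can then be read off from pi, given a labeling of
    the chain's states.
    """
    w, v = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(w - 1.0))   # eigenvalue closest to 1
    pi = np.real(v[:, idx])
    return pi / pi.sum()
```

For example, with states {tracking correct expert, tracking wrong expert}, the mass the stationary distribution places on the first state is one natural consistency-style metric.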
- PAR ID: 10452959
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Journal Name: International Journal of Robust and Nonlinear Control
- Volume: 31
- Issue: 6
- ISSN: 1049-8923
- Pages: 2018-2043
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract This special edition is based on the revelation that "the lessons learned and unlearned during COVID-19 grant us an unparalleled opportunity to reflect." Here, we reflect on lessons learned related to teacher adaptiveness. We examined how the COVID-19 pandemic demonstrated the adaptiveness necessary for teachers to implement knowledge generation approaches aligned with the Next Generation Science Standards. First, we outline a three-year professional development program focused on knowledge generation approaches. We present findings from teachers' experiences teaching science from 2019 to 2021, collected through a sequential explanatory mixed-methods analysis involving written responses to vignettes (n = 474) and classroom observations (n = 58). Then, using an individual teacher case study, we explore how the shift to virtual teaching was supported by adaptiveness. Results suggest a significant relationship between teacher adaptiveness and the use of knowledge generation approaches. We conclude with implications for elementary science teacher professional development and present questions for further research on adaptiveness.
-
Abstract In this paper, we aim to explore novel machine learning (ML) techniques to facilitate and accelerate the construction of universal equation-of-state (EOS) models with high accuracy while ensuring important thermodynamic consistency. When applying ML to fit a universal EOS model, there are two key requirements: (1) high prediction accuracy to ensure precise estimation of relevant physics properties and (2) physical interpretability to support important physics-related downstream applications. We first identify a set of fundamental challenges from the accuracy perspective, including an extremely wide range of input/output space and highly sparse training data. We demonstrate that while a neural network (NN) model may fit the EOS data well, its black-box nature makes it difficult to provide physically interpretable results, leading to weak accountability of predictions outside the training range and no guarantee of meeting important thermodynamic consistency constraints. To this end, we propose a principled deep regression model that can be trained in a meta-learning style to predict the desired quantities with high accuracy using scarce training data. We further introduce a uniquely designed kernel-based regularizer for accurate uncertainty quantification. An ensemble technique is leveraged to battle model overfitting with improved prediction stability. Auto-differentiation is conducted to verify that necessary thermodynamic consistency conditions are maintained. Our evaluation results show an excellent fit of the EOS table, and the predicted values are ready to use for important physics-related tasks.
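To make the consistency check concrete, the sketch below evaluates the residual of one standard thermodynamic consistency condition for an EOS, dE/dV = T · dP/dT − P, using central finite differences. This is a cheap stand-in for the auto-differentiation check the abstract describes, with hypothetical function names; it is illustrated on an ideal-gas-like toy EOS, not the paper's fitted model.

```python
def consistency_residual(P, E, V, T, h=1e-5):
    """Residual of the thermodynamic consistency condition
        dE/dV = T * dP/dT - P
    for pressure P(V, T) and internal energy E(V, T), evaluated at a
    point (V, T) by central finite differences. A perfectly consistent
    EOS yields a residual of (numerically) zero.
    """
    dE_dV = (E(V + h, T) - E(V - h, T)) / (2 * h)
    dP_dT = (P(V, T + h) - P(V, T - h)) / (2 * h)
    return dE_dV - (T * dP_dT - P(V, T))

# Toy ideal gas in reduced units (nR = 1): P = T / V, E = (3/2) T.
residual = consistency_residual(lambda V, T: T / V,
                                lambda V, T: 1.5 * T,
                                V=2.0, T=1.0)
```

For the ideal-gas toy model the residual vanishes identically; for a learned EOS surrogate, the same check (done exactly via auto-differentiation rather than finite differences) flags violations of thermodynamic consistency.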
-
The explosive growth in supercomputer capacity has changed simulation paradigms. Simulations have shifted from a few lengthy runs to ensembles of multiple simulations with varying initial conditions or input parameters. Thus, an ensemble consists of large volumes of multi-dimensional data that can exceed exascale boundaries. However, the disparity in growth rates between storage capabilities and computing resources results in I/O bottlenecks. This makes it impractical to utilize conventional post-processing and visualization tools for analyzing such massive simulation ensembles. In situ visualization approaches alleviate I/O constraints by saving predetermined visualizations in image databases during simulation. Nevertheless, the unavailability of output raw data restricts the flexibility of post hoc exploration in in situ approaches. Much research has been conducted to mitigate this limitation, but it falls short when it comes to simultaneously exploring and analyzing parameter and ensemble spaces. In this paper, we propose an expert-in-the-loop visual exploration and analytics approach. The proposed approach leverages feature extraction, deep learning, and human expert–AI collaboration techniques to explore and analyze image-based ensembles. Our approach utilizes local features and deep learning techniques to learn the image features of ensemble members. The extracted features are then combined with simulation input parameters and fed to the visualization pipeline for in-depth exploration and analysis using human expert + AI interaction techniques. We show the effectiveness of our approach on several scientific simulation ensembles.
