Title: Should We Learn Most Likely Functions or Parameters?
Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks, prove conditions under which the procedure is well-behaved, and provide a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
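The abstract's central point, that parameter-space MAP estimation is not invariant to reparameterization while the induced function is, can be seen in a one-parameter toy model. The following is a minimal numpy sketch, not the paper's method; the model f(x) = w*x, the priors, and names such as `log_post_w` are illustrative assumptions. Reparameterizing w = eta**3 leaves the set of functions unchanged, yet moves the parameter-space MAP to a different function.

```python
import numpy as np

# Toy model: f(x) = w * x, prior p(w) = N(0, 1), Gaussian noise.
rng = np.random.default_rng(0)
sigma = 0.5
x = rng.normal(size=10)
y = 0.5 * x + sigma * rng.normal(size=10)

w_grid = np.linspace(-2.0, 2.0, 4001)
log_lik = np.array([-0.5 * np.sum((y - w * x) ** 2) / sigma**2
                    for w in w_grid])

# MAP in the w-parameterization: log-likelihood + log N(w; 0, 1).
log_post_w = log_lik - 0.5 * w_grid**2
w_map = w_grid[np.argmax(log_post_w)]

# MAP in the eta-parameterization, where w = eta**3. Change of
# variables: p(eta) = p_w(eta**3) * |dw/deta|, which adds
# log(3 * eta**2) to the log-prior. The functions are identical.
eta_grid = np.cbrt(w_grid)
log_post_eta = log_post_w + np.log(3.0 * eta_grid**2 + 1e-300)
w_from_eta_map = eta_grid[np.argmax(log_post_eta)] ** 3

print(f"function from w-MAP:   f(x) = {w_map:.3f} * x")
print(f"function from eta-MAP: f(x) = {w_from_eta_map:.3f} * x")  # differs
```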
Award ID(s):
1951856
NSF-PAR ID:
10477546
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Advances in Neural Information Processing Systems
Date Published:
Journal Name:
Advances in Neural Information Processing Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, generalized polynomial chaos (gPC) expansion for land surface model parameter estimation is evaluated. We perform inverse modeling and compute the posterior distribution of the critical hydrological parameters that are subject to great uncertainty in the Community Land Model (CLM) for a given value of the output latent heat flux (LH). The unknown parameters include those that have been identified as the most influential factors on the simulations of surface and subsurface runoff, latent and sensible heat fluxes, and soil moisture in CLM4.0. We set up the inversion problem in the Bayesian framework in two steps: (i) building a surrogate model expressing the input–output mapping, and (ii) performing inverse modeling and computing the posterior distributions of the input parameters using observation data for a given value of the output LH. The development of the surrogate model is carried out with a Bayesian procedure based on the variable selection methods that use gPC expansions. Our approach accounts for basis selection uncertainty and quantifies the importance of the gPC terms, and, hence, all of the input parameters, via the associated posterior probabilities.
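A minimal sketch of the two-step procedure described above, under toy assumptions: a hypothetical scalar `expensive_model` stands in for CLM, there is a single uncertain input, and observation error is Gaussian. Step (i) fits a Hermite-polynomial gPC surrogate by least squares; step (ii) uses the surrogate in place of the expensive model to compute the posterior over the input for a given observed LH value.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Stand-in for the expensive land-surface model: one uncertain input
# (a standard-normal germ xi) mapped to a scalar output ("LH").
def expensive_model(xi):
    return 150.0 + 40.0 * np.tanh(xi) + 5.0 * xi**2

# Step (i): fit a degree-5 gPC surrogate in probabilists' Hermite
# polynomials (orthogonal under the standard normal) by least squares.
rng = np.random.default_rng(1)
xi_train = rng.normal(size=200)
V = He.hermevander(xi_train, 5)               # design matrix of He_k(xi)
coef, *_ = np.linalg.lstsq(V, expensive_model(xi_train), rcond=None)
surrogate = lambda xi: He.hermeval(xi, coef)  # cheap replacement

# Step (ii): Bayesian inversion on a grid, with the surrogate standing
# in for the expensive model inside the Gaussian likelihood.
lh_obs, noise = 170.0, 5.0
xi_grid = np.linspace(-4.0, 4.0, 2001)
dxi = xi_grid[1] - xi_grid[0]
log_post = (-0.5 * ((lh_obs - surrogate(xi_grid)) / noise) ** 2
            - 0.5 * xi_grid**2)               # standard-normal prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * dxi                      # normalize the density
print("posterior mean of xi:", np.sum(xi_grid * post) * dxi)
```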
  2. Purpose: Little quantitative or mechanistic information about tear film breakup can be determined directly via current imaging techniques. In this paper, we present simplified mathematical models based on two proposed mechanisms of tear film breakup: evaporation of water from the tear film and tangential fluid flow within the tear film. We use our models to determine whether one or a combination of the two mechanisms causes tear film breakup in a variety of instances. In this study, we estimate breakup-related parameters that cannot currently be measured directly during subject trials, such as tear film osmolarity and thinning rates. The present study validates our procedure against previous work. Methods: Five ordinary differential equation models for tear film thinning were designed that model evaporation, osmosis, and various types of tangential flow. Eight tear film breakup instances from five healthy subjects, occurring 1–8 s post-blink and identified in fluorescence images in previous work, were fit with these five models. The fitting procedure used a nonlinear least squares optimization that minimized the difference of the computed theoretical fluorescent intensity from the models and the experimental fluorescent intensity from the images. The optimization was conducted over the evaporation rate and up to three tangential flow rate parameters. The smallest norm of the difference was determined to correspond to the model that best explained the tear film dynamics. Results: All of the breakup instances were best fit by models with time-dependent tangential flow. Our optimal parameter values and thinning rate as well as tangential fluid flow profiles compare well with previous partial differential equation model results in most instances. Conclusion: Our fitting results suggest that a combination of tangential fluid flow and evaporation cause most of the breakup instances. Comparison with results from previous work suggests that the simplified models can capture the essential tear film dynamics in most cases, thereby validating this procedure for wider usage.
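A simplified sketch of this fitting procedure, not the paper's fluorescence model: one ODE combining a constant evaporation rate with an extensional tangential-flow term is fit to synthetic intensity data by nonlinear least squares. Intensity is taken as proportional to thickness, a stated simplification; the actual fits use a fluorescence model with quenching. Comparing residual norms across several such candidate models is what selects the best mechanism.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Minimal thinning model: film thickness h(t) decreases by a constant
# evaporation rate v and by extensional tangential flow with rate a:
#   dh/dt = -v - a * h
def thin(t, h, v, a):
    return -v - a * h

def intensity(params, t):
    v, a = params
    sol = solve_ivp(thin, (t[0], t[-1]), [1.0], t_eval=t, args=(v, a))
    return sol.y[0]          # intensity ~ thickness (simplification)

# Synthetic "measured" intensity from known rates plus noise.
t = np.linspace(0.0, 8.0, 60)                 # 1-8 s post-blink regime
rng = np.random.default_rng(2)
data = intensity((0.05, 0.08), t) + 0.01 * rng.normal(size=t.size)

# Nonlinear least squares over evaporation and tangential-flow rates;
# the residual norm is what ranks competing models.
fit = least_squares(lambda p: intensity(p, t) - data,
                    x0=[0.02, 0.02], bounds=([0, 0], [1, 1]))
print("fitted (v, a):", fit.x)
```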
  3. One of the most common types of models that helps us to understand neuron behavior is based on the Hodgkin–Huxley ion channel formulation (HH model). A major challenge with inferring parameters in HH models is non-uniqueness: many different sets of ion channel parameter values produce similar outputs for the same input stimulus. Such phenomena result in an objective function that exhibits multiple modes (i.e., multiple local minima). This non-uniqueness of local optimality poses challenges for parameter estimation with many algorithmic optimization techniques. HH models additionally have severe non-linearities, resulting in further challenges for inferring parameters in an algorithmic fashion. To address these challenges with a tractable method in high-dimensional parameter spaces, we propose using a particular Markov chain Monte Carlo (MCMC) algorithm, which has the advantage of inferring parameters in a Bayesian framework. The Bayesian approach is designed to be suitable for multimodal solutions to inverse problems. We introduce and demonstrate the method using a three-channel HH model. We then focus on the inference of nine parameters in an eight-channel HH model, which we analyze in detail. We explore how the MCMC algorithm can uncover complex relationships between inferred parameters using five injected current levels. The MCMC method yields a nine-dimensional posterior distribution, which we analyze visually with solution maps or landscapes of the possible parameter sets. The visualized solution maps show new complex structures of the multimodal posteriors, allow for selection of locally and globally optimal parameter sets, and visually expose parameter sensitivities and regions of higher model robustness. We envision these solution maps as enabling experimentalists to improve the design of future experiments, increase scientific productivity, and improve on model structure and ideation when the MCMC algorithm is applied to experimental data.
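A compact illustration of why MCMC suits the non-uniqueness described above, using a deliberately toy surrogate rather than an HH model: two conductance-like parameters enter the response only through their sum, producing a ridge of equally likely parameter sets that a random-walk Metropolis sampler maps out. The variable names and the linear response are assumptions made for brevity.

```python
import numpy as np

# Toy non-identifiability: the response depends only on g1 + g2, so
# the posterior has a flat ridge of equally good parameter sets,
# mimicking the multimodal/degenerate HH inference problem.
rng = np.random.default_rng(3)
stim = np.linspace(0.0, 1.0, 50)                   # injected current levels
obs = 1.2 * stim + 0.05 * rng.normal(size=stim.size)  # true g1 + g2 = 1.2

def log_post(g):
    if np.any(g < 0) or np.any(g > 2):             # uniform prior box
        return -np.inf
    resid = obs - (g[0] + g[1]) * stim
    return -0.5 * np.sum(resid**2) / 0.05**2

# Random-walk Metropolis sampler.
g = np.array([0.5, 0.5])
lp = log_post(g)
samples = []
for _ in range(20000):
    prop = g + 0.05 * rng.normal(size=2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:       # accept/reject
        g, lp = prop, lp_prop
    samples.append(g)
samples = np.array(samples)[5000:]                 # drop burn-in

# The "solution map": g1 and g2 individually spread widely along the
# ridge, but their sum is tightly constrained by the data.
print("std of g1, g2:", samples.std(axis=0))
print("std of g1+g2:", samples.sum(axis=1).std())
```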
  4. Upcoming galaxy surveys will allow us to probe the growth of the cosmic large-scale structure with improved sensitivity compared to current missions, and will also map larger areas of the sky. This means that in addition to the increased precision in observations, future surveys will also access the ultralarge-scale regime, where commonly neglected effects such as lensing, redshift-space distortions, and relativistic corrections become important for calculating correlation functions of galaxy positions. At the same time, several approximations usually made in these calculations such as the Limber approximation break down at those scales. The need to abandon these approximations and simplifying assumptions at large scales creates severe issues for parameter estimation methods. On the one hand, exact calculations of theoretical angular power spectra become computationally expensive, and the need to perform them thousands of times to reconstruct posterior probability distributions for cosmological parameters makes the approach unfeasible. On the other hand, neglecting relativistic effects and relying on approximations may significantly bias the estimates of cosmological parameters. In this work, we quantify this bias and investigate how an incomplete modelling of various effects on ultralarge scales could lead to false detections of new physics beyond the standard ΛCDM model. Furthermore, we propose a simple debiasing method that allows us to recover true cosmologies without running the full parameter estimation pipeline with exact theoretical calculations. This method can therefore provide a fast way of obtaining accurate values of cosmological parameters and estimates of exact posterior probability distributions from ultralarge-scale observations.
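A toy numpy sketch of the debiasing idea under strong simplifying assumptions: a one-parameter amplitude fit, with an additive "ultralarge-scale" term standing in for the neglected relativistic effects. The parameter shift induced by the cheap model is measured once, using a single exact calculation at a fiducial cosmology, and then subtracted from the biased estimate; `exact_cl` and `approx_cl` are hypothetical stand-ins, not a survey pipeline.

```python
import numpy as np

ell = np.arange(2, 40)

def exact_cl(A):       # stand-in for the full relativistic calculation
    return A / ell**2 + 0.02 / ell   # extra ultralarge-scale term

def approx_cl(A):      # stand-in for the cheap (Limber-like) calculation
    return A / ell**2

def fit(data, model):  # 1-parameter chi-square fit on a grid
    A_grid = np.linspace(0.5, 1.5, 2001)
    chi2 = [np.sum((data - model(A)) ** 2) for A in A_grid]
    return A_grid[np.argmin(chi2)]

A_true = 1.0
data = exact_cl(A_true)                  # noiseless mock observation

A_biased = fit(data, approx_cl)          # cheap pipeline -> biased estimate

# Debias with a single exact calculation at a fiducial cosmology:
# measure the shift the approximation induces, then subtract it.
A_fid = 1.1
shift = fit(exact_cl(A_fid), approx_cl) - A_fid
A_debiased = A_biased - shift
print(f"biased: {A_biased:.3f}, debiased: {A_debiased:.3f}, true: {A_true:.3f}")
```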
  5. In the Hidden-Parameter MDP (HiP-MDP) framework, a family of reinforcement learning tasks is generated by varying hidden parameters specifying the dynamics and reward function for each individual task. The HiP-MDP is a natural model for families of tasks in which meta- and lifelong-reinforcement learning approaches can succeed. Given a learned context encoder that infers the hidden parameters from previous experience, most existing algorithms fall into two categories: model transfer and policy transfer, depending on which function the hidden parameters are used to parameterize. We characterize the robustness of model and policy transfer algorithms with respect to hidden parameter estimation error. We first show that the value function of HiP-MDPs is Lipschitz continuous under certain conditions. We then derive regret bounds for both settings through the lens of Lipschitz continuity. Finally, we empirically corroborate our theoretical analysis by varying the hyper-parameters governing the Lipschitz constants of two continuous control problems; the resulting performance is consistent with our theoretical results. 
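A small empirical check of the Lipschitz claim in a toy HiP-MDP; the two-state MDP and its theta-dependent transitions are illustrative assumptions, not the paper's benchmarks. Value iteration is run for a range of hidden parameters, and the ratio ||V_theta - V_theta'|| / |theta - theta'| is observed to stay bounded.

```python
import numpy as np

# Tiny tabular HiP-MDP: a hidden parameter theta tilts the transition
# probabilities of a 2-state, 2-action MDP. We check empirically that
# the optimal value function varies smoothly (Lipschitz) in theta.
gamma = 0.9
R = np.array([[0.0, 1.0], [1.0, 0.0]])        # reward[state, action]

def transitions(theta):
    # P[s, a, s']: theta in [0, 0.5) shifts mass between the two states.
    p = 0.5 + theta
    return np.array([[[p, 1 - p], [1 - p, p]],
                     [[1 - p, p], [p, 1 - p]]])

def optimal_value(theta, iters=500):
    P, V = transitions(theta), np.zeros(2)
    for _ in range(iters):                     # value iteration
        V = np.max(R + gamma * P @ V, axis=1)
    return V

# The ratio ||V_theta - V_theta'||_inf / |theta - theta'| stays bounded.
thetas = np.linspace(0.0, 0.4, 9)
Vs = [optimal_value(th) for th in thetas]
for (t1, V1), (t2, V2) in zip(zip(thetas, Vs), zip(thetas[1:], Vs[1:])):
    ratio = np.max(np.abs(V1 - V2)) / (t2 - t1)
    print(f"theta {t1:.2f} -> {t2:.2f}: ||dV|| / |dtheta| = {ratio:.2f}")
```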