skip to main content

Title: Nested Variational Inference
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing an forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model and (c) to perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.S.; Wortman Vaughan, J.
Date Published:
Journal Name:
Advances in neural information processing systems
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Windecker, Saras (Ed.)
    1. The ecological and environmental science communities have embraced machine learning (ML) for empirical modelling and prediction. However, going beyond prediction to draw insights into underlying functional relationships between response variables and environmental ‘drivers’ is less straightforward. Deriving ecological insights from fitted ML models requires techniques to extract the ‘learning’ hidden in the ML models. 2. We revisit the theoretical background and effectiveness of four approaches for deriving insights from ML: ranking independent variable importance (Gini importance, GI; permutation importance, PI; split importance, SI; and conditional permutation importance, CPI), and two approaches for inference of bivariate functional relationships (partial dependence plots, PDP; and accumulated local effect plots, ALE). We also explore the use of a surrogate model for visualization and interpretation of complex multi-variate relationships between response variables and environmental drivers. We examine the challenges and opportunities for extracting ecological insights with these interpretation approaches. Specifically, we aim to improve interpretation of ML models by investigating how effectiveness relates to (a) interpretation algorithm, (b) sample size and (c) the presence of spurious explanatory variables. 3. We base the analysis on simulations with known underlying functional relationships between response and predictor variables, with added white noise and the presence of correlated but non-influential variables. The results indicate that deriving ecological insight is strongly affected by interpretation algorithm and spurious variables, and moderately impacted by sample size. Removing spurious variables improves interpretation of ML models. Meanwhile, increasing sample size has limited value in the presence of spurious variables, but increasing sample size does improves performance once spurious variables are omitted. Among the four ranking methods, SI is slightly more effective than the other methods in the presence of spurious variables, while GI and SI yield higher accuracy when spurious variables are removed. PDP is more effective in retrieving underlying functional relationships than ALE, but its reliability declines sharply in the presence of spurious variables. Visualization and interpretation of the interactive effects of predictors and the response variable can be enhanced using surrogate models, including three-dimensional visualizations and use of loess planes to represent independent variable effects and interactions. 4. Machine learning analysts should be aware that including correlated independent variables in ML models with no clear causal relationship to response variables can interfere with ecological inference. When ecological inference is important, ML models should be constructed with independent variables that have clear causal effects on response variables. While interpreting ML models for ecological inference remains challenging, we show that careful choice of interpretation methods, exclusion of spurious variables and adequate sample size can provide more and better opportunities to ‘learn from machine learning’. 
    more » « less
  2. null (Ed.)
    Many researchers have explored retrofitting static type systems to dynamic languages. This raises the question of how to add type annotations to code that was previously untyped. One obvious solution is type inference. However, in complex type systems, in particular those with structural types, type inference typically produces most general types that are large, hard to understand, and unnatural for programmers. To solve this problem, we introduce InferDL, a novel Ruby type inference system that infers sound and useful type annotations by incorporating heuristics that guess types. For example, we might heuristically guess that a parameter whose name ends in "count" is an integer. InferDL works by first running standard type inference and then applying heuristics to any positions for which standard type inference produces overly-general types. Heuristic guesses are added as constraints to the type inference problem to ensure they are consistent with the rest of the program and other heuristic guesses; inconsistent guesses are discarded. We formalized InferDL in a core type and constraint language. We implemented InferDL on top of RDL, an existing Ruby type checker. To evaluate InferDL, we applied it to four Ruby on Rails apps that had been previously type checked with RDL, and hence had type annotations. We found that, when using heuristics, InferDL inferred 22% more types that were as or more precise than the previous annotations, compared to standard type inference without heuristics. We also found one new type error. We further evaluated InferDL by applying it to six additional apps, finding five additional type errors. Thus, we believe InferDL represents a promising approach for inferring type annotations in dynamic languages. 
    more » « less
  3. Summary

    We propose a multi-resolution scanning approach to identifying two-sample differences. Windows of multiple scales are constructed through nested dyadic partitioning on the sample space and a hypothesis regarding the two-sample difference is defined on each window. Instead of testing the hypotheses on different windows independently, we adopt a joint graphical model, namely a Markov tree, on the null or alternative states of these hypotheses to incorporate spatial correlation across windows. The induced dependence allows borrowing strength across nearby and nested windows, which we show is critical for detecting high resolution local differences. We evaluate the performance of the method through simulation and show that it substantially outperforms other state of the art two-sample tests when the two-sample difference is local, involving only a small subset of the data. We then apply it to a flow cytometry data set from immunology, in which it successfully identifies highly local differences. In addition, we show how to control properly for multiple testing in a decision theoretic approach as well as how to summarize and report the inferred two-sample difference. We also construct hierarchical extensions of the framework to incorporate adaptivity into the construction of the scanning windows to improve inference further.

    more » « less
  4. Abstract

    Heuristics are essential for addressing the complexities of engineering design processes. The goodness of heuristics is context-dependent. Appropriately tailored heuristics can enable designers to find good solutions efficiently, and inappropriate heuristics can result in cognitive biases and inferior design outcomes. While there have been several efforts at understanding which heuristics are used by designers, there is a lack of normative understanding about when different heuristics are suitable. Towards addressing this gap, this paper presents a reinforcement learning-based approach to evaluate the goodness of heuristics for three sub-problems commonly faced by designers: (1) learning the map between the design space and the performance space, (2) acquiring sequential information, and (3) stopping the information acquisition process. Using a multi-armed bandit formulation and simulation studies, we learn the suitable heuristics for these individual sub-problems under different resource constraints and problem complexities. Additionally, we learn the optimal heuristics for the combined problem (i.e., the one composing all three sub-problems), and we compare them to ones learned at the sub-problem level. The results of our simulation study indicate that the proposed reinforcement learning-based approach can be effective for determining the quality of heuristics for different problems, and how the effectiveness of the heuristics changes as a function of the designer’s preference (e.g., performance versus cost), the complexity of the problem, and the resources available.

    more » « less
  5. In the classical Steiner tree problem, given an undirected, connected graph G=(V,E) with non-negative edge costs and a set of terminals T⊆V, the objective is to find a minimum-cost tree E′⊆E that spans the terminals. The problem is APX-hard; the best known approximation algorithm has a ratio of ρ=ln(4)+ε<1.39. In this paper, we study a natural generalization, the multi-level Steiner tree (MLST) problem: given a nested sequence of terminals Tℓ⊂⋯⊂T1⊆V, compute nested trees Eℓ⊆⋯⊆E1⊆E that span the corresponding terminal sets with minimum total cost. The MLST problem and variants thereof have been studied under various names including Multi-level Network Design, Quality-of-Service Multicast tree, Grade-of-Service Steiner tree, and Multi-Tier tree. Several approximation results are known. We first present two simple O(ℓ)-approximation heuristics. Based on these, we introduce a rudimentary composite algorithm that generalizes the above heuristics, and determine its approximation ratio by solving a linear program. We then present a method that guarantees the same approximation ratio using at most 2ℓ Steiner tree computations. We compare these heuristics experimentally on various instances of up to 500 vertices using three different network generation models. We also present various integer linear programming (ILP) formulations for the MLST problem, and compare their running times on these instances. To our knowledge, the composite algorithm achieves the best approximation ratio for up to ℓ=100 levels, which is sufficient for most applications such as network visualization or designing multi-level infrastructure. 
    more » « less