This content will become publicly available on January 1, 2023

Title: Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates
Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flows such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite sample convergence rates of the minimum Kullback-Leibler divergence statistical estimator of the Knöthe-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments to illustrate the practical implications of our theoretical findings.
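As a rough illustration of the kind of estimator the abstract describes, the sketch below fits the simplest possible triangular flow: an affine map S(x) = A(x - mu), with A lower triangular and positive on the diagonal, chosen to minimize the empirical Kullback-Leibler objective against a standard Gaussian reference. The affine restriction, the Gaussian reference, and the names `fit_affine_triangular_map`, `apply_map`, and `negative_log_likelihood` are assumptions made for this example and are not taken from the paper. In this special case the minimizer has a closed form through the Cholesky factor of the sample covariance.

```python
import numpy as np


def fit_affine_triangular_map(samples):
    """Fit S(x) = A (x - mu) with A lower triangular (hypothetical helper).

    Minimizing the empirical KL objective over affine triangular maps with a
    standard Gaussian reference is Gaussian maximum likelihood, so A is the
    inverse of the Cholesky factor of the (biased) sample covariance.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / samples.shape[0]
    chol = np.linalg.cholesky(cov)   # cov = chol @ chol.T, chol lower triangular
    lower = np.linalg.inv(chol)      # inverse of a lower-triangular matrix
    return mean, lower


def apply_map(samples, mean, lower):
    """Apply S row-wise: each row x is sent to A (x - mu)."""
    return (samples - mean) @ lower.T


def negative_log_likelihood(samples, mean, lower):
    """Empirical KL objective, up to a constant that does not depend on S."""
    z = apply_map(samples, mean, lower)
    dim = samples.shape[1]
    log_det = np.sum(np.log(np.diag(lower)))  # log-determinant of a triangular Jacobian
    return 0.5 * np.mean(np.sum(z**2, axis=1)) + 0.5 * dim * np.log(2 * np.pi) - log_det


# Synthetic data from a correlated Gaussian (illustrative values only).
rng = np.random.default_rng(0)
target_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
data = rng.multivariate_normal(mean=[1.0, -1.0], cov=target_cov, size=5000)

mean_hat, lower_hat = fit_affine_triangular_map(data)
latent = apply_map(data, mean_hat, lower_hat)
```

Reordering the columns of `data` before fitting yields a different triangular factor, which is the simplest setting in which the choice of coordinate ordering changes the fitted map.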
Award ID(s):
2134012; 2023166; 2133244
NSF-PAR ID:
10349861
Journal Name:
Proceedings of Machine Learning Research
Volume:
151
Page Range or eLocation-ID:
10161-10195
ISSN:
2640-3498
Sponsoring Org:
National Science Foundation
More Like this
  1. Many generative models have to combat missing modes. The conventional approach is to reduce, through training, a statistical distance (such as an f-divergence) between the generated distribution and the provided data distribution. But this is more of a heuristic than a guarantee. The statistical distance measures a global, but not local, similarity between two distributions. Even if it is small, it does not imply plausible mode coverage. Rethinking this problem from a game-theoretic perspective, we show that complete mode coverage is firmly attainable. If a generative model can approximate a data distribution moderately well under a global statistical distance measure, then we will be able to find a mixture of generators that collectively covers every data point and thus every mode, with a lower-bounded generation probability. Constructing the generator mixture has a connection to the multiplicative weights update rule, upon which we base our algorithm. We prove that our algorithm guarantees complete mode coverage. Our experiments on real and synthetic datasets confirm better mode coverage than recent approaches that also use generator mixtures but rely on global statistical distances.
  2. Advanced measurement techniques and high-performance computing have made large data sets available for a range of turbulent flows in engineering applications. Drawing on this abundance of data, dynamical models that reproduce structural and statistical features of turbulent flows enable effective model-based flow control strategies. This review describes a framework for completing second-order statistics of turbulent flows using models based on the Navier–Stokes equations linearized around the turbulent mean velocity. Dynamical couplings between states of the linearized model dictate structural constraints on the statistics of flow fluctuations. Colored-in-time stochastic forcing that drives the linearized model is then sought to account for and reconcile dynamics with available data (that is, partially known statistics). The number of dynamical degrees of freedom that are directly affected by stochastic excitation is minimized as a measure of model parsimony. The spectral content of the resulting colored-in-time stochastic contribution can alternatively arise from a low-rank structural perturbation of the linearized dynamical generator, pointing to suitable dynamical corrections that may account for the absence of the nonlinear interactions in the linearized model.
  3. Emerging Industrial Internet-of-Things systems require wireless solutions to connect sensors, actuators, and controllers as part of high data rate feedback-control loops over real-time flows. A key challenge is to provide predictable performance and agility in response to fluctuations in link quality, variable workloads, and topology changes. We propose WARP to address this challenge. WARP uses programs to specify a network’s behavior and includes a synthesis procedure to automatically generate such programs from a high-level specification of the system’s workload and topology. WARP has three unique features: (1) WARP uses a domain-specific language to specify stateful programs that include conditional statements to control when a flow’s packets are transmitted. The execution paths of programs depend on the pattern of packet losses observed at runtime, thereby enabling WARP to readily adapt to packet losses due to short-term variations in link quality. (2) Our synthesis technique uses heuristics to improve network performance by considering multiple packet loss patterns and associated execution paths when determining the transmissions performed by nodes. Furthermore, the generated programs ensure that the likelihood of a flow delivering its packets by its deadline exceeds a user-specified threshold. (3) WARP can adapt to workload and topology changes without explicitly reconstructing a network’s program, based on the observation that nodes can independently synthesize the same program when they share the same workload and topology information. Simulations show that WARP improves network throughput for data collection, dissemination, and mixed workloads on two realistic topologies. Testbed experiments show that WARP reduces the time to add new flows by a factor of 5 compared with a state-of-the-art centralized control plane and guarantees the real-time performance and reliability of all flows.
  4. Lava flows present a significant natural hazard to communities around volcanoes and are typically slow-moving (<1 to 5 cm s⁻¹) and laminar. Recent lava flows during the 2018 eruption of Kīlauea volcano, Hawai'i, however, reached speeds as high as 11 m s⁻¹ and were transitional to turbulent. The Kīlauea flows formed a complex network of braided channels departing from the classic rectangular channel geometry often employed by lava flow models. To investigate these extreme dynamics we develop a new lava flow model that incorporates nonlinear advection and a nonlinear expression for the fluid viscosity. The model makes use of novel discontinuous Galerkin (DG) finite-element methods and resolves complex channel geometry through the use of unstructured triangular meshes. We verify the model against an analytic test case and demonstrate convergence rates of 𝒫 + 1/2 for polynomials of degree 𝒫. Direct observations recorded by unoccupied aerial systems (UASs) during the Kīlauea eruption provide inlet conditions, constrain input parameters, and serve as a benchmark for model evaluation.
  5. This paper introduces a hierarchical traffic model for spread measurement of network traffic flows. The hierarchical model, which aggregates lower-level flows into higher-level flows in a hierarchical structure, allows us to measure network traffic at different granularities at once, supporting diverse traffic analysis from a grand view to fine-grained details. The spread of a flow is the number of distinct elements (under measurement) in the flow, where the flow label (which identifies packets belonging to the flow) and the elements (which are defined based on application need) can be found in packet headers or payload. Traditional flow spread estimators are designed without hierarchical traffic modeling in mind and incur high overhead when they are applied to each level of the traffic hierarchy. In this paper, we propose a new Hierarchical Virtual bitmap Estimator (HVE) that performs simultaneous multi-level traffic measurement at the same cost as a traditional estimator, without degrading measurement accuracy. We implement the proposed solution and perform experiments based on real traffic traces. The experimental results demonstrate that HVE improves measurement throughput by 43% to 155%, thanks to the reduction of per-packet processing overhead. For small to medium flows, its measurement accuracy is largely similar to that of traditional estimators that work at one level at a time. For large aggregate and base flows, its accuracy is better, with up to 97% smaller error in our experiments.