Based on the prior work of Chahl and Gopalakrishnan (2019) to infer particle–ion collision time distributions using a Langevin Dynamics (LD) approach, we develop a model for the nondimensional diffusion charging collision kernel β_i or H that is applicable for 0 ≤ Ψ_E ≤ 60, 0 ≤ Ψ_I/Ψ_E ≤ 1, Kn_D ≤ 2000 (defined in the main text). The developed model for β_i for attractive Coulomb and image potential interactions, along with the model for β_i for repulsive Coulomb and image potential interactions from Gopalakrishnan et al. (2013b), is tested against published diffusion charging experimental data. Current state-of-the-art charging models, the Fuchs (1963) model and the Wiedensohler (1988) regression for bipolar charging, are also evaluated and discussed. Comparisons reveal that the LD-based model accurately describes unipolar charge fractions for 10–100 nm particles measured in air (Adachi et al., 1985), nitrogen, and argon, but not in helium (Adachi et al., 1987). The Fuchs model and the LD-based model yield similar predictions in the experimental conditions considered, except in helium. In the case of bipolar charging, the LD-based model captures the experimental trends quantitatively (within ±20%) across the entire size range of 4–40 nm, producing better agreement than Wiedensohler's regression. The latter systematically underpredicts the charge fraction below ~20 nm in air (by up to 40%) for the data presented in Adachi et al. (1985). Comparison with the data of Gopalakrishnan et al. (2015), obtained in UHP air along with measurements of the entire ion mass–mobility distribution, shows excellent agreement with the predictions of the LD-based model. This demonstrates the capability to accommodate arbitrary ion populations in any background gas when such data are available. Wiedensohler's regression, derived for bipolar charging in air using average ion mass–mobility, also describes the data reasonably well in the conditions examined.
However, both models fail to capture the fraction of singly and doubly charged particles in carbon dioxide, warranting further investigation.
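The LD approach referenced above infers collision statistics by simulating stochastic ion trajectories and recording first-passage (collision) times. As a loose illustration only (not the authors' actual nondimensionalization or three-dimensional formulation), a minimal one-dimensional Euler–Maruyama sketch with an attractive Coulomb drift and an absorbing radius might look like the following; every parameter name and value here is hypothetical:

```python
import math
import random

def langevin_first_passage(psi_e=50.0, r0=5.0, r_abs=1.0,
                           dt=1e-3, max_steps=200_000, seed=0):
    """One stochastic ion trajectory: overdamped Euler-Maruyama steps in a
    radial coordinate with attractive Coulomb drift (potential -psi_e / r).
    Returns the first-passage (collision) time, or None if none occurred."""
    rng = random.Random(seed)
    r = r0
    for step in range(max_steps):
        drift = -psi_e / r ** 2                      # attractive Coulomb force
        r += drift * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        if r <= r_abs:                               # ion absorbed by particle
            return (step + 1) * dt
    return None
```

Repeating such runs over many seeds would give an empirical collision time distribution, the kind of quantity from which a collision kernel can be estimated.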
Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization
Langevin dynamics (LD) has proven to be a powerful technique for optimizing a non-convex objective: it is an efficient algorithm for finding local minima while eventually visiting a global minimum on longer timescales. LD is based on the first-order Langevin diffusion, which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD). Adapting the techniques of Tzen et al. (2018) for LD to non-reversible diffusions, we show that for a given local minimum within an arbitrary distance from the initialization, with high probability, the ULD trajectory either ends up somewhere outside a small neighborhood of this local minimum within a recurrence time, which depends on the smallest eigenvalue of the Hessian at the local minimum, or enters this neighborhood by the recurrence time and stays there for a potentially exponentially long escape time. The ULD algorithm improves upon the recurrence time obtained for LD in Tzen et al. (2018) with respect to the dependency on the smallest eigenvalue of the Hessian at the local minimum. Similar results and improvements are obtained for the NLD algorithm. We also show that the non-reversible variants can exit the basin of attraction of a local minimum faster in discrete time when the objective has two local minima separated by a saddle point, and we quantify the amount of improvement. Our analysis suggests that non-reversible Langevin algorithms are more efficient at locating a local minimum as well as at exploring the state space.
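A minimal sketch may help contrast the two diffusions discussed above. The fragment below implements one Euler–Maruyama step of the first-order (overdamped, reversible) Langevin diffusion and of the underdamped (non-reversible) dynamics on a double-well objective with two local minima separated by a saddle; the step size, inverse temperature beta, and friction gamma are illustrative choices, not values from the paper:

```python
import math
import random

def grad(x):
    """Gradient of the double-well objective f(x) = (x^2 - 1)^2,
    which has local minima at x = -1 and x = 1 and a saddle at x = 0."""
    return 4.0 * x * (x * x - 1.0)

def overdamped_step(x, dt, beta, rng):
    # First-order (reversible) Langevin: dX = -grad f(X) dt + sqrt(2/beta) dW
    return x - grad(x) * dt + math.sqrt(2.0 * dt / beta) * rng.gauss(0.0, 1.0)

def underdamped_step(x, v, dt, beta, gamma, rng):
    # Underdamped Langevin: noise enters the velocity, not the position
    v += (-gamma * v - grad(x)) * dt \
         + math.sqrt(2.0 * gamma * dt / beta) * rng.gauss(0.0, 1.0)
    x += v * dt
    return x, v
```

At low temperature (large beta) both chains stay near the minimum they start in for a long time; the abstract's claim concerns how quickly the non-reversible variant can traverse the saddle between the two wells.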
 NSF-PAR ID:
 10256959
 Date Published:
 Journal Name:
 Proceedings of Machine Learning Research
 Volume:
 33
 ISSN:
 2640-3498
 Page Range / eLocation ID:
 7850–17862
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


State-of-the-art hypergraph partitioners follow the multilevel paradigm, constructing multiple levels of progressively coarser hypergraphs that are used to drive cut refinements on each level of the hierarchy. Multilevel partitioners are subject to two limitations: (i) coarsening processes rely on local neighborhood structure without fully considering the global structure of the hypergraph, and (ii) refinement heuristics can stagnate on local minima. In this paper, we describe SpecPart, the first supervised spectral framework that directly tackles these two limitations. SpecPart solves a generalized eigenvalue problem that captures the balanced partitioning objective and the global hypergraph structure in a low-dimensional vertex embedding, while leveraging initial high-quality solutions from multilevel partitioners as hints. SpecPart further constructs a family of trees from the vertex embedding and partitions them with a tree-sweeping algorithm. A novel overlay of multiple tree-based partitioning solutions is then lifted to a coarsened hypergraph, where an ILP partitioning instance is solved to alleviate local stagnation. We have validated SpecPart on multiple sets of benchmarks. Experimental results show that for some benchmarks SpecPart can substantially improve the cut size, by more than 50% with respect to the best published solutions obtained with the leading partitioners hMETIS and KaHyPar.
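As a rough illustration of the spectral idea only (not SpecPart's supervised generalized-eigenvalue formulation, hint embedding, tree family, or ILP overlay), the classic Fiedler-vector sweep below bisects an ordinary graph by sorting vertices along the second Laplacian eigenvector and sweeping prefix cuts; it is a sketch for small dense inputs:

```python
import numpy as np

def fiedler_sweep(adj):
    """Spectral bisection sketch: embed vertices with the Fiedler vector of
    the graph Laplacian, sort them along it, and sweep prefix splits for the
    smallest cut. `adj` is a symmetric 0/1 adjacency matrix. Real partitioners
    also enforce a balance constraint, omitted here for brevity."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj                 # combinatorial Laplacian L = D - A
    vals, vecs = np.linalg.eigh(lap)         # eigenvalues in ascending order
    fiedler = vecs[:, 1]                     # eigenvector of the 2nd-smallest eigenvalue
    order = np.argsort(fiedler)
    n = len(order)
    best_cut, best_split = None, None
    for k in range(1, n):                    # sweep every prefix split
        left = set(order[:k].tolist())
        cut = sum(1 for i in range(n) for j in range(i + 1, n)
                  if adj[i, j] and (i in left) != (j in left))
        if best_cut is None or cut < best_cut:
            best_cut, best_split = cut, k
    return best_cut, [int(v) for v in order[:best_split]]
```

On a graph of two triangles joined by one bridge edge, the sweep recovers the bridge as the minimum cut.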

Bach, Francis; Blei, David; Schölkopf, Bernhard (Eds.) This paper investigates the asymptotic behaviors of gradient descent algorithms (particularly accelerated gradient descent and stochastic gradient descent) in the context of stochastic optimization arising in statistics and machine learning, where objective functions are estimated from available data. We show that these algorithms can be computationally modeled by continuous-time ordinary or stochastic differential equations. We establish gradient flow central limit theorems to describe the limiting dynamic behaviors of these computational algorithms and the large-sample performances of the related statistical procedures, as the number of algorithm iterations and the data size both go to infinity, where the gradient flow central limit theorems are governed by some linear ordinary or stochastic differential equations, like time-dependent Ornstein–Uhlenbeck processes. We illustrate that our study can provide a novel unified framework for a joint computational and statistical asymptotic analysis: the computational asymptotic analysis studies the dynamic behaviors of these algorithms with time (or the number of iterations in the algorithms), while the statistical asymptotic analysis investigates the large-sample behaviors of the statistical procedures (like estimators and classifiers) that are computed by applying the algorithms; in fact, the statistical procedures are equal to the limits of the random sequences generated from these iterative algorithms, as the number of iterations goes to infinity. The joint analysis results based on the obtained gradient flow central limit theorems lead to the identification of four factors (learning rate, batch size, gradient covariance, and Hessian) to derive new theories regarding the local minima found by stochastic gradient descent for solving non-convex optimization problems.
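The correspondence described above, iterates of a stochastic algorithm tracking a continuous-time flow, can be illustrated in the simplest possible case. The sketch below (an illustration only, not the paper's construction) runs SGD with Gaussian gradient noise on f(θ) = θ²/2 and compares the iterate at step k to the gradient-flow solution θ(t) = θ₀e^(−t) at time t = ηk; all parameter values are made up:

```python
import math
import random

def sgd_vs_gradient_flow(theta0=5.0, lr=0.01, steps=500, noise_sd=0.1, seed=1):
    """SGD on f(theta) = theta^2 / 2 with noisy gradients, compared against
    its gradient-flow ODE limit theta'(t) = -theta(t). The gap between the
    two is the Ornstein-Uhlenbeck-type fluctuation the abstract refers to.
    Returns (sgd_iterate, ode_solution)."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        g = theta + noise_sd * rng.gauss(0.0, 1.0)   # stochastic gradient
        theta -= lr * g
    t = lr * steps                                    # continuous time t = eta * k
    ode = theta0 * math.exp(-t)                       # exact gradient-flow solution
    return theta, ode
```

With a small learning rate the iterate stays close to the deterministic flow, and the residual fluctuation scales with the learning rate, batch-size-dependent gradient noise, and curvature, the four factors identified above.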


Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special case of graph-structured linear systems, in the general setting the bit complexity of solving an $n \times n$ linear system $Ax = b$ is $\tilde{O}(n^\omega)$, where $\omega < 2.372864$ is the matrix multiplication exponent. Improving on this has been an open problem even for sparse linear systems with poly$(n)$ condition number. In this paper, we present an algorithm that solves sparse linear systems asymptotically faster than matrix multiplication for any $\omega > 2$. This speedup holds for any input matrix $A$ with $o(n^{\omega - 1}/\log(\kappa(A)))$ nonzeros, where $\kappa(A)$ is the condition number of $A$. For poly$(n)$-conditioned matrices with $\tilde{O}(n)$ nonzeros, and the current value of $\omega$, the bit complexity of our algorithm to solve to within any $1/\text{poly}(n)$ error is $O(n^{2.331645})$. Our algorithm can be viewed as an efficient, randomized implementation of the block Krylov method via recursive low-displacement-rank factorizations. It is inspired by the algorithm of [Eberly et al. ISSAC '06, '07] for inverting matrices over finite fields. In our analysis of numerical stability, we develop matrix anti-concentration techniques to bound the smallest eigenvalue and the smallest gap in eigenvalues of semi-random matrices.
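For orientation, the block Krylov method mentioned above belongs to the same Krylov-subspace family as the classic conjugate gradient iteration. The sketch below is plain CG for symmetric positive definite systems; it conveys the Krylov idea only and omits the blocking, recursion, and low-displacement-rank factorizations that the paper's algorithm actually relies on:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b for symmetric positive definite A by building up the
    Krylov subspace span{b, Ab, A^2 b, ...} one A-conjugate direction at a
    time. Each iteration costs one matrix-vector product, which is cheap
    when A is sparse."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)          # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p      # next A-conjugate direction
        rs = rs_new
    return x
```

In exact arithmetic CG terminates in at most n steps; the paper's contribution is precisely about controlling the bit complexity and numerical stability of such Krylov-type recursions, which this sketch does not address.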