skip to main content


Title: Topology Optimization With Many Right-Hand Sides Using Mirror Descent Stochastic Approximation—Reduction From Many to a Single Sample
Abstract Practical engineering designs typically involve many load cases. For topology optimization with many deterministic load cases, a large number of linear systems of equations must be solved at each optimization step, leading to an enormous computational cost. To address this challenge, we propose a mirror descent stochastic approximation (MD-SA) framework with various step size strategies to solve topology optimization problems with many load cases. We reformulate the deterministic objective function and gradient into stochastic ones through randomization, derive the MD-SA update, and develop algorithmic strategies. The proposed MD-SA algorithm requires only low accuracy in the stochastic gradient and thus uses only a single sample per optimization step (i.e., the sample size is always one). As a result, we reduce the number of linear systems to solve per step from hundreds to one, which drastically reduces the total computational cost, while maintaining a similar design quality. For example, for one of the design problems, the total number of linear systems to solve and wall clock time are reduced by factors of 223 and 22, respectively.  more » « less
Award ID(s):
1720305
NSF-PAR ID:
10188504
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Applied Mechanics
Volume:
87
Issue:
5
ISSN:
0021-8936
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Quantum computational supremacy arguments, which describe a way for a quantum computer to perform a task that cannot also be done by a classical computer, typically require some sort of computational assumption related to the limitations of classical computation. One common assumption is that the polynomial hierarchy ( P H ) does not collapse, a stronger version of the statement that P ≠ N P , which leads to the conclusion that any classical simulation of certain families of quantum circuits requires time scaling worse than any polynomial in the size of the circuits. However, the asymptotic nature of this conclusion prevents us from calculating exactly how many qubits these quantum circuits must have for their classical simulation to be intractable on modern classical supercomputers. We refine these quantum computational supremacy arguments and perform such a calculation by imposing fine-grained versions of the non-collapse conjecture. Our first two conjectures poly3-NSETH( a ) and per-int-NSETH( b ) take specific classical counting problems related to the number of zeros of a degree-3 polynomial in n variables over F 2 or the permanent of an n × n integer-valued matrix, and assert that any non-deterministic algorithm that solves them requires 2 c n time steps, where c ∈ { a , b } . A third conjecture poly3-ave-SBSETH( a ′ ) asserts a similar statement about average-case algorithms living in the exponential-time version of the complexity class S B P . We analyze evidence for these conjectures and argue that they are plausible when a = 1 / 2 , b = 0.999 and a ′ = 1 / 2 .Imposing poly3-NSETH(1/2) and per-int-NSETH(0.999), and assuming that the runtime of a hypothetical quantum circuit simulation algorithm would scale linearly with the number of gates/constraints/optical elements, we conclude that Instantaneous Quantum Polynomial-Time (IQP) circuits with 208 qubits and 500 gates, Quantum Approximate Optimization Algorithm (QAOA) circuits with 420 qubits and 500 constraints and boson sampling circuits (i.e. linear optical networks) with 98 photons and 500 optical elements are large enough for the task of producing samples from their output distributions up to constant multiplicative error to be intractable on current technology. Imposing poly3-ave-SBSETH(1/2), we additionally rule out simulations with constant additive error for IQP and QAOA circuits of the same size. Without the assumption of linearly increasing simulation time, we can make analogous statements for circuits with slightly fewer qubits but requiring 10 4 to 10 7 gates. 
    more » « less
  2. Abstract

    We propose a very fast approximate Markov chain Monte Carlo sampling framework that is applicable to a large class of sparse Bayesian inference problems. The computational cost per iteration in several regression models is of order O(n(s+J)), where n is the sample size, s is the underlying sparsity of the model, and J is the size of a randomly selected subset of regressors. This cost can be further reduced by data sub-sampling when stochastic gradient Langevin dynamics are employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. [(2013). Analyzing Hogwild parallel Gaussian Gibbs sampling. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS’13) (Vol. 2, pp. 2715–2723)], but can be viewed from a statistical perspective as a form of Bayesian iterated sure independent screening [Fan, J., Samworth, R., & Wu, Y. (2009). Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research, 10, 2013–2038]. We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that recovers correctly the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.

     
    more » « less
  3. Bach, Francis ; Blei, David ; Scholkopf, Bernhard (Ed.)
    This paper investigates the asymptotic behaviors of gradient descent algorithms (particularly accelerated gradient descent and stochastic gradient descent) in the context of stochastic optimization arising in statistics and machine learning, where objective functions are estimated from available data. We show that these algorithms can be computationally modeled by continuous-time ordinary or stochastic differential equations. We establish gradient flow central limit theorems to describe the limiting dynamic behaviors of these computational algorithms and the large-sample performances of the related statistical procedures, as the number of algorithm iterations and data size both go to infinity, where the gradient flow central limit theorems are governed by some linear ordinary or stochastic differential equations, like time-dependent Ornstein-Uhlenbeck processes. We illustrate that our study can provide a novel unified framework for a joint computational and statistical asymptotic analysis, where the computational asymptotic analysis studies the dynamic behaviors of these algorithms with time (or the number of iterations in the algorithms), the statistical asymptotic analysis investigates the large-sample behaviors of the statistical procedures (like estimators and classifiers) that are computed by applying the algorithms; in fact, the statistical procedures are equal to the limits of the random sequences generated from these iterative algorithms, as the number of iterations goes to infinity. The joint analysis results based on the obtained The joint analysis results based on the obtained gradient flow central limit theorems lead to the identification of four factors---learning rate, batch size, gradient covariance, and Hessian---to derive new theories regarding the local minima found by stochastic gradient descent for solving non-convex optimization problems. 
    more » « less
  4. Bach, Francis ; Blei, David ; Scholkopf, Bernhard (Ed.)
    This paper investigates the asymptotic behaviors of gradient descent algorithms (particularly accelerated gradient descent and stochastic gradient descent) in the context of stochastic optimization arising in statistics and machine learning, where objective functions are estimated from available data. We show that these algorithms can be computationally modeled by continuous-time ordinary or stochastic differential equations. We establish gradient flow central limit theorems to describe the limiting dynamic behaviors of these computational algorithms and the large-sample performances of the related statistical procedures, as the number of algorithm iterations and data size both go to infinity, where the gradient flow central limit theorems are governed by some linear ordinary or stochastic differential equations, like time-dependent Ornstein-Uhlenbeck processes. We illustrate that our study can provide a novel unified framework for a joint computational and statistical asymptotic analysis, where the computational asymptotic analysis studies the dynamic behaviors of these algorithms with time (or the number of iterations in the algorithms), the statistical asymptotic analysis investigates the large-sample behaviors of the statistical procedures (like estimators and classifiers) that are computed by applying the algorithms; in fact, the statistical procedures are equal to the limits of the random sequences generated from these iterative algorithms, as the number of iterations goes to infinity. The joint analysis results based on the obtained gradient flow central limit theorems lead to the identification of four factors—learning rate, batch size, gradient covariance, and Hessian—to derive new theories regarding the local minima found by stochastic gradient descent for solving non-convex optimization problems. 
    more » « less
  5. null (Ed.)
    We consider an ad network’s problem of allocating the auction for each individual impression to an optimal subset of advertisers with the goal of revenue maximization. This is a variant of bipartite matching except that advertisers may strategize by choosing their bidding profiles and their total budget. Because the ad network’s allocation rule affects the bidders’ strategies, equilibrium analysis is challenging. We show that this analysis is tractable when advertisers face a linear budget cost r_j. In particular, we show that the strategy in which advertisers bid their valuations shaded by a factor of 1 + r_j is an approximate equilibrium with the error decreasing with market size. This equilibrium can be interpreted as one in which a bidder facing an opportunity cost rj is guaranteed a return on investment of at least rj per dollar spent. Furthermore, in this equilibrium, the optimal allocation for the ad network, as determined from a linear program (LP), is greedy with high probability. This is in contrast with the exogenous budgets case, in which the LP optimization is challenging at practical scales. These results are evidence that, although in general such bipartite matching problems may be challenging to solve because of their high dimensionality, the optimal solution is remarkably simple at equilibrium. 
    more » « less