skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Parallel Path Tracking for Homotopy Continuation using GPU
The Homotopy Continuation (HC) method is known as a prevailing and robust approach for solving numerically complicated polyno- mial systems with guarantees of a global solution. In recent years we are witnessing tremendous advances in the theoretical and al- gorithmic foundations of HC. Furthermore, there are very efficient implementations of several variants of HC that solve large polyno- mial systems that we could not even imagine some years ago. The success of HC has motivated approaching even larger problems or gaining real-time performance. We propose to accelerate the HC computation significantly through a parallel implementation of path tracker in both straight line coefficient HC and parameter HC on a Graphical Processing Unit (GPU). The implementation involves computing independent tracks to convergence simulta- neously, as well as a parallel linear system solver and a parallel evaluation of Jacobian matrices and vectors. We evaluate the per- formance of our implementation using both popular benchmarking polynomial systems as well as polynomial systems of computer vi- sion applications. The experiments demonstrate that our GPU-HC provides as high as 28× and 20× faster than the multi-core Julia in polynomial benchmark problems and polynomial systems for computer vision applications, respectively. Code is made publicly available in https://rb.gy/cvcwgq.  more » « less
Award ID(s):
1910530
PAR ID:
10333125
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the International Symposium on Symbolic and Algebraic Computation
ISSN:
1532-1029
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Systems of polynomial equations arise frequently in computer vision, especially in multiview geometry problems. Traditional methods for solving these systems typically aim to eliminate variables to reach a univariate polynomial, e.g., a tenth-order polynomial for 5-point pose estimation, using clever manipulations, or more generally using Grobner basis, resultants, and elimination templates, leading to successful algorithms for multiview geometry and other problems. However, these methods do not work when the problem is complex and when they do, they face efficiency and stability issues. Homotopy Continuation (HC) can solve more complex problems without the stability issues, and with guarantees of a global solution, but they are known to be slow. In this paper we show that HC can be parallelized on a GPU, showing significant speedups up to 56 times on polynomial benchmarks. We also show that GPU-HC can be generically applied to a range of computer vision problems, including 4-view triangulation and trifocal pose estimation with unknown focal length, which cannot be solved with elimination template but they can be efficiently solved with HC. GPU-HC opens the door to easy formulation and solution of a range of computer vision problems. 
    more » « less
  2. Systems of polynomial equations arise frequently in computer vision, especially in multiview geometry problems. Traditional methods for solving these systems typically aim to eliminate variables to reach a univariate polynomial, e.g., a tenth-order polynomial for 5-point pose estimation, using clever manipulations, or more generally using Grobner basis, resultants, and elimination templates, leading to successful algorithms for multiview geometry and other problems. However, these methods do not work when the problem is complex and when they do, they face efficiency and stability issues. Homotopy Continuation (HC) can solve more complex problems without the stability issues, and with guarantees of a global solution, but they are known to be slow. In this paper we show that HC can be parallelized on a GPU, showing significant speedups up to 56 times on polynomial benchmarks. We also show that GPU-HC can be generically applied to a range of computer vision problems, including 4-view triangulation and trifocal pose estimation with unknown focal length, which cannot be solved with elimination template but they can be efficiently solved with HC. GPU-HC opens the door to easy formulation and solution of a range of computer vision problems. 
    more » « less
  3. The tree edit distance (TED) has been found in a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas, which serves as a metric to quantify the dissimilarity between two trees. As applications continue to scale in data size, with a growing demand for fast response time, TED has become even more increasingly data- and computing-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging due to its dynamic programming nature involving non-trivial issues of data dependency, runtime execution pattern changes, and optimal utilization of limited parallel resources. Having comprehensively investigated the bottlenecks in the existing parallel TED algorithms, we develop a massive parallel computation framework for TED and its implementation on GPU, which is called X-TED. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. Subsequently, it adopts a dynamic parallel strategy to handle various processing stages, aiming to best utilize GPU cores and the limited device memory in an adaptive and automatic way. Our intensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED, and outperforming the existing multicore parallel MC-TED by an average speedup of 31x. 
    more » « less
  4. The k-core of a graph is the largest induced sub-graph with minimum degree k. The problem of k-core decomposition finds the k-cores of a graph for all valid values of k, and it has many applications such as network analysis, computational biology and graph visualization. Currently, there are two types of parallel algorithms for k-core decomposition: (1) degree-based vertex peeling, and (2) iterative h-index refinement. There is, however, few studies on accelerating k-core decomposition using GPU. In this paper, we propose a highly optimized peeling algorithm on a GPU, and compare it with possible implementations on top of think-like-a-vertex graph-parallel GPU systems as well as existing serial and parallel k-core decomposition algorithms on CPUs. Extensive experiments show that our GPU algorithm is the overall winner in both time and space. Our source code is released at https://github.com/akhlaqueak/KCoreGPU. 
    more » « less
  5. Summation-by-parts (SBP) finite difference methods are widely used in scientific applications alongside a special treatment of boundary conditions through the simultaneous-approximate-term (SAT) technique which enables the valuable proof of numerical stability. Our work is motivated by multi-scale earthquake cycle simulations described by partial differential equations (PDEs) whose discretizations lead to huge systems of equations and often rely on iterative schemes and parallel implementations to make the nu- merical solutions tractable. In this study, we consider 2D, variable coefficient elliptic PDEs in complex geometries discretized with the SBP-SAT method. The multigrid method is a well-known, efficient solver or preconditioner for traditional numerical discretizations, but they have not been well-developed for SBP-SAT methods on HPC platforms. We propose a custom geometric-multigrid pre- conditioned conjugate-gradient (MGCG) method that applies SBP- preserving interpolations. We then present novel, matrix-free GPU kernels designed specifically for SBP operators whose differences from traditional methods make this task nontrivial but that perform 3× faster than SpMV while requiring only a fraction of memory. The matrix-free GPU implementation of our MGCG method per- forms 5× faster than the SpMV counterpart for the largest problems considered (67 million degrees of freedom). When compared to off- the-shelf solvers in the state-of-the-art libraries PETSc and AmgX, our implementation achieves superior performance in both itera- tions and overall runtime. The method presented in this work offers an attractive solver for simulations using the SBP-SAT method. 
    more » « less