

Title: Fast Barycentric-Based Evaluation Over Spectral/hp Elements
Abstract

As the use of spectral/hp element methods, and high-order finite element methods in general, continues to spread, community efforts to create efficient, optimized algorithms associated with fundamental high-order operations have grown. Core tasks such as solution expansion evaluation at quadrature points, stiffness and mass matrix generation, and matrix assembly have received tremendous attention. With the expansion of the types of problems to which high-order methods are applied, and correspondingly the growth in types of numerical tasks accomplished through high-order methods, the number and types of these core operations broaden. This work focuses on solution expansion evaluation at arbitrary points within an element. This operation is core to many postprocessing applications such as evaluation of streamlines and pathlines, as well as to field projection techniques such as mortaring. We expand barycentric interpolation techniques developed on an interval to 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra) spectral/hp element methods. We provide efficient algorithms for their implementations, and demonstrate their effectiveness using the spectral/hp element library Nektar++ by running a series of baseline evaluations against the ‘standard’ Lagrangian method, where an interpolation matrix is generated and matrix-multiplication applied to evaluate a point at a given location. We present results from a rigorous series of benchmarking tests for a variety of element shapes, polynomial orders and dimensions. We show that when the point of interest is to be repeatedly evaluated, the barycentric method performs at worst $$50\%$$ slower, when compared to a cached matrix evaluation. However, when the point of interest changes repeatedly so that the interpolation matrix must be regenerated in the ‘standard’ approach, the barycentric method yields far greater performance, with a minimum speedup factor of $$7\times$$. Furthermore, when derivatives of the solution evaluation are also required, the barycentric method in general slightly outperforms the cached interpolation matrix method across all elements and orders, with a speedup of up to $$30\%$$. Finally we investigate a real-world example of scalar transport using a non-conformal discontinuous Galerkin simulation, in which we observe around $$6\times$$ speedup in computational time for the barycentric method compared to the matrix-based approach. We also explore the complexity of both interpolation methods and show that the barycentric interpolation method requires $${\mathcal {O}}(k)$$ storage compared to a best case space complexity of $${\mathcal {O}}(k^2)$$ for the Lagrangian interpolation matrix method.
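To make the contrast in the abstract concrete, the following is a minimal sketch of the classical barycentric formula on a single interval, next to a direct evaluation of one row of a Lagrange interpolation matrix. It is not the paper's 2D/3D algorithm and not Nektar++ code: the function names (`barycentric_weights`, `barycentric_eval`, `lagrange_row`), the choice of Chebyshev-type nodes, and the test polynomial are all assumptions made for illustration.

```python
import math


def barycentric_weights(nodes):
    """Precompute w_j = 1 / prod_{i != j} (x_j - x_i); one weight per node."""
    weights = []
    for j, xj in enumerate(nodes):
        prod = 1.0
        for i, xi in enumerate(nodes):
            if i != j:
                prod *= xj - xi
        weights.append(1.0 / prod)
    return weights


def barycentric_eval(x, nodes, values, weights):
    """Evaluate the interpolant at x with the second barycentric formula.

    Costs O(k) operations per point once the weights are known.
    """
    num = 0.0
    den = 0.0
    for xj, fj, wj in zip(nodes, values, weights):
        if x == xj:
            # x coincides with a node: return the nodal value directly
            # to avoid dividing by zero.
            return fj
        t = wj / (x - xj)
        num += t * fj
        den += t
    return num / den


def lagrange_row(x, nodes):
    """Build the row [l_0(x), ..., l_{k-1}(x)] of an interpolation matrix.

    Uses the product form of each Lagrange basis function, so building
    the row costs O(k^2) operations whenever x changes.
    """
    row = []
    for j, xj in enumerate(nodes):
        lj = 1.0
        for i, xi in enumerate(nodes):
            if i != j:
                lj *= (x - xi) / (xj - xi)
        row.append(lj)
    return row


# Illustrative setup (assumed, not from the paper): 8 Chebyshev-type nodes
# on [-1, 1] and samples of a cubic, which the interpolant reproduces.
nodes = [math.cos(math.pi * j / 7) for j in range(8)]
values = [x**3 - 2.0 * x for x in nodes]
weights = barycentric_weights(nodes)
```

In this 1D setting the weights are the only data that must be stored, which mirrors the linear-storage remark in the abstract; how that carries over to triangles, tetrahedra, and the other element shapes is the subject of the paper itself and is not reproduced here.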

 
Award ID(s):
1848508
NSF-PAR ID:
10362210
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Journal of Scientific Computing
Volume:
90
Issue:
2
ISSN:
0885-7474
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We continue the program of proving circuit lower bounds via circuit satisfiability algorithms. So far, this program has yielded several concrete results, proving that functions in $\mathsf {Quasi}\text {-}\mathsf {NP} = \mathsf {NTIME}[n^{(\log n)^{O(1)}}]$ and other complexity classes do not have small circuits (in the worst case and/or on average) from various circuit classes $\mathcal { C}$, by showing that $\mathcal { C}$ admits non-trivial satisfiability and/or #SAT algorithms which beat exhaustive search by a minor amount. In this paper, we present a new strong lower bound consequence of having a non-trivial #SAT algorithm for a circuit class ${\mathcal C}$. Say that a symmetric Boolean function $f(x_1,\ldots ,x_n)$ is sparse if it outputs 1 on $O(1)$ values of ${\sum }_{i} x_{i}$. We show that for every sparse $f$, and for all “typical” $\mathcal { C}$, faster #SAT algorithms for $\mathcal { C}$ circuits imply lower bounds against the circuit class $f \circ \mathcal { C}$, which may be stronger than $\mathcal { C}$ itself. In particular:

    #SAT algorithms for $n^k$-size $\mathcal { C}$-circuits running in $2^n/n^k$ time (for all $k$) imply NEXP does not have $(f \circ \mathcal { C})$-circuits of polynomial size.

    #SAT algorithms for $2^{n^{{\varepsilon }}}$-size $\mathcal { C}$-circuits running in $2^{n-n^{{\varepsilon }}}$ time (for some $\varepsilon > 0$) imply Quasi-NP does not have $(f \circ \mathcal { C})$-circuits of polynomial size.

    Applying #SAT algorithms from the literature, one immediate corollary of our results is that Quasi-NP does not have $\mathsf {EMAJ} \circ \mathsf {ACC}^0 \circ \mathsf {THR}$ circuits of polynomial size, where $\mathsf {EMAJ}$ is the “exact majority” function, improving previous lower bounds against $\mathsf {ACC}^0$ [Williams JACM’14] and $\mathsf {ACC}^0 \circ \mathsf {THR}$ [Williams STOC’14], [Murray-Williams STOC’18]. This is the first nontrivial lower bound against such a circuit class.

     
  2. Abstract

    Approximate integer programming is the following: For a given convex body $$K \subseteq {\mathbb {R}}^n$$, either determine whether $$K \cap {\mathbb {Z}}^n$$ is empty, or find an integer point in the convex body $$2\cdot (K - c) +c$$, which is $$K$$ scaled by 2 from its center of gravity $$c$$. Approximate integer programming can be solved in time $$2^{O(n)}$$ while the fastest known methods for exact integer programming run in time $$2^{O(n)} \cdot n^n$$. So far, there are no efficient methods for integer programming known that are based on approximate integer programming. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point $$x^* \in (K \cap {\mathbb {Z}}^n)$$ can be found in time $$2^{O(n)}$$, provided that the remainders of each component $$x_i^* \mod \ell $$ for some arbitrarily fixed $$\ell \ge 5(n+1)$$ of $$x^*$$ are given. The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a $$2^{O(n)}n^n$$ algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (Integer programming, lattice algorithms, and deterministic volume estimation. Georgia Institute of Technology, Atlanta, 2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carathéodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation-standard form $$Ax = b, 0 \le x \le u, \, x \in {\mathbb {Z}}^n$$. Such a problem can be reduced to the solution of $$\prod _i O(\log u_i +1)$$ approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range $$0 \le x_i \le p(n)$$ can be solved in time $$(\log n)^{O(n)}$$. For these problems, the best running time so far was $$n^n \cdot 2^{O(n)}$$.

     
  3. Abstract

    In a Merlin–Arthur proof system, the proof verifier (Arthur) accepts valid proofs (from Merlin) with probability 1, and rejects invalid proofs with probability arbitrarily close to 1. The running time of such a system is defined to be the length of Merlin’s proof plus the running time of Arthur. We provide new Merlin–Arthur proof systems for some key problems in fine-grained complexity. In several cases our proof systems have optimal running time. Our main results include:

    Certifying that a list of $$n$$ integers has no 3-SUM solution can be done in Merlin–Arthur time $$\tilde{O}(n)$$. Previously, Carmosino et al. [ITCS 2016] showed that the problem has a nondeterministic algorithm running in $$\tilde{O}(n^{1.5})$$ time (that is, there is a proof system with proofs of length $$\tilde{O}(n^{1.5})$$ and a deterministic verifier running in $$\tilde{O}(n^{1.5})$$ time).

    Counting the number of $$k$$-cliques with total edge weight equal to zero in an $$n$$-node graph can be done in Merlin–Arthur time $${\tilde{O}}(n^{\lceil k/2\rceil })$$ (where $$k\ge 3$$). For odd $$k$$, this bound can be further improved for sparse graphs: for example, counting the number of zero-weight triangles in an $$m$$-edge graph can be done in Merlin–Arthur time $${\tilde{O}}(m)$$. Previous Merlin–Arthur protocols by Williams [CCC’16] and Björklund and Kaski [PODC’16] could only count $$k$$-cliques in unweighted graphs, and had worse running times for small $$k$$.

    Computing the All-Pairs Shortest Distances matrix for an $$n$$-node graph can be done in Merlin–Arthur time $$\tilde{O}(n^2)$$. Note this is optimal, as the matrix can have $$\Omega (n^2)$$ nonzero entries in general. Previously, Carmosino et al. [ITCS 2016] showed that this problem has an $$\tilde{O}(n^{2.94})$$ nondeterministic time algorithm.

    Certifying that an $$n$$-variable $$k$$-CNF is unsatisfiable can be done in Merlin–Arthur time $$2^{n/2 - n/O(k)}$$. We also observe an algebrization barrier for the previous $$2^{n/2}\cdot \textrm{poly}(n)$$-time Merlin–Arthur protocol of R. Williams [CCC’16] for $$\#$$SAT: in particular, his protocol algebrizes, and we observe there is no algebrizing protocol for $$k$$-UNSAT running in $$2^{n/2}/n^{\omega (1)}$$ time. Therefore we have to exploit non-algebrizing properties to obtain our new protocol.

    Certifying a Quantified Boolean Formula is true can be done in Merlin–Arthur time $$2^{4n/5}\cdot \textrm{poly}(n)$$. Previously, the only nontrivial result known along these lines was an Arthur–Merlin–Arthur protocol (where Merlin’s proof depends on some of Arthur’s coins) running in $$2^{2n/3}\cdot \textrm{poly}(n)$$ time.

    Due to the centrality of these problems in fine-grained complexity, our results have consequences for many other problems of interest. For example, our work implies that certifying there is no Subset Sum solution to $$n$$ integers can be done in Merlin–Arthur time $$2^{n/3}\cdot \textrm{poly}(n)$$, improving on the previous best protocol by Nederlof [IPL 2017] which took $$2^{0.49991n}\cdot \textrm{poly}(n)$$ time.

     
  4. Abstract

    Let $$X\rightarrow {{\mathbb {P}}}^1$$ be an elliptically fibered K3 surface, admitting a sequence $$\omega _{i}$$ of Ricci-flat metrics collapsing the fibers. Let $$V$$ be a holomorphic $$SU(n)$$ bundle over $$X$$, stable with respect to $$\omega _i$$. Given the corresponding sequence $$\Xi _i$$ of Hermitian–Yang–Mills connections on $$V$$, we prove that, if $$E$$ is a generic fiber, the restricted sequence $$\Xi _i|_{E}$$ converges to a flat connection $$A_0$$. Furthermore, if the restriction $$V|_E$$ is of the form $$\oplus _{j=1}^n{\mathcal {O}}_E(q_j-0)$$ for $$n$$ distinct points $$q_j\in E$$, then these points uniquely determine $$A_0$$.

     
  5. Abstract

    Finite volume, weighted essentially non-oscillatory (WENO) schemes require the computation of a smoothness indicator. This can be expensive, especially in multiple space dimensions. We consider the use of the simple smoothness indicator $$\sigma ^{\textrm{S}}= \frac{1}{N_{\textrm{S}}-1}\sum _{j} ({\bar{u}}_{j} - {\bar{u}}_{m})^2$$, where $$N_{\textrm{S}}$$ is the number of mesh elements in the stencil, $${\bar{u}}_j$$ is the local function average over mesh element $$j$$, and index $$m$$ gives the target element. Reconstructions utilizing standard WENO weighting fail with this smoothness indicator. We develop a modification of WENO-Z weighting that gives a reliable and accurate reconstruction of adaptive order, which we denote as SWENOZ-AO. We prove that it attains the order of accuracy of the large stencil polynomial approximation when the solution is smooth, and drops to the order of the small stencil polynomial approximations when there is a jump discontinuity in the solution. Numerical examples in one and two space dimensions on general meshes verify the approximation properties of the reconstruction. They also show it to be about 10 times faster in two space dimensions than reconstructions using the classic smoothness indicator. The new reconstruction is applied to define finite volume schemes to approximate the solution of hyperbolic conservation laws. Numerical tests show results of the same quality as standard WENO schemes using the classic smoothness indicator, but with an overall speedup in the computation time of about 3.5–5 times in 2D tests. Moreover, the computational efficiency (CPU time versus error) is noticeably improved.
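The simple smoothness indicator quoted in this abstract is just a scaled sum of squared differences between the stencil's cell averages and the target cell's average. A minimal sketch follows, assuming the cell averages arrive as a plain list and the target element is given by its position in that list. The function name `simple_smoothness_indicator` and its arguments are hypothetical, and the SWENOZ-AO weighting built on top of the indicator is not shown.

```python
def simple_smoothness_indicator(cell_averages, target_index):
    """Return sigma^S = (1 / (N_S - 1)) * sum_j (u_j - u_m)^2.

    cell_averages: local function averages over the stencil's mesh elements.
    target_index:  position m of the target element within cell_averages.
    """
    n_stencil = len(cell_averages)
    if n_stencil < 2:
        # The 1 / (N_S - 1) factor is undefined for a single-element stencil.
        raise ValueError("stencil needs at least two mesh elements")
    u_target = cell_averages[target_index]
    total = sum((u - u_target) ** 2 for u in cell_averages)
    return total / (n_stencil - 1)


# Illustrative values (assumed): a smooth stencil and one with a jump.
smooth = simple_smoothness_indicator([1.0, 1.0, 1.0], 1)
jump = simple_smoothness_indicator([1.0, 2.0, 3.0], 1)
```

The indicator is zero when all averages in the stencil agree and grows with the spread around the target element, which is the property a WENO-type weighting can use to detect non-smooth stencils.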

     