skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Fast Barycentric-Based Evaluation Over Spectral/hp Elements
Abstract As the use of spectral/hpelement methods, and high-order finite element methods in general, continues to spread, community efforts to create efficient, optimized algorithms associated with fundamental high-order operations have grown. Core tasks such as solution expansion evaluation at quadrature points, stiffness and mass matrix generation, and matrix assembly have received tremendous attention. With the expansion of the types of problems to which high-order methods are applied, and correspondingly the growth in types of numerical tasks accomplished through high-order methods, the number and types of these core operations broaden. This work focuses on solution expansion evaluation at arbitrary points within an element. This operation is core to many postprocessing applications such as evaluation of streamlines and pathlines, as well as to field projection techniques such as mortaring. We expand barycentric interpolation techniques developed on an interval to 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra) spectral/hpelement methods. We provide efficient algorithms for their implementations, and demonstrate their effectiveness using the spectral/hpelement libraryNektar++by running a series of baseline evaluations against the ‘standard’ Lagrangian method, where an interpolation matrix is generated and matrix-multiplication applied to evaluate a point at a given location. We present results from a rigorous series of benchmarking tests for a variety of element shapes, polynomial orders and dimensions. We show that when the point of interest is to be repeatedly evaluated, the barycentric method performs at worst$$50\%$$ 50 % slower, when compared to a cached matrix evaluation. However, when the point of interest changes repeatedly so that the interpolation matrix must be regenerated in the ‘standard’ approach, the barycentric method yields far greater performance, with a minimum speedup factor of$$7\times $$ 7 × . Furthermore, when derivatives of the solution evaluation are also required, the barycentric method in general slightly outperforms the cached interpolation matrix method across all elements and orders, with an up to$$30\%$$ 30 % speedup. Finally we investigate a real-world example of scalar transport using a non-conformal discontinuous Galerkin simulation, in which we observe around$$6\times $$ 6 × speedup in computational time for the barycentric method compared to the matrix-based approach. We also explore the complexity of both interpolation methods and show that the barycentric interpolation method requires$${\mathcal {O}}(k)$$ O ( k ) storage compared to a best case space complexity of$${\mathcal {O}}(k^2)$$ O ( k 2 ) for the Lagrangian interpolation matrix method.  more » « less
Award ID(s):
1848508
PAR ID:
10362210
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Journal of Scientific Computing
Volume:
90
Issue:
2
ISSN:
0885-7474
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract In a Merlin–Arthur proof system, the proof verifier (Arthur) accepts valid proofs (from Merlin) with probability 1, and rejects invalid proofs with probability arbitrarily close to 1. The running time of such a system is defined to be the length of Merlin’s proof plus the running time of Arthur. We provide new Merlin–Arthur proof systems for some key problems in fine-grained complexity. In several cases our proof systems have optimal running time. Our main results include:Certifying that a list ofnintegers has no 3-SUM solution can be done in Merlin–Arthur time$$\tilde{O}(n)$$ O ~ ( n ) . Previously, Carmosino et al. [ITCS 2016] showed that the problem has a nondeterministic algorithm running in$$\tilde{O}(n^{1.5})$$ O ~ ( n 1.5 ) time (that is, there is a proof system with proofs of length$$\tilde{O}(n^{1.5})$$ O ~ ( n 1.5 ) and a deterministic verifier running in$$\tilde{O}(n^{1.5})$$ O ~ ( n 1.5 ) time).Counting the number ofk-cliques with total edge weight equal to zero in ann-node graph can be done in Merlin–Arthur time$${\tilde{O}}(n^{\lceil k/2\rceil })$$ O ~ ( n k / 2 ) (where$$k\ge 3$$ k 3 ). For oddk, this bound can be further improved for sparse graphs: for example, counting the number of zero-weight triangles in anm-edge graph can be done in Merlin–Arthur time$${\tilde{O}}(m)$$ O ~ ( m ) . Previous Merlin–Arthur protocols by Williams [CCC’16] and Björklund and Kaski [PODC’16] could only countk-cliques in unweighted graphs, and had worse running times for smallk.Computing the All-Pairs Shortest Distances matrix for ann-node graph can be done in Merlin–Arthur time$$\tilde{O}(n^2)$$ O ~ ( n 2 ) . Note this is optimal, as the matrix can have$$\Omega (n^2)$$ Ω ( n 2 ) nonzero entries in general. Previously, Carmosino et al. [ITCS 2016] showed that this problem has an$$\tilde{O}(n^{2.94})$$ O ~ ( n 2.94 ) nondeterministic time algorithm.Certifying that ann-variablek-CNF is unsatisfiable can be done in Merlin–Arthur time$$2^{n/2 - n/O(k)}$$ 2 n / 2 - n / O ( k ) . We also observe an algebrization barrier for the previous$$2^{n/2}\cdot \textrm{poly}(n)$$ 2 n / 2 · poly ( n ) -time Merlin–Arthur protocol of R. Williams [CCC’16] for$$\#$$ # SAT: in particular, his protocol algebrizes, and we observe there is no algebrizing protocol fork-UNSAT running in$$2^{n/2}/n^{\omega (1)}$$ 2 n / 2 / n ω ( 1 ) time. Therefore we have to exploit non-algebrizing properties to obtain our new protocol.Certifying a Quantified Boolean Formula is true can be done in Merlin–Arthur time$$2^{4n/5}\cdot \textrm{poly}(n)$$ 2 4 n / 5 · poly ( n ) . Previously, the only nontrivial result known along these lines was an Arthur–Merlin–Arthur protocol (where Merlin’s proof depends on some of Arthur’s coins) running in$$2^{2n/3}\cdot \textrm{poly}(n)$$ 2 2 n / 3 · poly ( n ) time.Due to the centrality of these problems in fine-grained complexity, our results have consequences for many other problems of interest. For example, our work implies that certifying there is no Subset Sum solution tonintegers can be done in Merlin–Arthur time$$2^{n/3}\cdot \textrm{poly}(n)$$ 2 n / 3 · poly ( n ) , improving on the previous best protocol by Nederlof [IPL 2017] which took$$2^{0.49991n}\cdot \textrm{poly}(n)$$ 2 0.49991 n · poly ( n ) time. 
    more » « less
  2. Abstract LetXbe ann-element point set in thek-dimensional unit cube$$[0,1]^k$$ [ 0 , 1 ] k where$$k \ge 2$$ k 2 . According to an old result of Bollobás and Meir (Oper Res Lett 11:19–21, 1992) , there exists a cycle (tour)$$x_1, x_2, \ldots , x_n$$ x 1 , x 2 , , x n through thenpoints, such that$$\left( \sum _{i=1}^n |x_i - x_{i+1}|^k \right) ^{1/k} \le c_k$$ i = 1 n | x i - x i + 1 | k 1 / k c k , where$$|x-y|$$ | x - y | is the Euclidean distance betweenxandy, and$$c_k$$ c k is an absolute constant that depends only onk, where$$x_{n+1} \equiv x_1$$ x n + 1 x 1 . From the other direction, for every$$k \ge 2$$ k 2 and$$n \ge 2$$ n 2 , there existnpoints in$$[0,1]^k$$ [ 0 , 1 ] k , such that their shortest tour satisfies$$\left( \sum _{i=1}^n |x_i - x_{i+1}|^k \right) ^{1/k} = 2^{1/k} \cdot \sqrt{k}$$ i = 1 n | x i - x i + 1 | k 1 / k = 2 1 / k · k . For the plane, the best constant is$$c_2=2$$ c 2 = 2 and this is the only exact value known. Bollobás and Meir showed that one can take$$c_k = 9 \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}$$ c k = 9 2 3 1 / k · k for every$$k \ge 3$$ k 3 and conjectured that the best constant is$$c_k = 2^{1/k} \cdot \sqrt{k}$$ c k = 2 1 / k · k , for every$$k \ge 2$$ k 2 . Here we significantly improve the upper bound and show that one can take$$c_k = 3 \sqrt{5} \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}$$ c k = 3 5 2 3 1 / k · k or$$c_k = 2.91 \sqrt{k} \ (1+o_k(1))$$ c k = 2.91 k ( 1 + o k ( 1 ) ) . Our bounds are constructive. We also show that$$c_3 \ge 2^{7/6}$$ c 3 2 7 / 6 , which disproves the conjecture for$$k=3$$ k = 3 . Connections to matching problems, power assignment problems, related problems, including algorithms, are discussed in this context. A slightly revised version of the Bollobás–Meir conjecture is proposed. 
    more » « less
  3. Abstract Let us fix a primepand a homogeneous system ofmlinear equations$$a_{j,1}x_1+\dots +a_{j,k}x_k=0$$ a j , 1 x 1 + + a j , k x k = 0 for$$j=1,\dots ,m$$ j = 1 , , m with coefficients$$a_{j,i}\in \mathbb {F}_p$$ a j , i F p . Suppose that$$k\ge 3m$$ k 3 m , that$$a_{j,1}+\dots +a_{j,k}=0$$ a j , 1 + + a j , k = 0 for$$j=1,\dots ,m$$ j = 1 , , m and that every$$m\times m$$ m × m minor of the$$m\times k$$ m × k matrix$$(a_{j,i})_{j,i}$$ ( a j , i ) j , i is non-singular. Then we prove that for any (large)n, any subset$$A\subseteq \mathbb {F}_p^n$$ A F p n of size$$|A|> C\cdot \Gamma ^n$$ | A | > C · Γ n contains a solution$$(x_1,\dots ,x_k)\in A^k$$ ( x 1 , , x k ) A k to the given system of equations such that the vectors$$x_1,\dots ,x_k\in A$$ x 1 , , x k A are all distinct. Here,Cand$$\Gamma $$ Γ are constants only depending onp,mandksuch that$$\Gamma Γ < p . The crucial point here is the condition for the vectors$$x_1,\dots ,x_k$$ x 1 , , x k in the solution$$(x_1,\dots ,x_k)\in A^k$$ ( x 1 , , x k ) A k to be distinct. If we relax this condition and only demand that$$x_1,\dots ,x_k$$ x 1 , , x k are not all equal, then the statement would follow easily from Tao’s slice rank polynomial method. However, handling the distinctness condition is much harder, and requires a new approach. While all previous combinatorial applications of the slice rank polynomial method have relied on the slice rank of diagonal tensors, we use a slice rank argument for a non-diagonal tensor in combination with combinatorial and probabilistic arguments. 
    more » « less
  4. Abstract In the spanning tree congestion problem, given a connected graphG, the objective is to compute a spanning treeTinGthat minimizes its maximum edge congestion, where the congestion of an edgeeofTis the number of edges inGfor which the unique path inTbetween their endpoints traversese. The problem is known to be$$\mathbb{N}\mathbb{P}$$ N P -hard, but its approximability is still poorly understood, and it is not even known whether the optimum solution can be efficiently approximated with ratioo(n). In the decision version of this problem, denoted$${\varvec{K}-\textsf {STC}}$$ K - STC , we need to determine ifGhas a spanning tree with congestion at mostK. It is known that$${\varvec{K}-\textsf {STC}}$$ K - STC is$$\mathbb{N}\mathbb{P}$$ N P -complete for$$K\ge 8$$ K 8 , and this implies a lower bound of 1.125 on the approximation ratio of minimizing congestion. On the other hand,$${\varvec{3}-\textsf {STC}}$$ 3 - STC can be solved in polynomial time, with the complexity status of this problem for$$K\in { \left\{ 4,5,6,7 \right\} }$$ K 4 , 5 , 6 , 7 remaining an open problem. We substantially improve the earlier hardness results by proving that$${\varvec{K}-\textsf {STC}}$$ K - STC is$$\mathbb{N}\mathbb{P}$$ N P -complete for$$K\ge 5$$ K 5 . This leaves only the case$$K=4$$ K = 4 open, and improves the lower bound on the approximation ratio to 1.2. Motivated by evidence that minimizing congestion is hard even for graphs of small constant radius, we also consider$${\varvec{K}-\textsf {STC}}$$ K - STC restricted to graphs of radius 2, and we prove that this variant is$$\mathbb{N}\mathbb{P}$$ N P -complete for all$$K\ge 6$$ K 6
    more » « less
  5. Abstract It is natural to generalize the online$$k$$ k -Server problem by allowing each request to specify not only a pointp, but also a subsetSof servers that may serve it. To date, only a few special cases of this problem have been studied. The objective of the work presented in this paper has been to more systematically explore this generalization in the case of uniform and star metrics. For uniform metrics, the problem is equivalent to a generalization of Paging in which each request specifies not only a pagep, but also a subsetSof cache slots, and is satisfied by having a copy ofpin some slot inS. We call this problemSlot-Heterogenous Paging. In realistic settings only certain subsets of cache slots or servers would appear in requests. Therefore we parameterize the problem by specifying a family$${\mathcal {S}}\subseteq 2^{[k]}$$ S 2 [ k ] of requestable slot sets, and we establish bounds on the competitive ratio as a function of the cache sizekand family$${\mathcal {S}}$$ S :If all request sets are allowed ($${\mathcal {S}}=2^{[k]}\setminus \{\emptyset \}$$ S = 2 [ k ] \ { } ), the optimal deterministic and randomized competitive ratios are exponentially worse than for standard Paging ($${\mathcal {S}}=\{[k]\}$$ S = { [ k ] } ).As a function of$$|{\mathcal {S}}|$$ | S | andk, the optimal deterministic ratio is polynomial: at most$$O(k^2|{\mathcal {S}}|)$$ O ( k 2 | S | ) and at least$$\Omega (\sqrt{|{\mathcal {S}}|})$$ Ω ( | S | ) .For any laminar family$${\mathcal {S}}$$ S of heighth, the optimal ratios areO(hk) (deterministic) and$$O(h^2\log k)$$ O ( h 2 log k ) (randomized).The special case of laminar$${\mathcal {S}}$$ S that we callAll-or-One Pagingextends standard Paging by allowing each request to specify a specific slot to put the requested page in. The optimal deterministic ratio forweightedAll-or-One Paging is$$\Theta (k)$$ Θ ( k ) . Offline All-or-One Paging is$$\mathbb{N}\mathbb{P}$$ N P -hard.Some results for the laminar case are shown via a reduction to the generalization of Paging in which each request specifies a set$$P$$ P ofpages, and is satisfied by fetching any page from$$P$$ P into the cache. The optimal ratios for the latter problem (with laminar family of heighth) are at mosthk(deterministic) and$$hH_k$$ h H k (randomized). 
    more » « less