Stencil kernels are an important class of computational kernels used extensively in many application domains. Over the years, researchers have studied optimizations for parallelization, communication reuse, and computation reuse on various target platforms. However, challenges remain, especially for the computation reuse problem on accelerators, due to the lack of complete design-space exploration and effective design-space pruning. In this paper, we present solutions to the above challenges for a wide range of stencil kernels (i.e., stencils with reduction operations), where the computation reuse patterns are extremely flexible due to the commutative and associative properties of the reduction. We formally define the complete design space, based on which we present a provably optimal dynamic programming algorithm and a heuristic beam search algorithm that provides near-optimal solutions under an architecture-aware model. Experimental results show that, for synthesizing stencil kernels to FPGAs, compared with a state-of-the-art stencil compiler without computation reuse capability, our proposed algorithm reduces look-up table (LUT) and digital signal processor (DSP) usage by 58.1% and 54.6% on average, respectively, which leads to an average speedup of 2.3× for compute-intensive kernels, outperforming the latest CPU/GPU results.
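As a rough, hypothetical illustration of the kind of computation reuse such a flow searches for (a sliding-window sketch, not the paper's algorithm or its design space), consider a 1D box-sum stencil where the commutativity and associativity of the reduction let a running accumulator replace the full per-point sum:

```python
# Hypothetical illustration of computation reuse in a reduction stencil
# (not the paper's algorithm): a 1D box-sum stencil of radius R.
# The naive version performs 2R adds per output point; the reused version
# updates a running sum with one add and one subtract per point.
def box_sum_naive(x, R):
    n = len(x)
    out = [0.0] * n
    for i in range(R, n - R):
        acc = 0.0
        for k in range(-R, R + 1):   # full reduction recomputed at every point
            acc += x[i + k]
        out[i] = acc
    return out

def box_sum_reused(x, R):
    n = len(x)
    out = [0.0] * n
    acc = sum(x[0:2 * R + 1])        # reduction for the first window
    out[R] = acc
    for i in range(R + 1, n - R):    # slide the window: reuse the previous sum
        acc += x[i + R] - x[i - R - 1]
        out[i] = acc
    return out

if __name__ == "__main__":
    data = [float(v % 7) for v in range(32)]
    assert all(abs(a - b) < 1e-9
               for a, b in zip(box_sum_naive(data, 3), box_sum_reused(data, 3)))
```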
On Cucker–Smale dynamical systems with degenerate communication
This paper introduces a new method for establishing alignment in systems of collective behavior with a degenerate communication protocol. The communication protocol consists of a kernel defining the interaction between pairs of agents. Degeneracy presumes that the kernel vanishes in a region, which creates a zone of indifference. A motivating example is the case of a local kernel. Lapses in communication create a lack of coercivity in the energy estimates. Our approach is the construction of a corrector functional that compensates for this lack of coercivity. We obtain a series of new results: unconditional alignment for systems on ℝ^n with degeneracy at close range and a fat tail in the long range, and for systems on the circle with purely local kernels. The results are proved in the context of both the agent-based model and its hydrodynamic counterpart (the Euler alignment model). The method covers bounded and singular communication kernels.
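For orientation, a generic form of the agent-based Cucker-Smale system referred to above is sketched below; the normalization is the standard one and may differ from the paper's conventions, and the degenerate cases correspond to a kernel φ that vanishes on part of its range.

```latex
% Agent-based Cucker--Smale system (generic normalization; the paper's
% conventions may differ).
\[
\dot{x}_i = v_i, \qquad
\dot{v}_i = \frac{1}{N}\sum_{j=1}^{N}\phi\bigl(|x_i - x_j|\bigr)\,(v_j - v_i),
\qquad i = 1,\dots,N.
\]
% Degenerate examples: a purely local kernel, $\phi(r) = 0$ for $r \ge r_0$,
% or a kernel with a zone of indifference at close range, $\phi(r) = 0$ for
% $r \le r_0$, combined with a fat tail $\int^{\infty}\phi(r)\,dr = \infty$.
```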
- Award ID(s): 1813351
- PAR ID: 10358260
- Date Published:
- Journal Name: Analysis and Applications
- Volume: 19
- Issue: 04
- ISSN: 0219-5305
- Page Range / eLocation ID: 551 to 573
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The kernel two-sample test based on the maximum mean discrepancy is one of the most popular methods for detecting differences between two distributions over general metric spaces. In this paper we propose a method to boost the power of the kernel test by combining maximum mean discrepancy estimates over multiple kernels using their Mahalanobis distance. We derive the asymptotic null distribution of the proposed test statistic and use a multiplier bootstrap approach to efficiently compute the rejection region. The resulting test is universally consistent and, since it is obtained by aggregating over a collection of kernels/bandwidths, is more powerful in detecting a wide range of alternatives in finite samples. We also derive the distribution of the test statistic for both fixed and local contiguous alternatives. The latter, in particular, implies that the proposed test is statistically efficient, that is, it has nontrivial asymptotic (Pitman) efficiency. The consistency properties of the Mahalanobis and other natural aggregation methods are also explored when the number of kernels is allowed to grow with the sample size. Extensive numerical experiments are performed on both synthetic and real-world datasets to illustrate the efficacy of the proposed method over single-kernel tests. The computational complexity of the proposed method is also studied, both theoretically and in simulations. Our asymptotic results rely on deriving the joint distribution of the maximum mean discrepancy estimates using the framework of multiple stochastic integrals, which is more broadly useful, specifically, in understanding the efficiency properties of recently proposed adaptive maximum mean discrepancy tests based on kernel aggregation and also in developing more computationally efficient, linear-time tests that combine multiple kernels. We conclude with an application of the Mahalanobis aggregation method for kernels with diverging scaling parameters.
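As a rough NumPy illustration of the aggregation idea only (a permutation calibration stands in here for the paper's multiplier bootstrap, so this is not the proposed test itself), one might combine per-kernel MMD estimates as follows:

```python
# Simplified NumPy sketch: combine MMD estimates over several Gaussian
# bandwidths via their Mahalanobis distance.  The null is calibrated by
# permutation here, standing in for the paper's multiplier bootstrap.
import numpy as np

def mmd2_unbiased(X, Y, bandwidth):
    """Unbiased estimate of squared MMD for a Gaussian kernel."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    m, n = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (m * (m - 1)) + Kyy.sum() / (n * (n - 1)) - 2.0 * Kxy.mean()

def mmd_vector(X, Y, bandwidths):
    """Per-kernel MMD^2 estimates, one entry per bandwidth."""
    return np.array([mmd2_unbiased(X, Y, h) for h in bandwidths])

def mahalanobis_mmd_test(X, Y, bandwidths, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    pooled, m = np.vstack([X, Y]), len(X)
    # Null resamples: MMD vectors computed on random splits of the pooled data.
    null_vecs = np.array([mmd_vector(*np.split(rng.permutation(pooled), [m]),
                                     bandwidths) for _ in range(n_perm)])
    cov_inv = np.linalg.inv(np.cov(null_vecs, rowvar=False)
                            + 1e-8 * np.eye(len(bandwidths)))
    def stat(d):                                 # Mahalanobis aggregation
        return float(d @ cov_inv @ d)
    observed = stat(mmd_vector(X, Y, bandwidths))
    null_stats = np.array([stat(d) for d in null_vecs])
    p_value = (1 + (null_stats >= observed).sum()) / (1 + n_perm)
    return observed, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    Y = rng.normal(loc=0.4, size=(100, 2))
    print(mahalanobis_mmd_test(X, Y, bandwidths=[0.5, 1.0, 2.0]))
```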
In this note we establish hypocoercivity and exponential relaxation to the Maxwellian for a class of kinetic Fokker-Planck-Alignment equations arising in the studies of collective behavior. Unlike previously known results in this direction that focus on convergence near the Maxwellian, our result is global for hydrodynamically dense flocks, which has several consequences. In particular, if communication is long-range, the convergence is unconditional. If communication is local, then all nearly aligned flocks, quantified by smallness of the Fisher information, relax to the Maxwellian. In the latter case the class of initial data is stable under the vanishing noise limit, i.e., it reduces to a non-trivial and natural class of traveling wave solutions to the noiseless Vlasov-Alignment equation. The main novelty in our approach is the adaptation of a mollified Favre filtration of the macroscopic momentum into the communication protocol. Such filtration has been used previously in large eddy simulations of compressible turbulence, and its new variant appeared in the proof of the Onsager conjecture for the inhomogeneous Navier-Stokes system. A rigorous treatment of well-posedness for smooth solutions is provided. Lastly, we prove that in the limit of strong noise and local alignment, solutions to the Fokker-Planck-Alignment equation Maxwellianize to solutions of the macroscopic hydrodynamic system with isothermal pressure.
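For orientation, a schematic form of a kinetic Fokker-Planck-Alignment equation of the type discussed above is shown below; this is a generic formulation, and the paper's model replaces the plain convolution average in the alignment term with a mollified Favre filtration of the macroscopic momentum.

```latex
% Schematic kinetic Fokker--Planck--Alignment equation (generic form; the
% paper's alignment term uses a mollified Favre filtration and its
% conventions may differ).
\[
\partial_t f + v\cdot\nabla_x f
  + \nabla_v\cdot\bigl(f\,F_\phi[f]\bigr) = \sigma\,\Delta_v f,
\qquad
F_\phi[f](x,v,t) = \int_{\mathbb{R}^{2n}} \phi(x-y)\,(w - v)\,f(y,w,t)\,\mathrm{d}w\,\mathrm{d}y .
\]
```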
This study addresses the problem of convolutional kernel learning in univariate, multivariate, and multidimensional time series data, which is crucial for interpreting temporal patterns in time series and supporting downstream machine learning tasks. First, we propose formulating convolutional kernel learning for univariate time series as a sparse regression problem with a non-negative constraint, leveraging the properties of circular convolution and circulant matrices. Second, to generalize this approach to multivariate and multidimensional time series data, we use tensor computations, reformulating the convolutional kernel learning problem in the form of tensors. This is further converted into a standard sparse regression problem through vectorization and tensor unfolding operations. In the proposed methodology, the optimization problem is addressed using the existing non-negative subspace pursuit method, enabling the convolutional kernel to capture temporal correlations and patterns. To evaluate the proposed model, we apply it to several real-world time series datasets. On the multidimensional ridesharing and taxi trip data from New York City and Chicago, the convolutional kernels reveal interpretable local correlations and cyclical patterns, such as weekly seasonality. For the monthly temperature time series data in North America, the proposed model can quantify the yearly seasonality and make it comparable across different decades. In the context of multidimensional fluid flow data, both local and nonlocal correlations captured by the convolutional kernels can reinforce tensor factorization, leading to performance improvements in fluid flow reconstruction tasks. Thus, this study lays an insightful foundation for automatically learning convolutional kernels from time series data, with an emphasis on interpretability through sparsity and non-negativity constraints.
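A minimal sketch of the circular-convolution idea for a univariate series might look as follows; here sparsity is imposed crudely by truncating a non-negative least-squares fit, whereas the study uses a non-negative subspace pursuit, and the exact constraints and normalization may differ.

```python
# Sketch of circular-convolution kernel learning for a univariate series
# (illustrative only; the paper's formulation and solver may differ).
import numpy as np
from scipy.optimize import nnls

def learn_circular_kernel(x, max_lag, sparsity):
    """Fit x[t] ~ sum_k theta_k * x[(t - k) mod T] with theta >= 0, sparse."""
    # Columns are circular shifts of the series for lags 1..max_lag
    # (lag 0 is excluded to avoid the trivial identity kernel).
    A = np.column_stack([np.roll(x, k) for k in range(1, max_lag + 1)])
    theta, _ = nnls(A, x)                      # non-negative least squares
    keep = np.argsort(theta)[::-1][:sparsity]  # crude sparsification step
    sparse_theta = np.zeros_like(theta)
    sparse_theta[keep] = theta[keep]
    return sparse_theta

if __name__ == "__main__":
    t = np.arange(200)
    x = np.sin(2 * np.pi * t / 7) + 0.1 * np.random.default_rng(0).normal(size=200)
    theta = learn_circular_kernel(x, max_lag=14, sparsity=3)
    print(np.nonzero(theta)[0] + 1, theta[theta > 0])  # dominant lags, weights
```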
Modern software architectures in cloud computing are highly reliant on interconnected local and remote services. Popular architectures, such as the service mesh, rely on the use of independent services or sidecars for a single application. While such modular approaches simplify application development and deployment, they also introduce significant communication overhead, since even local communication that is handled by the kernel becomes a performance bottleneck. This problem has been identified and partially solved for remote communication over fast NICs through the use of kernel-bypass data plane systems. However, existing kernel-bypass mechanisms limit their practical deployment by either requiring code modification or supporting only a small subset of the network interface. In this paper, we propose Pegasus, a framework for transparent kernel bypass for local and remote communication. By transparently fusing multiple applications into a single process, Pegasus provides an in-process fast path to bypass the kernel for local communication. To accelerate remote communication over fast NICs, Pegasus uses DPDK to directly access the NIC. Pegasus supports transparent kernel bypass for unmodified binaries by implementing core OS services in user space, such as scheduling and memory management, thus removing the kernel from the critical path. Our experiments on a range of real-world applications show that, compared with Linux, Pegasus improves throughput by 19% to 33% for local communication and 178% to 442% for remote communication, without application changes. Furthermore, Pegasus achieves 222% higher throughput than Linux for co-located, IO-intensive applications that require both local and remote communication, with each communication optimization contributing significantly.
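As a toy illustration of the overhead such a fast path targets (this is unrelated to Pegasus's actual implementation, which fuses applications into one process and uses DPDK), one can contrast local message passing through the kernel with an in-process handoff:

```python
# Toy measurement (not Pegasus): local communication through the kernel
# (a Unix socket pair) versus an in-process handoff, to illustrate the
# overhead an in-process fast path avoids.  Numbers are machine-dependent.
import socket
import time
from collections import deque

N = 50_000
PAYLOAD = b"x" * 64

def via_kernel_socketpair():
    a, b = socket.socketpair()            # local IPC that crosses the kernel
    start = time.perf_counter()
    for _ in range(N):
        a.sendall(PAYLOAD)
        b.recv(len(PAYLOAD))
    elapsed = time.perf_counter() - start
    a.close(); b.close()
    return elapsed

def via_in_process_queue():
    q = deque()                           # stays entirely in user space
    start = time.perf_counter()
    for _ in range(N):
        q.append(PAYLOAD)
        q.popleft()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"kernel socketpair: {via_kernel_socketpair():.3f}s for {N} messages")
    print(f"in-process queue:  {via_in_process_queue():.3f}s for {N} messages")
```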