Title: Declarative Specification for Unstructured Mesh Editing Algorithms
Abstract: We introduce a novel approach to describing mesh generation, mesh adaptation, and geometric modeling algorithms that rely on changing mesh connectivity, using a high-level abstraction. The main motivation is to enable easy customization and development of these algorithms via a declarative specification consisting of a set of per-element invariants, operation scheduling, and attribute transfer for each editing operation. We demonstrate that widely used algorithms for editing surfaces and volumes can be compactly expressed with our abstraction, and that their implementation within our framework is simple, automatically parallelizable on shared-memory architectures, and guaranteed to satisfy the prescribed invariants. These algorithms are readable and easy to customize for specific use cases. We introduce a software library implementing this abstraction and providing automatic shared-memory parallelization.
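The three ingredients named in the abstract (per-element invariants, operation scheduling, and attribute transfer) can be pictured with a small toy. The Python sketch below is illustrative only and does not reproduce the paper's library or its API: all class and function names are ours. It wires a link-condition invariant, a midpoint attribute transfer, and a greedy priority scheduler into a single edge-collapse operation on a triangle mesh.

```python
# Hedged sketch (not the paper's library or API): one mesh-editing operation
# expressed declaratively as invariants + attribute transfer, driven by a
# simple priority scheduler. All names below are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Vertex = int
Edge = Tuple[int, int]

@dataclass
class TriMesh:
    positions: Dict[Vertex, Tuple[float, float, float]]
    triangles: Set[Tuple[Vertex, Vertex, Vertex]]

    def edges(self) -> Set[Edge]:
        es: Set[Edge] = set()
        for a, b, c in self.triangles:
            for u, v in ((a, b), (b, c), (c, a)):
                es.add((min(u, v), max(u, v)))
        return es

    def link(self, v: Vertex) -> Set[Vertex]:
        return {w for tri in self.triangles if v in tri for w in tri if w != v}

# A per-element invariant is a predicate on the local region of an edit.
Invariant = Callable[[TriMesh, Edge], bool]

def link_condition(mesh: TriMesh, e: Edge) -> bool:
    """Classic collapse precondition: link(a) ∩ link(b) equals the vertices opposite ab."""
    a, b = e
    opposite = {w for tri in mesh.triangles
                if a in tri and b in tri for w in tri if w not in e}
    return (mesh.link(a) & mesh.link(b)) == opposite

def collapse_edge(mesh: TriMesh, e: Edge) -> None:
    """Edit plus attribute transfer: the surviving vertex moves to the edge midpoint."""
    a, b = e
    mesh.positions[a] = tuple((x + y) / 2 for x, y in
                              zip(mesh.positions[a], mesh.positions[b]))
    mesh.triangles = {tuple(a if v == b else v for v in tri)
                      for tri in mesh.triangles if not (a in tri and b in tri)}
    del mesh.positions[b]

@dataclass
class Operation:
    invariants: List[Invariant]              # checked before every application
    apply: Callable[[TriMesh, Edge], None]   # the actual connectivity edit

def schedule(mesh: TriMesh, op: Operation,
             priority: Callable[[TriMesh, Edge], float]) -> int:
    """Greedy scheduler: visit edges in priority order, skip invariant failures."""
    applied = 0
    for e in sorted(mesh.edges(), key=lambda ed: priority(mesh, ed)):
        if (e[0] in mesh.positions and e[1] in mesh.positions
                and all(inv(mesh, e) for inv in op.invariants)):
            op.apply(mesh, e)
            applied += 1
    return applied

# Toy usage: collapse the shortest admissible edges of a two-triangle strip.
def edge_length_sq(m: TriMesh, e: Edge) -> float:
    return sum((x - y) ** 2 for x, y in zip(m.positions[e[0]], m.positions[e[1]]))

mesh = TriMesh(positions={0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (1, 1, 0)},
               triangles={(0, 1, 2), (1, 3, 2)})
collapse = Operation(invariants=[link_condition], apply=collapse_edge)
schedule(mesh, collapse, priority=edge_length_sq)
```

Customizing the sketch means swapping the invariant list, the transfer function, or the priority, which is the kind of declarative customization the abstract describes (the actual library additionally handles parallel scheduling and guarantees invariant satisfaction).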
Award ID(s): 1908767, 1835712, 1901091
NSF-PAR ID: 10463903
Journal Name: ACM Transactions on Graphics
Volume: 41
Issue: 6
ISSN: 0730-0301
Page Range / eLocation ID: 1 to 14
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Because of the increasing demand for intensive computation in deep neural networks, researchers have developed both hardware and software mechanisms to reduce the compute and memory burden. A widely adopted approach is to use mixed-precision data types. However, it is hard to benefit from mixed precision without hardware specialization because of the overhead of data casting. Recently, hardware vendors have offered tensorized instructions specialized for mixed-precision tensor operations, such as Intel VNNI, Nvidia Tensor Core, and ARM DOT. These instructions involve a new computing idiom, which reduces multiple low-precision elements into one high-precision element. The lack of compilation techniques for this emerging idiom makes it hard to utilize these instructions. In practice, one approach is to use vendor-provided libraries for computationally intensive kernels, but this is inflexible and prevents further optimizations. Another approach is to manually write hardware intrinsics, which is error-prone and difficult for programmers. Some prior works tried to address this problem by creating compilers for each instruction, which requires excessive effort as the number of tensorized instructions grows. In this work, we develop a compiler framework, UNIT, to unify the compilation for tensorized instructions. The key to this approach is a unified semantics abstraction, which makes the integration of new instructions easy and the reuse of analyses and transformations possible. Tensorized instructions from different platforms can be compiled via UNIT with moderate effort for favorable performance. Given a tensorized instruction and a tensor operation, UNIT automatically detects the applicability of the instruction, transforms the loop organization of the operation, and rewrites the loop body to take advantage of the tensorized instruction. According to our evaluation, UNIT is able to target various mainstream hardware platforms. The generated end-to-end inference model achieves a 1.3x speedup over Intel oneDNN on an x86 CPU, a 1.75x speedup over Nvidia cuDNN on an Nvidia GPU, and a 1.13x speedup over a carefully tuned TVM solution for ARM DOT on an ARM CPU.
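For a concrete picture of the "reduce multiple low-precision elements into one high-precision element" idiom mentioned in this abstract, here is a small NumPy emulation. It is a hedged illustration only; the function name and shapes are ours and are unrelated to UNIT's implementation.

```python
# Hedged illustration of the mixed-precision reduction idiom behind
# tensorized instructions such as Intel VNNI or ARM DOT: four int8 products
# are widened and accumulated into one int32 lane. Not UNIT's code.
import numpy as np

def dot4_accumulate(a_int8: np.ndarray, b_int8: np.ndarray,
                    acc_int32: np.ndarray) -> np.ndarray:
    """For each output lane, multiply four int8 pairs, widen to int32,
    and add their sum into the accumulator (a VNNI/DOT-style step)."""
    prod = a_int8.astype(np.int32) * b_int8.astype(np.int32)  # widen, then multiply
    return acc_int32 + prod.sum(axis=1)                       # reduce 4 -> 1 per lane

# Toy usage: 8 output lanes, each reducing 4 int8 products into one int32.
rng = np.random.default_rng(0)
a = rng.integers(-128, 127, size=(8, 4), dtype=np.int8)
b = rng.integers(-128, 127, size=(8, 4), dtype=np.int8)
acc = np.zeros(8, dtype=np.int32)
acc = dot4_accumulate(a, b, acc)
```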
  2. We study distributed causal shared memory (or key-value pairs) in an asynchronous network under crash failures. Causal memory, introduced by Ahamad et al. in the context of multi-processor environments in 1994, is an abstraction which ensures that nodes agree on the relative ordering of read and write operations that are causally related on key-value pairs. Inspired by the recent interest in geo-replicated causal storage systems (e.g., COPS, Eiger, Bolt-on), we systematically study the fault-tolerance property of causal shared memory in the client-server model in this work. We identify that 2f + 1 servers are both necessary and sufficient to build a resilient causal memory in the presence of up to f crashed servers. We provide both the necessity proof and a new optimal algorithm that matches the bound. For evaluation, we implement our algorithm in Golang and compare its performance with state-of-the-art fault-tolerant algorithms that ensure atomicity on the Google Cloud Platform.
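The 2f + 1 bound in this abstract can be sanity-checked with a few lines of quorum arithmetic. This toy calculation is ours, not the paper's algorithm: with n servers and up to f crashes, a client can only wait for n - f replies, and any two such reply sets must overlap so that ordering metadata is never lost.

```python
# Hedged sketch of the quorum arithmetic behind the 2f + 1 bound.
def quorum_intersection(n: int, f: int) -> int:
    """Minimum overlap between two sets of n - f responding servers."""
    return 2 * (n - f) - n  # |Q1| + |Q2| - n

for f in range(1, 4):
    n = 2 * f + 1
    print(f"f={f}: n={n}, reachable quorum={n - f}, "
          f"guaranteed overlap={quorum_intersection(n, f)}")  # always >= 1
```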
  3. Memory models play an important role in the verified compilation of imperative programming languages. A representative one is the block-based memory model of CompCert, the state-of-the-art verified C compiler. Despite its success, the abstraction over the memory space provided by CompCert's memory model is still primitive and inflexible. In essence, it uses a fixed representation for identifying memory blocks in a global memory space and a globally shared state for distinguishing between used and unused blocks. Therefore, any reasoning about memory must work uniformly over the global memory; it is impossible to reason individually about different sub-regions of memory (i.e., the stack and global definitions). This not only incurs unnecessary complexity in compiler verification, but also poses significant difficulty for supporting verified compilation of open or concurrent programs which need to work with contextual memory, as manifested in many previous extensions of CompCert. To remove the above limitations, we propose an enhancement to the block-based memory model based on nominal techniques; we call it the nominal memory model. By adopting the key concepts of nominal techniques, such as atomic names and supports, to model the memory space, we are able to 1) generalize the representation of memory blocks to any type satisfying the properties of atomic names and 2) remove the global constraints for managing memory blocks, enabling flexible memory structures for open and concurrent programs. To demonstrate the effectiveness of the nominal memory model, we develop a series of extensions of CompCert based on it. These extensions show that the nominal memory model 1) supports a general framework for verified compilation of C programs, 2) enables intuitive reasoning about compiler transformations on partial memory, and 3) enables modular reasoning about programs working with contextual memory. We also demonstrate that these extensions require limited changes to the original CompCert, making the verification techniques based on the nominal memory model easy to adopt.
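To make the contrast in this abstract concrete, the following toy sketch (ours, in Python rather than CompCert's Coq formalization) models block identifiers as abstract atoms drawn from per-region supplies instead of a single global next-block counter, so disjoint regions such as the stack and a caller's context can be managed independently. All names are illustrative.

```python
# Hedged sketch: memory keyed by abstract block atoms instead of a global
# integer counter, echoing the "nominal" idea of freshness without global order.
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Hashable, Iterator

@dataclass
class NominalMemory:
    fresh_atoms: Iterator[Hashable]                   # per-region atom supply
    blocks: Dict[Hashable, bytearray] = field(default_factory=dict)

    def alloc(self, size: int) -> Hashable:
        b = next(self.fresh_atoms)                    # freshness, not global order
        self.blocks[b] = bytearray(size)
        return b

    def store(self, b: Hashable, ofs: int, byte: int) -> None:
        self.blocks[b][ofs] = byte

# Two regions (e.g., "stack" vs. "context") draw from disjoint atom supplies,
# so reasoning about one never mentions the other's allocation state.
stack_mem = NominalMemory(fresh_atoms=(("stack", i) for i in count()))
ctx_mem = NominalMemory(fresh_atoms=(("ctx", i) for i in count()))
sb = stack_mem.alloc(8)
stack_mem.store(sb, 0, 42)
cb = ctx_mem.alloc(4)
```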
  4. Analyzing large dynamic networks is an important problem with applications in a wide range of disciplines. A key operation is updating the network properties as its topology changes. In this paper we present graph sparsification as an efficient abstraction for updating the properties of dynamic networks. We demonstrate the applicability of graph sparsification to updating the connected components in random and scale-free networks on shared-memory systems. Our results show that the updating is scalable (10x on 16 processors for larger networks). To the best of our knowledge, this is the first parallel implementation of graph sparsification. Based on these initial results, we discuss how the current implementation can be further improved and how graph sparsification can be applied to updating other network properties.
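The core intuition behind using sparsification for dynamic connected components, maintaining a sparse certificate so most topology changes can be discarded cheaply, can be sketched with a sequential union-find toy. This is our illustration, not the paper's parallel shared-memory implementation.

```python
# Hedged sketch: keep a sparse certificate (a spanning forest via union-find)
# so inserted edges that do not change connectivity are discarded immediately.
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # edge is redundant for connectivity
        self.parent[rb] = ra
        return True

def update_components(n, edge_stream):
    """Process inserted edges; keep only those that change connectivity."""
    uf, certificate = UnionFind(n), []
    for u, v in edge_stream:
        if uf.union(u, v):
            certificate.append((u, v))    # part of the sparse certificate
    return certificate

# Toy usage: 6 vertices, 6 inserted edges; only the forest edges are kept.
forest = update_components(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])
```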
  5. Flat surfaces captured by 3D point clouds are often used for localization, mapping, and modeling. Dense point cloud processing has high computation and memory costs, making low-dimensional representations of flat surfaces such as polygons desirable. We present Polylidar3D, a non-convex polygon extraction algorithm which takes as input unorganized 3D point clouds (e.g., LiDAR data), organized point clouds (e.g., range images), or user-provided meshes. Non-convex polygons represent flat surfaces in an environment, with interior cutouts representing obstacles or holes. The Polylidar3D front-end transforms input data into a half-edge triangular mesh. This representation provides a common level of abstraction for subsequent back-end processing. The Polylidar3D back-end is composed of four core algorithms: mesh smoothing, dominant plane normal estimation, planar segment extraction, and finally polygon extraction. Polylidar3D is shown to be quite fast, making use of CPU multi-threading and GPU acceleration when available. We demonstrate Polylidar3D’s versatility and speed with real-world datasets, including aerial LiDAR point clouds for rooftop mapping, autonomous driving LiDAR point clouds for road surface detection, and RGBD cameras for indoor floor/wall detection. We also evaluate Polylidar3D on a challenging planar segmentation benchmark dataset. Results consistently show excellent speed and accuracy.
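As a rough, hedged illustration of the staged back-end listed in the last abstract, the toy NumPy sketch below performs only crude stand-ins for the "dominant plane normal estimation" and "planar segment extraction" steps on synthetic data; the function names and thresholds are ours, not Polylidar3D's.

```python
# Hedged sketch of two back-end stages (dominant normal -> planar segment),
# reduced to a toy example on synthetic data. Not Polylidar3D's code.
import numpy as np

def dominant_plane_normal(normals: np.ndarray) -> np.ndarray:
    """Crude stand-in for normal estimation: the mean unit normal."""
    n = normals.mean(axis=0)
    return n / np.linalg.norm(n)

def planar_segment(points: np.ndarray, normals: np.ndarray,
                   plane_n: np.ndarray, angle_tol_deg: float = 10.0) -> np.ndarray:
    """Keep points whose normal is within angle_tol of the dominant normal."""
    cos_tol = np.cos(np.deg2rad(angle_tol_deg))
    mask = normals @ plane_n >= cos_tol
    return points[mask]

# Toy usage: mostly-flat points near z = 0 with slightly noisy upward normals.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 10, size=(100, 3))
pts[:, 2] *= 0.01
nrm = np.tile([0.0, 0.0, 1.0], (100, 1)) + rng.normal(0, 0.05, (100, 3))
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
flat = planar_segment(pts, nrm, dominant_plane_normal(nrm))
```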