Title: Propagating Geometry Information to Finite Element Computations
The traditional workflow in continuum mechanics simulations is that a geometry description—for example, obtained using Constructive Solid Geometry (CSG) or Computer Aided Design (CAD) tools—forms the input for a mesh generator. The mesh is then used as the sole input for the finite element, finite volume, and finite difference solvers, which at this point no longer have access to the original, “underlying” geometry. However, many modern techniques—for example, adaptive mesh refinement and the use of higher-order geometry approximation methods—really do need information about the underlying geometry to realize their full potential. We have undertaken an exhaustive study of where typical finite element codes use geometry information, with the goal of determining what information geometry tools would have to provide. Our study shows that nearly all geometry-related needs inside the simulators can be satisfied by just two “primitives”: elementary queries posed by the simulation software to the geometry description. We then show that it is possible to provide these primitives in all of the frequently used ways in which geometries are described in common industrial workflows, and illustrate our solutions using a number of examples.
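The abstract does not spell out the two primitives, so the following is only an illustrative sketch of what such a simulator-to-geometry query interface could look like: a hypothetical GeometryOracle exposing a closest-point projection and an intermediate-point query, exercised on a circle. All names, and the specific choice of queries, are assumptions made for illustration, not the interface defined in the paper.

```python
# Hypothetical sketch of a geometry-query interface; the class and method
# names are illustrative assumptions, not the primitives defined in the paper.
from abc import ABC, abstractmethod
from dataclasses import dataclass
import math


@dataclass
class Point:
    x: float
    y: float


class GeometryOracle(ABC):
    """Answers elementary geometric queries posed by a mesh or FE code."""

    @abstractmethod
    def closest_point(self, p: Point) -> Point:
        """Project an arbitrary point onto the underlying geometry."""

    def intermediate_point(self, a: Point, b: Point, w: float = 0.5) -> Point:
        """Return a point 'between' a and b that lies on the geometry,
        e.g. to place the new vertex created when a boundary edge is refined."""
        return self.closest_point(Point(a.x + w * (b.x - a.x),
                                        a.y + w * (b.y - a.y)))


class Circle(GeometryOracle):
    """Example geometry: the circle of a given radius around the origin."""

    def __init__(self, radius: float):
        self.radius = radius

    def closest_point(self, p: Point) -> Point:
        r = math.hypot(p.x, p.y) or 1.0   # degenerate query at the center
        return Point(p.x / r * self.radius, p.y / r * self.radius)


# Where should the new vertex go when the boundary edge (1,0)-(0,1) is
# refined? On the circle, not at the chord midpoint (0.5, 0.5).
circle = Circle(1.0)
print(circle.intermediate_point(Point(1.0, 0.0), Point(0.0, 1.0)))
```

Adaptive refinement and higher-order boundary approximation both reduce to repeated queries of this kind, which is the sense in which a small number of elementary queries can serve many mesh-related tasks.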
Award ID(s):
1925595 1821210 1835673
NSF-PAR ID:
10349706
Date Published:
Journal Name:
ACM Transactions on Mathematical Software
Volume:
47
Issue:
4
ISSN:
0098-3500
Page Range / eLocation ID:
1 to 30
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Finite Element Method (FEM) is widely used to solve discrete Partial Differential Equations (PDEs) in engineering and graphics applications. The popularity of FEM led to the development of a large family of variants, most of which require a tetrahedral or hexahedral mesh to construct the basis. While the theoretical properties of FEM bases (such as convergence rate, stability, etc.) are well understood under specific assumptions on the mesh quality, their practical performance, influenced both by the choice of basis construction and the quality of mesh generation, has not been systematically documented for large collections of automatically meshed 3D geometries. We introduce a set of benchmark problems involving the most commonly solved elliptic PDEs, starting from simple cases with an analytical solution, moving to commonly used test problem setups, and using manufactured solutions for thousands of real-world, automatically meshed geometries. For all these cases, we use state-of-the-art meshing tools to create both tetrahedral and hexahedral meshes, and compare the performance of different element types for common elliptic PDEs. The goal of this benchmark is to enable comparison of complete FEM pipelines, from mesh generation to algebraic solver, and exploration of the relative impact of different factors on the overall system performance. As a specific application of our geometry and benchmark dataset, we explore the question of the relative advantages of unstructured (triangular/tetrahedral) and structured (quadrilateral/hexahedral) discretizations. We observe that for Lagrange-type elements, while linear tetrahedral elements perform poorly, quadratic tetrahedral elements perform as well as or outperform hexahedral elements for our set of problems and currently available mesh generation algorithms. This observation suggests that for common problems in structural analysis, thermal analysis, and low Reynolds number flows, high-quality results can be obtained with unstructured tetrahedral meshes, which can be created robustly and automatically. We release the description of the benchmark problems, meshes, and a reference implementation of our testing infrastructure to enable statistically significant comparisons between different FE methods, which we hope will be helpful in the development of new meshing and FEA techniques.
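    As an illustration of the kind of measurement such a benchmark aggregates, the sketch below computes observed convergence rates from mesh sizes and manufactured-solution errors. The error values, and the expected rates of roughly 2 and 3 for linear and quadratic Lagrange tetrahedra in the L2 norm, are illustrative placeholders rather than data from the benchmark.

```python
# Illustrative sketch: observed convergence rates from manufactured-solution
# errors on a sequence of refined meshes. The numbers are made-up placeholders.
import math


def observed_rates(h, err):
    """rate_i = log(err_i / err_{i+1}) / log(h_i / h_{i+1})."""
    return [math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
            for i in range(len(err) - 1)]


# Hypothetical L2 errors for linear (P1) and quadratic (P2) tetrahedra under
# uniform refinement; the expected asymptotic rates are roughly 2 and 3.
h = [0.2, 0.1, 0.05, 0.025]
err_p1 = [4.1e-2, 1.0e-2, 2.6e-3, 6.4e-4]
err_p2 = [2.3e-3, 2.9e-4, 3.6e-5, 4.5e-6]

print("P1 rates:", [round(r, 2) for r in observed_rates(h, err_p1)])
print("P2 rates:", [round(r, 2) for r in observed_rates(h, err_p2)])
```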
  2. The goal of this work is to predict the effect of part geometry and process parameters on the instantaneous spatial distribution of heat, called the heat flux or thermal history, in metal parts as they are being built layer-by-layer using additive manufacturing (AM) processes. In pursuit of this goal, the objective of this work is to develop and verify a graph theory-based approach for predicting the heat flux in metal AM parts. This objective is consequential for overcoming the current poor process consistency and part quality in AM. One of the main reasons for poor part quality in metal AM processes is ascribed to the heat flux in the part. For instance, constrained heat flux because of ill-considered part design leads to defects, such as warping and thermal stress-induced cracking. Existing non-proprietary approaches to predict the heat flux in AM at the part level predominantly use mesh-based finite element analyses that are computationally tortuous — the simulation of a few layers typically requires several hours, if not days. Hence, to alleviate these challenges in metal AM processes, there is a need for efficient computational thermal models to predict the heat flux, and thereby guide part design and selection of process parameters instead of expensive empirical testing. Compared to finite element analysis techniques, the proposed mesh-free graph theory-based approach facilitates layer-by-layer simulation of the heat flux within a few minutes on a desktop computer. To explore these assertions, we conducted the following two studies: (1) comparing the heat diffusion trends predicted using the graph theory approach with finite element analysis and with analytical heat transfer calculations based on Green’s functions for an elementary cuboid geometry that is subjected to an impulse heat input in a certain part of its volume, and (2) simulating the layer-by-layer deposition of three part geometries in a laser powder bed fusion metal AM process with: (a) Goldak’s moving heat source finite element method, (b) the proposed graph theory approach, and (c) further comparing the heat flux predictions from the last two approaches with a commercial solution. From the first study, we report that the heat flux trend approximated by the graph theory approach is accurate to within 5% of the Green’s functions-based analytical solution (in terms of the symmetric mean absolute percentage error). Results from the second study show that the heat flux trends predicted for the AM parts using the graph theory approach agree with finite element analysis to within 15% error. More pertinently, the computational time for predicting the heat flux was significantly reduced with graph theory; for instance, in one of the AM case studies the time taken to predict the heat flux in a part was less than 3 minutes using the graph theory approach, compared to over 3 hours with finite element analysis. While this paper is restricted to theoretical development and verification of the graph theory approach for heat flux prediction, our forthcoming research will focus on experimental validation through in-process sensor-based heat flux measurements.
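    The abstract does not give the governing equations of the graph theory approach, but the general idea of mesh-free heat diffusion on a graph can be sketched as below: build a graph Laplacian from a cloud of nodes and evolve an impulse heat input through a matrix exponential. Every parameter here (node count, neighborhood radius, diffusivity) is an illustrative assumption, not the calibrated model of the paper.

```python
# Toy sketch of graph-based heat diffusion: T(t) = exp(-alpha * L * t) @ T(0),
# where L is a graph Laplacian built from node neighborhoods. All parameters
# are illustrative assumptions, not the values used in the paper.
import numpy as np
from scipy.spatial import cKDTree
from scipy.linalg import expm

rng = np.random.default_rng(0)
nodes = rng.random((200, 3))            # random node cloud inside a unit cube
tree = cKDTree(nodes)
pairs = tree.query_pairs(r=0.15)        # connect nodes closer than a radius

# Weighted adjacency and combinatorial graph Laplacian L = D - A.
n = len(nodes)
A = np.zeros((n, n))
for i, j in pairs:
    w = np.exp(-np.linalg.norm(nodes[i] - nodes[j]) ** 2 / 0.15 ** 2)
    A[i, j] = A[j, i] = w
L = np.diag(A.sum(axis=1)) - A

# Impulse heat input in a small sub-volume, then diffuse for a short time.
T0 = np.where(nodes[:, 2] > 0.9, 1.0, 0.0)   # "heated" top slice of the cube
alpha, t = 0.05, 1.0
T = expm(-alpha * L * t) @ T0

print("peak value after diffusion:", T.max())
```

    The appeal of this formulation is that no volumetric mesh is needed; the node cloud and its neighborhood graph stand in for the discretized geometry.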

     
  3. The goal of this work is to predict the effect of part geometry and process parameters on the instantaneous spatiotemporal distribution of temperature, also called the thermal field or temperature history, in metal parts as they are being built layer-by-layer using additive manufacturing (AM) processes. In pursuit of this goal, the objective of this work is to develop and verify a graph theory-based approach for predicting the temperature distribution in metal AM parts. This objective is consequential for overcoming the current poor process consistency and part quality in AM. One of the main reasons for poor part quality in metal AM processes is ascribed to the nature of the temperature distribution in the part. For instance, steep thermal gradients created in the part during printing lead to defects, such as warping and thermal stress-induced cracking. Existing nonproprietary approaches to predict the temperature distribution in AM parts predominantly use mesh-based finite element analyses that are computationally tortuous—the simulation of a few layers typically requires several hours, if not days. Hence, to alleviate these challenges in metal AM processes, there is a need for efficient computational models to predict the temperature distribution, and thereby guide part design and selection of process parameters instead of expensive empirical testing. Compared with finite element analysis techniques, the proposed mesh-free graph theory-based approach facilitates prediction of the temperature distribution within a few minutes on a desktop computer. To explore these assertions, we conducted the following two studies: (1) comparing the heat diffusion trends predicted using the graph theory approach with finite element analysis and with analytical heat transfer calculations based on Green’s functions for an elementary cuboid geometry that is subjected to an impulse heat input in a certain part of its volume, and (2) simulating the laser powder bed fusion metal AM of three part geometries with (a) Goldak’s moving heat source finite element method, (b) the proposed graph theory approach, and (c) further comparing the thermal trends predicted from the last two approaches with a commercial solution. From the first study, we report that the thermal trends approximated by the graph theory approach are accurate to within 5% of the Green’s functions-based analytical solution (in terms of the symmetric mean absolute percentage error). Results from the second study show that the thermal trends predicted for the AM parts using the graph theory approach agree with finite element analysis, and the computational time for predicting the temperature distribution was significantly reduced with graph theory. For instance, for one of the AM part geometries studied, the temperature trends were predicted to within 10% error in less than 18 min using the graph theory approach, compared with over 180 min with finite element analysis. Although this paper is restricted to theoretical development and verification of the graph theory approach, our forthcoming research will focus on experimental validation through in-process thermal measurements.
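    Both of the preceding studies quantify accuracy with the symmetric mean absolute percentage error (SMAPE). The sketch below uses one common definition of SMAPE; the papers may use a slightly different variant, and the reference and prediction values are illustrative only.

```python
# One common definition of SMAPE, in percent; the papers may use a variant.
import numpy as np


def smape(reference, prediction):
    """Mean of |p - r| / ((|r| + |p|) / 2), expressed in percent."""
    reference = np.asarray(reference, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    denom = (np.abs(reference) + np.abs(prediction)) / 2.0
    return 100.0 * np.mean(np.abs(prediction - reference) / denom)


# Illustrative numbers only: a reference thermal trend and a prediction.
reference = [300.0, 450.0, 520.0, 480.0, 410.0]
prediction = [310.0, 440.0, 515.0, 470.0, 430.0]
print(f"SMAPE = {smape(reference, prediction):.2f}%")
```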
  4. Obeid, I.; Selesnik, I.; Picone, J. (Eds.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. 
The overall impact of all of these issues described above is significant, as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but that is expensive since you need to do multiple runs over the data, which further taxes a computing infrastructure already running at maximum capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, this adds an element of randomness to the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase the performance and help the model train more quickly, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us a hint of how our model is performing per experiment, and whether the changes we make are effective. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables, such as training on GPUs, allowing multi-threading on CPUs, and using certain layers, can also affect the outcome of the experiment. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data from the last experiment to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation, we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
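A minimal sketch of the seeding and precision controls described above, written against the TensorFlow 2.x API (the poster refers to TF 1.9/1.13, where some of these calls differ, e.g. tf.set_random_seed); the seed value and dataset size are arbitrary placeholders.

```python
# Minimal sketch of RNG seeding and precision control for reproducibility,
# using TensorFlow 2.x-style APIs; seed and sizes are arbitrary placeholders.
import random

import numpy as np
import tensorflow as tf

SEED = 1337

# Seed the RNGs that influence initialization, shuffling, and dropout.
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Prefer 32-bit floats throughout, as described above.
tf.keras.backend.set_floatx("float32")

# Keep the data order fixed: shuffle once with a seeded generator and reuse
# that order across experiments instead of reshuffling every run.
indices = np.arange(10_000)
np.random.default_rng(SEED).shuffle(indices)

# Note: even with all of the above, cuDNN kernel selection on the GPU can
# still be non-deterministic, which is the residual randomness discussed above.
```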
  5.
    Consider an algorithm performing a computation on a huge random object (for example, a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear, and thus sampling the entire object up front would ruin its efficiency. Our first set of results focuses on undirected graphs with independent edge probabilities, i.e., each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdős-Rényi G(n,p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well-bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation to be consistent with each other without the need for communication, and (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n,p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G) that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memoryless, and can maintain consistency with non-communicating copies of itself.
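    As a toy illustration of the query-access model (not the paper's poly(log n)-resource construction, which also supports efficient Next-Neighbor and Random-Neighbor queries), the sketch below answers Vertex-Pair queries for G(n, p) by flipping each edge's Bernoulli coin lazily on first query and memoizing the outcome so later answers stay consistent.

```python
# Toy sketch of lazy, query-access sampling of G(n, p): each edge's coin is
# flipped only when first queried and memoized for consistency. This naive
# version only illustrates the model; it is not the paper's construction.
import random


class LazyGnp:
    def __init__(self, n: int, p: float, seed: int = 0):
        self.n, self.p = n, p
        self.rng = random.Random(seed)
        self.decided = {}          # (u, v) with u < v  ->  True / False

    def vertex_pair(self, u: int, v: int) -> bool:
        """Is the edge {u, v} present in the sampled graph?"""
        if u == v:
            return False
        key = (min(u, v), max(u, v))
        if key not in self.decided:
            self.decided[key] = (self.rng.random() < self.p)
        return self.decided[key]


g = LazyGnp(n=10**9, p=0.5)        # far too large to ever materialize fully
print(g.vertex_pair(3, 7), g.vertex_pair(7, 3))   # consistent answers
```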