

Search for: All records

Award ID contains: 2152908

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Melt pool dynamics in metal additive manufacturing (AM) is critical to process stability, microstructure formation, and the final properties of printed materials. Physics-based simulation, including computational fluid dynamics (CFD), is the dominant approach for predicting melt pool dynamics, but it suffers from very high computational cost. This paper presents a physics-informed machine learning method that integrates conventional neural networks with the governing physical laws to predict melt pool dynamics, such as temperature, velocity, and pressure, without using any training data on velocity or pressure. The approach avoids solving the nonlinear Navier–Stokes equations numerically, which significantly reduces the computational cost once the cost of generating velocity data is taken into account. The values of difficult-to-determine parameters in the governing equations can also be inferred through data-driven discovery. In addition, the physics-informed neural network (PINN) architecture is optimized for efficient model training. The model's data efficiency is attributed to the extra penalty terms that incorporate the governing PDEs, initial conditions, and boundary conditions into the PINN loss.

     
    Free, publicly-accessible full text available August 1, 2025
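
A minimal sketch of the loss construction described in the first abstract, assuming a PyTorch implementation: a small network maps (x, y, t) to (T, u, v, p), temperature labels supply the only data loss, and automatic differentiation supplies a PDE penalty at collocation points. Only the incompressible continuity residual u_x + v_y = 0 is shown, as a stand-in for the full Navier–Stokes and energy equations; all sizes, point counts, and weights are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MeltPoolPINN(nn.Module):
    def __init__(self, width=64, depth=4):
        super().__init__()
        layers, dim = [], 3                       # inputs: x, y, t
        for _ in range(depth):
            layers += [nn.Linear(dim, width), nn.Tanh()]
            dim = width
        layers.append(nn.Linear(dim, 4))          # outputs: T, u, v, p
        self.net = nn.Sequential(*layers)

    def forward(self, xyt):
        return self.net(xyt)

def physics_loss(model, xyt):
    """Continuity residual du/dx + dv/dy, computed with autograd."""
    xyt = xyt.clone().requires_grad_(True)
    out = model(xyt)
    u, v = out[:, 1], out[:, 2]
    du = torch.autograd.grad(u.sum(), xyt, create_graph=True)[0]
    dv = torch.autograd.grad(v.sum(), xyt, create_graph=True)[0]
    residual = du[:, 0] + dv[:, 1]
    return (residual ** 2).mean()

# Training step: temperature data loss plus PDE penalty. No labels for
# u, v, p are needed, mirroring the abstract's claim. The points below
# are synthetic placeholders, not real melt pool data.
model = MeltPoolPINN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xyt_data = torch.rand(256, 3)                     # labeled points
T_data = torch.rand(256)                          # temperature labels
xyt_coll = torch.rand(1024, 3)                    # collocation points
for _ in range(100):
    opt.zero_grad()
    loss = ((model(xyt_data)[:, 0] - T_data) ** 2).mean() \
           + physics_loss(model, xyt_coll)
    loss.backward()
    opt.step()
```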
  2. Abstract

    Powder bed fusion (PBF) is an additive manufacturing process in which laser heat melts powder particles on top of a powder bed and subsequent cooling solidifies them. During this process, the heat of the laser beam interacts with the powder, causing thermal emission and affecting the melt pool. This paper aims to predict heat emission in PBF by harnessing the strengths of recurrent neural networks. Long short-term memory (LSTM) networks are developed to learn from sequential data (emission readings), with the learning guided by process physics, including laser power, laser speed, layer number, and scanning pattern. To reduce the computational effort of model training, the LSTM models are integrated with a new approach for down-sampling the raw pyrometry data and extracting useful statistical features from it. The structure and hyperparameters of the LSTM model reflect several iterations of tuning based on training with the pyrometer readings. Results reveal how raw pyrometer data should be processed to work best with LSTMs, how informative the physics features are in predicting overheating, and how effective the physics-guided LSTM is at emission prediction.

     
    Free, publicly-accessible full text available January 1, 2025
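
A minimal sketch of the physics-guided pipeline from the second abstract, assuming PyTorch: the raw pyrometer trace is down-sampled into per-window statistics, and an LSTM reads that sequence together with static process-physics features (laser power, speed, layer number, scan-pattern id). The window size, statistics, feature encoding, and layer sizes are illustrative guesses, not the paper's tuned configuration.

```python
import torch
import torch.nn as nn

def downsample(trace, window=50):
    """Reduce a raw 1-D pyrometer trace to (mean, std, max) per window."""
    n = trace.numel() // window
    chunks = trace[: n * window].view(n, window)
    return torch.stack([chunks.mean(1), chunks.std(1), chunks.amax(1)], dim=1)

class PhysicsGuidedLSTM(nn.Module):
    def __init__(self, stat_dim=3, phys_dim=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(stat_dim + phys_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)          # next-step emission level

    def forward(self, stats, phys):
        # Broadcast the static physics features onto every time step.
        phys_seq = phys.unsqueeze(1).expand(-1, stats.size(1), -1)
        out, _ = self.lstm(torch.cat([stats, phys_seq], dim=-1))
        return self.head(out[:, -1, :]).squeeze(-1)

# Example with one synthetic trace and one set of process parameters
# (power, speed, layer, scan-pattern id); values are placeholders.
trace = torch.rand(5000)                          # raw pyrometer readings
stats = downsample(trace).unsqueeze(0)            # shape (1, steps, 3)
phys = torch.tensor([[195.0, 0.8, 12.0, 1.0]])
pred = PhysicsGuidedLSTM()(stats, phys)
```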
  3. Free, publicly-accessible full text available June 1, 2025
  4. Recent studies show that large language models (LLMs) unintentionally memorize parts of their training data, which brings serious privacy risks. For example, it has been shown that over 1% of the tokens generated unprompted by an LLM are part of sequences in the training data. However, current studies mainly focus on exact memorization behaviors. In this paper, we propose to evaluate how many generated texts have near-duplicates (e.g., differing by only a couple of tokens out of 100) in the training corpus. A major challenge in conducting this evaluation is the huge computational cost of near-duplicate sequence searches, because modern LLMs are trained on ever-larger corpora with up to 1 trillion tokens. Worse, the number of sequences in a text is quadratic in the text length. To address this issue, we develop an efficient and scalable near-duplicate sequence search algorithm. It can find (almost) all near-duplicate sequences of the query sequence in a large corpus, with guarantees. Specifically, the algorithm generates and groups the min-hash values of all sequences with at least t tokens (since very short near-duplicates are often irrelevant noise) in the corpus, in time linear in the corpus size. We formally prove that only 2(n+1)/(t+1) − 1 min-hash values are generated for a text with n tokens in expectation, so the index time and size are reasonable. When a query arrives, we find all sequences that share enough min-hash values with the query using inverted indexes and prefix filtering. Extensive experiments on several large real-world LLM training corpora show that our near-duplicate sequence search algorithm is efficient and scalable.
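
A simplified, hedged sketch of the retrieval side of the algorithm in the fourth abstract: windows of tokens are min-hashed, the (seed, value) pairs go into an inverted index, and a query retrieves candidates that share enough min-hash values. For brevity it hashes fixed-length windows rather than all sequences of at least t tokens, and it omits the paper's linear-time generation scheme and prefix filtering; the hash count, window length, and vote threshold are assumptions.

```python
import hashlib
from collections import defaultdict

K = 8          # independent min-hash values per window (assumption)
T = 16         # minimum sequence length in tokens (assumption)

def h(token, seed):
    """Seeded 64-bit hash of a token."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8)
    return int.from_bytes(digest.digest(), "big")

def minhashes(tokens):
    """K min-hash values of a token sequence."""
    return tuple(min(h(tok, seed) for tok in tokens) for seed in range(K))

def build_index(corpus):
    """Map each (seed, min-hash) pair to the windows that produced it."""
    index = defaultdict(set)
    for doc_id, tokens in enumerate(corpus):
        for start in range(max(len(tokens) - T + 1, 1)):
            window = tokens[start:start + T]
            for seed, mh in enumerate(minhashes(window)):
                index[(seed, mh)].add((doc_id, start))
    return index

def query(index, tokens, min_shared=4):
    """Candidates sharing at least min_shared min-hash values."""
    votes = defaultdict(int)
    for seed, mh in enumerate(minhashes(tokens)):
        for hit in index.get((seed, mh), ()):
            votes[hit] += 1
    return [hit for hit, n in votes.items() if n >= min_shared]

# Usage: index a toy corpus, then look up one of its own windows.
corpus = [("the cat sat on the mat and then the cat slept all day long "
           "under the warm kitchen table").split()]
idx = build_index(corpus)
print(query(idx, corpus[0][:T]))
```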

     