

This content will become publicly available on October 1, 2024

Title: Dreaming of Data: Examining Data Augmentation for Machine Learning in Additive Manufacturing
The data generated during additive manufacturing (AM) practice can be used to train machine learning (ML) tools to reduce defects, optimize mechanical properties, or increase efficiency. Emerging research shows that, beyond the size of the repository, other characteristics of the data also affect its suitability for AM-ML applications. What should be done when the data set is too small, too homogeneous, or otherwise insufficient? Data augmentation techniques offer a solution: automated methods for increasing the quality of data. However, many of these techniques were developed for machine vision tasks, so their suitability for AM data has not been verified. In this study, several data augmentation techniques are applied to synthetic design repositories to characterize whether, and to what degree, they improve the repositories' performance as ML training sets. We discuss the comparative advantages of these data augmentation techniques across several canonical AM-ML tasks.
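The abstract does not name the specific augmentation techniques studied. As a minimal sketch of the kind of machine-vision-style augmentation it refers to, the Python example below generates new training copies of a part through 90° rotations about the build axis and mirroring. It makes several assumptions. Parts are stored as boolean voxel grids indexed (x, y, z), with z as the build direction, as in related voxel-based work on design repositories. The x and y extents are equal. The helpers `augment_voxels` and `part_mass`, the 0.5 mm voxel size, and the PLA-like density are all hypothetical. Because these transforms are rigid, an orientation-independent label such as part mass carries over to each copy unchanged.

```python
import numpy as np


def augment_voxels(grid: np.ndarray, rng: np.random.Generator, n_copies: int = 4) -> list[np.ndarray]:
    """Return n_copies randomly rotated/mirrored versions of a boolean voxel grid.

    Assumes the grid is indexed (x, y, z) with z as the build direction, and that
    the x and y extents are equal so every copy keeps the same input shape.
    """
    copies = []
    for _ in range(n_copies):
        k = int(rng.integers(0, 4))                # 0-3 quarter turns about the z (build) axis
        out = np.rot90(grid, k=k, axes=(0, 1))     # rotate within the x-y plane only
        if rng.random() < 0.5:
            out = np.flip(out, axis=0)             # mirror across the y-z plane
        copies.append(out.copy())
    return copies


def part_mass(grid: np.ndarray, voxel_edge_mm: float, density_g_per_mm3: float) -> float:
    """Part mass = occupied voxels * voxel volume * material density."""
    return float(grid.sum()) * voxel_edge_mm ** 3 * density_g_per_mm3


# Usage on a toy block-shaped "part" (assumed values: 0.5 mm voxels, PLA-like density).
rng = np.random.default_rng(0)
grid = np.zeros((8, 8, 8), dtype=bool)
grid[1:5, 2:4, 0:6] = True
augmented = augment_voxels(grid, rng, n_copies=4)
masses = [part_mass(g, 0.5, 1.24e-3) for g in [grid] + augmented]
```

Whether a transform is label-preserving depends on the target. A rotation that tipped the part over (about x or y) would still preserve mass. It would generally change orientation-dependent metrics such as support material mass, however, so those copies would need to be re-labeled rather than reusing the original label.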
Award ID(s): 2309250
NSF-PAR ID: 10496756
Author(s) / Creator(s): ; ; ; ;
Publisher / Repository: Solid Freeform Fabrication Symposium
Date Published:
Journal Name: 2023 International Solid Freeform Fabrication Symposium
Format(s): Medium: X
Location: Austin, TX, USA
Sponsoring Org: National Science Foundation
More Like this
  1. Statistical knowledge and domain expertise are key to extracting actionable insights from data, yet these skills rarely coexist. In machine learning, high-quality results are attainable only through careful data preprocessing, hyperparameter tuning, and model selection. Domain experts are often overwhelmed by this complexity, which de facto inhibits wider adoption of ML techniques in other fields. Existing libraries that claim to solve this problem still require well-trained practitioners. These frameworks involve heavy data-preparation steps and are often too slow for interactive feedback from the user, severely limiting the scope of such systems. In this paper we present Alpine Meadow, the first Interactive Automated Machine Learning tool. What makes our system unique is not only its focus on interactivity but also its combined systems and algorithmic design: on one hand we leverage ideas from query optimization; on the other we devise novel selection and pruning strategies that combine cost-based Multi-Armed Bandits and Bayesian Optimization. We evaluate our system on over 300 datasets and compare it against other AutoML tools, including the current NIPS winner, as well as against expert solutions. Alpine Meadow not only significantly outperforms the other AutoML systems while (unlike them) providing interactive latencies, but also outperforms expert solutions in 80% of cases on datasets we have never seen before.
  2. Machine learning (ML) has been shown to be an effective alternative to physical models for quality prediction and process optimization in metal additive manufacturing (AM). However, the inherent “black box” nature of ML techniques such as artificial neural networks has often made it difficult to interpret ML outcomes within the framework of the complex thermodynamics that govern AM. While the practical benefits of ML provide adequate justification for its use, its utility as a reliable modeling tool ultimately depends on assured consistency with physical principles and on model transparency. To address these fundamental needs, physics-informed machine learning (PIML) has emerged as a hybrid machine learning paradigm that imbues ML models with physical domain knowledge such as thermomechanical laws and constraints. The distinguishing feature of PIML is the synergistic integration of data-driven methods that reflect system dynamics in real time with the governing physics underlying AM. In this paper, the current state of the art in metal AM is reviewed and opportunities for a paradigm shift to PIML are discussed, identifying relevant future research directions.
  3. With the explosion of Big Data, it is often forgotten that much of today's data is generated at the edge. Specifically, a major source of data is users' endpoint devices, such as phones and smart watches, that are connected to the internet, also known as the Internet of Things (IoT). This "edge of data" faces several new challenges related to hardware constraints, privacy-aware learning, and distributed learning (both training and inference). So what systems and machine learning algorithms can we use to generate or exploit data at the edge? Can network science help us solve machine learning (ML) problems? Can IoT devices help people who live with some form of disability, and many others, benefit from health monitoring? In this tutorial, we introduce the network science and ML techniques relevant to edge computing, discuss systems for ML (e.g., model compression, quantization, and HW/SW co-design) and ML for systems design (e.g., run-time resource optimization and power management for training and inference on edge devices), and illustrate their impact on concrete IoT applications.
  4. In recent years, we have seen increased interest in applying machine learning to systems problems. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, and sketches, among many other data management tasks. Arguably, the ideas behind these techniques are similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, these techniques will allow us to build "instance-optimized" systems: that is, systems that self-adjust to a given workload and data distribution to provide unprecedented performance without the need for tuning by an administrator. While many of these techniques promise orders-of-magnitude better performance in lab settings, there is still general skepticism about how practical the current techniques really are. The following is intended as a progress report on ML for Systems and its readiness for real-world deployments, with a focus on our projects done as part of the Data Systems and AI Lab (DSAIL) at MIT. By no means is it a comprehensive overview of all existing work, which has been growing steadily over the past several years not only in the database community but also in the systems, networking, theory, PL, and many other adjacent communities.
  5. Machine learning can be used to automate common or time-consuming engineering tasks for which sufficient data already exist. For instance, design repositories can be used to train deep learning algorithms to assess component manufacturability; however, methods to determine whether a design repository is suitable for use with machine learning do not exist. We provide an initial investigation toward identifying such a method, using “artificial” design repositories to test experimentally the extent to which altering properties of the dataset affects the assessment precision and generalizability of neural networks trained on the data. For this experiment, we use a 3D convolutional neural network to estimate quantitative manufacturing metrics directly from voxel-based component geometries. Additive manufacturing (AM) is used as a case study because of the recent growth of AM-focused design repositories, such as GrabCAD and Thingiverse, that are readily accessible online. In this study, we focus only on material extrusion, the dominant consumer AM process, and investigate three AM build metrics: (1) part mass, (2) support material mass, and (3) build time. Additionally, we compare the convolutional neural network's accuracy to that of a baseline multiple linear regression model. Our results suggest that training on design repositories with less standardized orientation and position resulted in more accurate trained neural networks, and that orientation-dependent metrics were harder to estimate than orientation-independent metrics. Furthermore, the convolutional neural network was more accurate than the baseline linear regression model for all build metrics.
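The last abstract in the list above compares a 3D convolutional neural network against a baseline multiple linear regression model for estimating build metrics from voxel geometries, but it does not say which inputs the baseline uses. The sketch below shows what such a baseline could look like. It assumes hypothetical hand-crafted features (occupied-voxel count, bounding-box height, and bounding-box footprint) and a purely synthetic repository of box-shaped parts. It fits ordinary least squares with NumPy and scores the fit on held-out parts. `voxel_features`, `fit_mlr`, and `predict_mlr` are illustrative names, not taken from the study.

```python
import numpy as np


def voxel_features(grid: np.ndarray) -> np.ndarray:
    """Hypothetical hand-crafted features: [occupied voxels, bbox height, bbox footprint]."""
    occupied = np.argwhere(grid)                   # (n, 3) indices of filled voxels
    if occupied.size == 0:
        return np.zeros(3)
    extent = occupied.max(axis=0) - occupied.min(axis=0) + 1  # bounding-box size per axis
    return np.array([occupied.shape[0], extent[2], extent[0] * extent[1]], dtype=float)


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply the fitted linear model: b0 + X @ [b1, ..., bp]."""
    return coef[0] + X @ coef[1:]


# Purely synthetic repository of box-shaped parts in a 16^3 grid.
rng = np.random.default_rng(1)
grids = []
for _ in range(50):
    g = np.zeros((16, 16, 16), dtype=bool)
    dx, dy, dz = rng.integers(1, 9, size=3)
    g[:dx, :dy, :dz] = True
    grids.append(g)

X = np.array([voxel_features(g) for g in grids])
y = X[:, 0] * 0.5 ** 3 * 1.24e-3                   # toy target: part mass (assumed 0.5 mm voxels, PLA-like density)

coef = fit_mlr(X[:40], y[:40])                     # fit on 40 parts
mae = np.mean(np.abs(predict_mlr(coef, X[40:]) - y[40:]))  # mean absolute error on 10 held-out parts
```

Because the toy target (part mass) is exactly proportional to the voxel count, this baseline fits it essentially perfectly. The example therefore illustrates only the mechanics of fitting and held-out evaluation, not how such a baseline performs on real metrics such as support material mass or build time.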