
Title: SimJEB: Simulated Jet Engine Bracket Dataset

This paper introduces the Simulated Jet Engine Bracket Dataset (SimJEB) [WBM21]: a new, public collection of crowdsourced mechanical brackets and accompanying structural simulations. SimJEB is applicable to a wide range of geometry processing tasks: the complexity of its shapes challenges automated geometry cleaning and meshing, while its categorical labels and structural simulations facilitate classification and regression (i.e., engineering surrogate modeling). In contrast to existing shape collections, SimJEB's models are all designed for the same engineering function and thus have consistent structural loads and support conditions. At the same time, SimJEB models are more complex, diverse, and realistic than the synthetically generated datasets commonly used in parametric surrogate model evaluation. The designs in SimJEB were derived from submissions to the GrabCAD Jet Engine Bracket Challenge: an open engineering design competition with over 700 hand-designed CAD entries from 320 designers representing 56 countries. Each model has been cleaned, categorized, meshed, and simulated with finite element analysis according to the original competition specifications. The result is a collection of 381 diverse, high-quality, and application-focused designs for advancing geometric deep learning, engineering surrogate modeling, automated cleaning, and related geometry processing tasks.

Journal Name:
Computer Graphics Forum
Page Range / eLocation ID:
p. 9-17
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Modern performance earthquake engineering practices frequently require a large number of time-consuming non-linear time-history simulations to appropriately address excitation and structural uncertainties when estimating engineering demand parameter (EDP) distributions. Surrogate modeling techniques have emerged as an attractive tool for alleviating such high computational burden in similar engineering problems. A key challenge for the application of surrogate models in an earthquake engineering context relates to the aleatoric variability associated with the seismic hazard. This variability is typically expressed as high-dimensional or non-parametric uncertainty, and so cannot be easily incorporated within standard surrogate modeling frameworks. Rather, a surrogate modeling approach that can directly approximate the full distribution of the response output is warranted for this application. This approach must additionally address the fact that the response variability may change as the input parameters change, yielding heteroscedastic behavior. Stochastic emulation techniques have emerged as a viable solution to accurately capture aleatoric uncertainties in similar contexts, and recent work by the second author has established a framework to accommodate this for earthquake engineering applications, using Gaussian Process (GP) regression to predict the EDP response distribution. The established formulation requires that, for a portion of the training samples, simulations be replicated for different descriptions of the aleatoric uncertainty. In particular, the replicated samples are used to build a secondary GP model to predict the heteroscedastic characteristics, and these predictions are then used to formulate the primary GP that produces the full EDP distribution.
This practice, however, has two downsides: it always requires a minimum number of replications when training the secondary GP, and the information from the non-replicated samples is utilized only for the primary GP. This research adopts an alternative stochastic GP formulation that can address both limitations. To this end, the secondary GP is trained on the squared deviations of the samples from the mean instead of the crude sample variances. To establish the primitive mean estimates, another auxiliary GP is introduced. This way, information from all replicated and non-replicated samples is fully leveraged for estimating both the EDP distribution and the underlying heteroscedastic behavior, while the formulation accommodates an implementation using no replications. The case study examples using three different stochastic ground motion models demonstrate that the proposed approach can address both aforementioned challenges.
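The three-GP idea described in this abstract can be illustrated with a minimal sketch, under stated assumptions. Everything specific here is hypothetical and not taken from the paper: the one-dimensional test function, the squared-exponential kernel, the fixed hyperparameters, and the simplification of regressing directly on squared deviations with a small positive floor. The sketch uses one simulation per input (no replications): an auxiliary GP gives preliminary mean estimates, a secondary GP smooths the squared deviations from those estimates into a variance function, and a primary GP uses that variance as input-dependent noise.

```python
import math
import random

def rbf(a, b, length, var):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_mean(x_tr, y_tr, noise_var, x_te, length, var):
    """GP posterior mean with a constant prior mean and per-sample noise variances."""
    n = len(x_tr)
    y_bar = sum(y_tr) / n
    K = [[rbf(x_tr[i], x_tr[j], length, var) + (noise_var[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = chol_solve(cholesky(K), [v - y_bar for v in y_tr])
    return [y_bar + sum(rbf(xs, x_tr[i], length, var) * alpha[i] for i in range(n))
            for xs in x_te]

# Hypothetical heteroscedastic data: noise grows with x, one sample per input.
random.seed(0)
n = 100
x = [random.random() for _ in range(n)]
y = [math.sin(2 * math.pi * t) + (0.1 + 0.4 * t) * random.gauss(0.0, 1.0) for t in x]

# Step 1: auxiliary GP with constant noise gives preliminary mean estimates.
m_aux = gp_mean(x, y, [0.1] * n, x, length=0.2, var=1.0)

# Step 2: secondary GP smooths squared deviations into a variance function.
sq_dev = [(yi - mi) ** 2 for yi, mi in zip(y, m_aux)]

def predict_var(x_te):
    raw = gp_mean(x, sq_dev, [0.05] * n, x_te, length=0.5, var=0.01)
    return [max(v, 1e-4) for v in raw]  # floor keeps variances positive

# Step 3: primary GP uses the predicted variances as input-dependent noise.
noise_hat = predict_var(x)
x_new = [0.1, 0.9]
mean_new = gp_mean(x, y, noise_hat, x_new, length=0.2, var=1.0)
var_new = predict_var(x_new)  # aleatoric variance only; GP mean uncertainty omitted

for xs, m, v in zip(x_new, mean_new, var_new):
    print(f"x={xs:.1f}  mean={m:+.3f}  aleatoric sd={math.sqrt(v):.3f}")
```

Because the secondary GP is trained on squared deviations rather than sample variances, every sample contributes to the variance estimate, which is what allows the sketch to run with no replicated simulations.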

  2. The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem

    Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†

    * These authors contributed equally to this work
    † Corresponding authors
    ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section

    J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in USA, Israel and Australia.

    Introduction

    This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine-interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data.
Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating.

The challenge: enabling wide participation

With the goal of running a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this "Deep Learning Epilepsy Detection Challenge", participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as the basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles:

1. No participant should need to have in-depth knowledge of the specific domain. (i.e. no participant should need to be a neuroscientist or epileptologist.)
2. No participant should need to be an expert data scientist.
3. No participant should need more than basic programming knowledge. (i.e. no participant should need to learn how to process fringe data formats and stream data efficiently.)
4. No participant should need to provide their own computing resources.

In addition to the above, our platform should further

• guide participants through the entire process from sign-up to model submission,
• facilitate collaboration, and
• provide instant feedback to the participants through data visualisation and intermediate online leaderboards.
The platform

The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components.

(1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge.

(2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant that allowed users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which will be described in more detail in further sections.

(3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources.
The starter kit also enabled submission of the final code to a data store to which only the challenge team had access.

(4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models.

(5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms.

(6) Utility Functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all the IBM services used.

Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team.

Figure 1: High-level architecture of the challenge platform

Measuring success

The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, with a total of 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's infrastructure WML. Seven teams persisted until the end of the challenge and submitted final solutions. The best performing solutions reached seizure detection performances that can reduce a hundred-fold the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication.
Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists.

Figure 2: Primary occupation as reported by challenge participants

Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in data science, software engineering and neuroscience.

Conclusion

Given the growing complexity of data science problems and increasing dataset sizes, solving these problems makes it imperative to enable collaboration between people with different expertise, with a focus on inclusiveness and a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. 87 IBM employees from several business units, including but not limited to IBM Research, with a variety of skills, including sales and design, participated in this highly technical challenge.
  3. The report involves the application and testing of dynamics concepts through the use of 3D-printed components. It includes various challenges promoting innovation and critical thinking. Challenge 1 focused on exploring angular motion through the design of a 3D-printed wheel. In Challenge 2, a shake table was developed by creating a reciprocating mechanism that converted rotational motion to linear motion. The kinematic relations of the 3D model were derived from the geometry of the mechanism to meet a targeted acceleration. Challenge 3 applied structural dynamics concepts by designing the columns of a structure to meet a target natural frequency. Challenge 4 built upon previous challenges to test a structure and shake table under forced vibrations. The results from the experiment were used to analyze the dynamic response of a structural system. The challenges integrated 3D design and mathematical modeling to understand the importance of dynamic behaviors in structural engineering. The 3D-printing Dynamics Design (3D3) Competition intends to train School of Civil Engineering & Environmental Science (CEES) undergraduates at the University of Oklahoma in fundamental concepts related to vibrations, structural dynamics, and earthquake engineering through a semester-long, hands-on competition run in parallel with Introduction to Dynamics for Architectural and Civil Engineers (CEES 3263). Competition participants, or 3D3 Scholars, design, build, and test a bench-scale shake table using 3D-printed components. The designs of these shake tables are published here, along with all the STL files needed for teachers or students elsewhere to fabricate the tables. The data collected during the challenges is also published.
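The sizing calculations behind Challenges 2 and 3 can be sketched as follows. The report does not state which mechanism or structural idealisation was used, so this sketch assumes a scotch-yoke drive (purely sinusoidal table motion, peak acceleration r·ω²) and a single-storey frame with a rigid floor on fixed-fixed square columns (lateral stiffness 12EI/L³ per column). All function names and numerical values are hypothetical.

```python
import math

def crank_radius(target_accel, motor_rpm):
    """Crank radius r (m) so that a scotch yoke, x(t) = r*sin(w*t),
    reaches the target peak acceleration r*w**2 (m/s^2)."""
    omega = motor_rpm * 2.0 * math.pi / 60.0  # rpm -> rad/s
    return target_accel / omega ** 2

def natural_frequency_hz(mass, E, I, length, n_columns):
    """Natural frequency of a rigid floor of given mass (kg) on n fixed-fixed
    columns, each contributing lateral stiffness 12*E*I/L**3."""
    k = n_columns * 12.0 * E * I / length ** 3
    return math.sqrt(k / mass) / (2.0 * math.pi)

def column_side_for_frequency(target_hz, mass, E, length, n_columns):
    """Side b (m) of a square column (I = b**4 / 12) that gives the target
    natural frequency under the same assumptions."""
    k_total = mass * (2.0 * math.pi * target_hz) ** 2
    I = k_total * length ** 3 / (12.0 * E * n_columns)
    return (12.0 * I) ** 0.25

# Hypothetical example values, not taken from the report.
r = crank_radius(target_accel=0.5 * 9.81, motor_rpm=120.0)
b = column_side_for_frequency(target_hz=5.0, mass=1.0, E=3.5e9,
                              length=0.15, n_columns=4)
print(f"crank radius: {r * 1000:.1f} mm, column side: {b * 1000:.2f} mm")
```

A slider-crank mechanism would add a connecting-rod term to the acceleration, so the scotch-yoke relation should be read as a first approximation only.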
  4. For energy-assisted compression ignition (EACI) engine propulsion at high-altitude operating conditions using sustainable jet fuels with varying cetane numbers, it is essential to develop an efficient engine control system for robust and optimal operation. Control systems are typically trained using experimental data, which can be costly and time consuming to generate due to setup time of experiments, unforeseen delays/issues with manufacturing, mishaps/engine failures and the consequent repairs (which can take weeks), and errors in measurements. Computational fluid dynamics (CFD) simulations can overcome such burdens by complementing experiments with simulated data for control system training. Such simulations, however, can be computationally expensive. Existing data-driven machine learning (ML) models have shown promise for emulating the expensive CFD simulator, but encounter key limitations here due to the expensive nature of the training data and the range of differing combustion behaviors (e.g. misfires and partial/delayed ignition) observed at such broad operating conditions. We thus develop a novel physics-integrated emulator, called the Misfire-Integrated GP (MInt-GP), which integrates important auxiliary information on engine misfires within a Gaussian process surrogate model. With limited CFD training data, we show the MInt-GP model can yield reliable predictions of in-cylinder pressure evolution profiles and subsequent heat release profiles and engine CA50 predictions at a broad range of input conditions. We further demonstrate much better prediction capabilities of the MInt-GP at different combustion behaviors compared to existing data-driven ML models such as kriging and neural networks, while also observing up to 80 times computational speed-up over CFD, thus establishing its effectiveness as a tool to assist CFD for fast data generation in control system training.

  5. Abstract

    Data-driven generative design (DDGD) methods utilize deep neural networks to create novel designs based on existing data. The structure-aware DDGD method can handle complex geometries and automate the assembly of separate components into systems, showing promise in facilitating creative designs. However, determining the appropriate vectorized design representation (VDR) to evaluate 3D shapes generated from the structure-aware DDGD model remains largely unexplored. To that end, we conducted a comparative analysis of surrogate models' performance in predicting the engineering performance of 3D shapes using VDRs from two sources: the trained latent space of structure-aware DDGD models encoding structural and geometric information and an embedding method encoding only geometric information. We conducted two case studies: one involving 3D car models focusing on drag coefficients and the other involving 3D aircraft models considering both drag and lift coefficients. Our results demonstrate that using latent vectors as VDRs can significantly deteriorate surrogate models' predictions. Moreover, increasing the dimensionality of the VDRs in the embedding method may not necessarily improve the prediction, especially when the VDRs contain more information irrelevant to the engineering performance. Therefore, when selecting VDRs for surrogate modeling, the latent vectors obtained from training structure-aware DDGD models must be used with caution, although they are more accessible once training is complete. The underlying physics associated with the engineering performance also deserves attention. This paper provides empirical evidence for the effectiveness of different types of VDRs of structure-aware DDGD for surrogate modeling, thus facilitating the construction of better surrogate models for AI-generated designs.
