
Title: Predicting carbon nanotube forest attributes and mechanical properties using simulated images and deep learning

Understanding and controlling the self-assembly of vertically oriented carbon nanotube (CNT) forests is essential for realizing their potential in myriad applications. The governing process–structure–property mechanisms are poorly understood, and the processing parameter space is far too vast to exhaustively explore experimentally. We overcome these limitations by using a physics-based simulation as a high-throughput virtual laboratory and image-based machine learning to relate CNT forest synthesis attributes to their mechanical performance. Using CNTNet, our image-based deep learning classifier module trained with synthetic imagery, combinations of CNT diameter, density, and population growth rate classes were labeled with an accuracy of >91%. The CNTNet regression module predicted CNT forest stiffness and buckling load properties with a lower root-mean-square error than that of a regression predictor based on CNT physical parameters. These results demonstrate that image-based machine learning trained using only simulated imagery can distinguish subtle CNT forest morphological features to predict physical material properties with high accuracy. CNTNet paves the way to incorporate scanning electron microscope imagery for high-throughput material discovery.

Journal Name:
npj Computational Materials
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. The parameter space of CNT forest synthesis is vast and multidimensional, making experimental and/or numerical exploration of the synthesis prohibitive. We propose a more practical approach to explore the synthesis-process relationships of CNT forests using machine learning (ML) algorithms to infer the underlying complex physical processes. Currently, no such ML model linking CNT forest morphology to synthesis parameters has been demonstrated. In the current work, we use a physics-based numerical model to generate CNT forest morphology images with known synthesis parameters to train such an ML algorithm. The CNT forest synthesis variables of CNT diameter and CNT number densities are varied to generate a total of 12 distinct CNT forest classes. Images of the resultant CNT forests at different time steps during the growth and self-assembly process are then used as the training dataset. Based on the CNT forest structural morphology, multiple single and combined histogram-based texture descriptors are used as features to build a random forest (RF) classifier to predict class labels based on correlation of CNT forest physical attributes with the growth parameters. The machine learning model achieved an accuracy of up to 83.5% on predicting the synthesis conditions of CNT number density and diameter. These results are the first step towards rapidly characterizing CNT forest attributes using machine learning. Identifying the relevant process-structure interactions for the CNT forests using physics-based simulations and machine learning could rapidly advance the design, development, and adoption of CNT forest applications with varied morphologies and properties.
  2. Abstract

    After graphene was first exfoliated in 2004, research worldwide has focused on discovering and exploiting its distinctive electronic, mechanical, and structural properties. Application of the efficacious methodology used to fabricate graphene, mechanical exfoliation followed by optical microscopy inspection, to other analogous bulk materials has resulted in many more two-dimensional (2D) atomic crystals. Despite their fascinating physical properties, manual identification of 2D atomic crystals has the clear drawback of low throughput and hence is impractical for any scale-up applications of 2D samples. To combat this, recent integration of high-performance machine-learning techniques, usually deep learning algorithms because of their impressive object recognition abilities, with optical microscopy has been used to accelerate and automate this traditional flake identification process. However, deep learning methods require immense datasets and rely on uninterpretable and complicated algorithms for predictions. Conversely, tree-based machine-learning algorithms represent highly transparent and accessible models. We investigate these tree-based algorithms, with features that mimic color contrast, for automating the manual inspection process of exfoliated 2D materials (e.g., MoSe2). We examine their performance in comparison to ResNet, a famous Convolutional Neural Network (CNN), in terms of accuracy and the physical nature of their decision-making process. We find that the decision trees, gradient boosted decision trees, and random forests utilize physical aspects of the images to successfully identify 2D atomic crystals without suffering from extreme overfitting and high training dataset demands. We also employ a post-hoc study that identifies the sub-regions CNNs rely on for classification and find that they regularly utilize physically insignificant image attributes when correctly identifying thin materials.

  3. Abstract

    Herein, we implement and assess machine learning architectures to ascertain models that differentiate healthy from apoptotic cells using exclusively forward (FSC) and side (SSC) scatter flow cytometry information. To generate training data, colorectal cancer HCT116 cells were subjected to miR-34a treatment and then classified using a conventional Annexin V/propidium iodide (PI)-staining assay. The apoptotic cells were defined as Annexin V-positive cells, which include early and late apoptotic cells, necrotic cells, as well as other dying or dead cells. In addition to fluorescent signal, we collected cell size and granularity information from the FSC and SSC parameters. Both parameters are subdivided into area, height, and width, thus providing a total of six numerical features that informed and trained our models. A collection of logistic regression, random forest, k-nearest neighbor, multilayer perceptron, and support vector machine models was trained and tested for classification performance in predicting cell states using only the six aforementioned numerical features. Out of 1046 candidate models, a multilayer perceptron was chosen with 0.91 live precision, 0.93 live recall, 0.92 live F-value, and 0.97 live area under the ROC curve when applied to standardized data. We discuss and highlight differences in classifier performance and compare the results to the standard practice of forward and side scatter gating, typically performed to select cells based on size and/or complexity. We demonstrate that our model, a ready-to-use module for any flow cytometry-based analysis, can provide automated, reliable, and stain-free classification of healthy and apoptotic cells using exclusively size and granularity information.

  4. Abstract

    In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. The natural sciences, however, are interested in finding a robust, interpretable function for the target phenomenon that can return predictions even outside the training domain. This paper focuses on the viscosity prediction problem in steelmaking and proposes Einstein–Roscoe regression (ERR), which learns the coefficients of the Einstein–Roscoe equation and is able to extrapolate to unseen domains. In addition, it is often the case in the natural sciences that some measurements are unavailable or more expensive than others due to physical constraints. To address this, we employ a transfer learning framework based on Gaussian processes, which allows us to estimate the regression parameters using auxiliary measurements available at a reasonable cost. In experiments using viscosity measurements in a high-temperature slag suspension system, ERR compares favorably with various machine learning approaches in interpolation settings and outperforms all of them in extrapolation settings. Furthermore, after estimating parameters using the auxiliary dataset obtained at room temperature, an increase in accuracy is observed on the high-temperature dataset, which corroborates the effectiveness of the proposed approach.

  5. Abstract

    Due to climate change and rapid urbanization, Urban Heat Island (UHI), featuring significantly higher temperatures in metropolitan areas than in surrounding areas, has caused negative impacts on urban communities. Temporal granularity is often limited in UHI studies based on satellite remote sensing data that typically has multi-day frequency coverage of a particular urban area. This low temporal frequency has restricted the development of models for predicting UHI. To resolve this limitation, this study has developed a cyber-based geographic information science and systems (cyberGIS) framework encompassing multiple machine learning models for predicting UHI with high-frequency urban sensor network data combined with remote sensing data focused on Chicago, Illinois, from 2018 to 2020. Enabled by rapid advances in urban sensor network technologies and high-performance computing, this framework is designed to predict UHI in Chicago with fine spatiotemporal granularity based on environmental data collected with the Array of Things (AoT) urban sensor network and Landsat-8 remote sensing imagery. Our computational experiments revealed that a random forest regression (RFR) model outperforms other models, with a prediction accuracy of 0.45 degrees Celsius in 2020 and 0.8 degrees Celsius in 2018 and 2019, using mean absolute error as the evaluation metric. Humidity, distance to geographic center, and PM2.5 concentration are identified as important factors contributing to the model performance. Furthermore, we estimate UHI in Chicago with 10-min temporal frequency and 1-km spatial resolution on the hottest day in 2018. It is demonstrated that the RFR model can accurately predict UHI at fine spatiotemporal scales with high-frequency urban sensor network data integrated with satellite remote sensing data.
