Title: Machine Learning-Assisted Carbon Dot Synthesis: Prediction of Emission Color and Wavelength
Carbon dots (CDs) have attracted great attention in a range of applications due to their bright photoluminescence, high photostability, and good biocompatibility. However, it is challenging to design CDs with specific emission properties because the syntheses involve many parameters, and it is not clear how each parameter influences the CD properties. To help bridge this gap, machine learning, specifically an artificial neural network, is employed in this work to characterize the impact of synthesis parameters on, and make predictions for, the emission color and wavelength of CDs. The model reveals that the choice of reaction method, purification method, and solvent relates more closely to CD emission characteristics than the reaction temperature or time, which are frequently tuned in experiments. After considering multiple models, the best performing machine learning classification model achieved an accuracy of 94% in predicting color relative to the actual color. In addition, hybrid (two-stage) models incorporating both color classification and an artificial neural network k-ensemble model for wavelength prediction through regression performed significantly better than either a standard artificial neural network or a single-stage artificial neural network k-ensemble regression model. The accuracy of the model predictions was evaluated against CD emission wavelengths measured from experiments, and the minimum mean absolute error is 25.8 nm. Overall, the models developed in this work can effectively predict the photoluminescence emission of CDs and help design CDs with targeted optical properties.
Award ID(s):
2001611
PAR ID:
10380787
Journal Name:
Journal of Chemical Information and Modeling
ISSN:
1549-9596
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Multicolor carbon dots (CDs) have been developed recently and demonstrate great potential in bio-imaging, sensing, and LEDs. However, the fluorescence mechanism of their tunable colors is still under debate, and efficient separation methods are still challenging. Herein, we synthesized multicolor polymeric CDs through solvothermal treatment of citric acid and urea in formamide. Automated reversed-phase column separation was used to achieve fractions with distinct colors, including blue, cyan, green, yellow, orange and red. This work explores the physicochemical properties and fluorescence origins of the red, green, and blue fractions in depth with combined experimental and computational methods. Three dominant fluorescence mechanism hypotheses were evaluated by comparing time-dependent density functional theory and molecular dynamics calculation results to measured characteristics. We find that blue fluorescence likely comes from embedded small molecules trapped in carbonaceous cages, while pyrene analogs are the most likely origin for emission at other wavelengths, especially in the red. Also important, upon interaction with live cells, different CD color fractions are trafficked to different sub-cellular locations. Super-resolution imaging shows that the blue CDs were found in a variety of organelles, such as mitochondria and lysosomes, while the red CDs were primarily localized in lysosomes. These findings significantly advance our understanding of the photoluminescence mechanism of multicolor CDs and help to guide future design and applications of these promising nanomaterials.
  2. Current development of high-performance fiber-reinforced cementitious composites (HPFRCC) mainly relies on intensive experiments. The main purpose of this study is to develop a machine learning method for effective and efficient discovery and development of HPFRCC. Specifically, this research develops machine learning models to predict the mechanical properties of HPFRCC through innovative incorporation of micromechanics, aiming to increase the prediction accuracy and generalization performance by enriching and improving the datasets through data cleaning, principal component analysis (PCA), and K-fold cross-validation. This study considers a total of 14 different mix design variables and predicts the ductility of HPFRCC for the first time, in addition to the compressive and tensile strengths. Different types of machine learning methods are investigated and compared, including artificial neural network (ANN), support vector regression (SVR), classification and regression tree (CART), and extreme gradient boosting tree (XGBoost). The results show that the developed machine learning models can reasonably predict the concerned mechanical properties and can be applied to perform parametric studies for the effects of different mix design variables on the mechanical properties. This study is expected to greatly promote efficient discovery and development of HPFRCC.
  3. Newer technologies such as data mining, machine learning, artificial intelligence, and data analytics have revolutionized the medical sector by using existing big data to predict the patterns emerging from the datasets available in healthcare repositories. Predictions based on existing healthcare datasets have rendered several benefits, such as helping clinicians make accurate and informed decisions while managing patients’ health, leading to better management of patients’ wellbeing and healthcare coordination. Millions of people have been affected by coronary artery disease (CAD). Several machine learning algorithms, including ensemble learning approaches and deep neural network-based algorithms, have shown promising outcomes in improving prediction accuracy for early diagnosis of CAD. This paper analyses the deep neural network variants DRN, Rider Optimization Algorithm-Neural Network (RideNN), and Deep Neural Network-Fuzzy Neural Network (DNFN), with application of an ensemble learning method to improve the prediction accuracy of CAD. The experimental outcomes showed that the proposed ensemble classifier achieved the highest accuracy compared to the other machine learning models. Keywords: heart disease prediction, Deep Residual Network (DRN), ensemble classifiers, coronary artery disease.
  4. GPS spoofing attacks are a severe threat to unmanned aerial vehicles. These attacks manipulate the true state of the unmanned aerial vehicles, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. Most studies applied machine learning models without identifying the best hyperparameters, using feature selection and importance techniques, or ensuring that the dataset used is unbiased and balanced. Moreover, no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models; this paper fills this gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six of the most commonly known machine learning techniques. These models are Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used to perform this investigation. The evaluation was performed in terms of four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model’s performance. The results also show that the Classification and Regression Decision Tree classifier has an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005%, after removing correlated features and using tuned parameters in a balanced dataset. Random Forest can achieve an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01% in similar conditions.