Carrier concentration optimization has been an enduring challenge when developing newly discovered semiconductors for applications (e.g., thermoelectrics, transparent conductors, photovoltaics). This barrier has been particularly pernicious in the realm of high-throughput property prediction, where the carrier concentration is often assumed to be a free parameter and the limits are not predicted due to the high computational cost. In this work, we explore the application of machine learning for high-throughput carrier concentration range prediction. Bounding the model within diamond-like semiconductors, the learning set was developed from experimental carrier concentration data on 127 compounds ranging from unary to quaternary. The data were analyzed using various statistical and machine learning methods. Accurate predictions of carrier concentration ranges in diamond-like semiconductors are made within approximately one order of magnitude on average across both
- NSF-PAR ID: 10153904
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: npj Computational Materials
- Volume: 4
- Issue: 1
- ISSN: 2057-3960
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Diamond-like semiconductors (DLS) have emerged as candidates for thermoelectric energy conversion. Towards understanding and optimizing performance, we present a comprehensive investigation of the electronic properties of two DLS phases, quaternary Cu₂HgGeTe₄ and the related ordered vacancy compound Hg₂GeTe₄, including thermodynamic stability, defect chemistry, and transport properties. To establish the thermodynamic link between the related but distinct phases, the stability region for both is visualized in chemical potential space. In spite of their similar structure and bonding, we show that the two materials exhibit reciprocal behaviors for dopability. Cu₂HgGeTe₄ is degenerately p-type in all environments despite its wide stability region, due to the presence of low-energy acceptor defects V_Cu and Cu_Hg, and is resistant to extrinsic n-type doping. Meanwhile, Hg₂GeTe₄ has a narrow stability region and intrinsic behavior due to the relatively high formation energy of native defects, but presents an opportunity for bipolar doping. While these two compounds have similar structure, bonding, and chemical constituents, the reciprocal nature of their dopability emerges from significant differences in band edge positions. A Brouwer band diagram approach is utilized to visualize the role of native defects on carrier concentrations, dopability, and transport properties. This study elucidates the doping asymmetry between two solid-solution forming DLS phases, Cu₂HgGeTe₄ and Hg₂GeTe₄, by revealing the defect chemistry of each compound, and suggests design strategies for defect engineering of DLS phases.
-
Abstract Ultrawide‐bandgap semiconductors such as AlN, BN, and diamond hold tremendous promise for high‐efficiency deep‐ultraviolet optoelectronics and high‐power/frequency electronics, but their practical application has been limited by poor current conduction. Through a combined theoretical and experimental study, it is shown that this critical challenge can be addressed for AlN nanostructures by using N‐rich epitaxy. Under N‐rich conditions, the formation energy of the p‐type Al‐substitutional Mg dopant is reduced by 2 eV, whereas the formation energy of N‐vacancy related compensating defects is increased by ≈3 eV, both of which are essential to achieve high hole concentrations in AlN. Detailed analysis of the current−voltage characteristics of AlN p‐i‐n diodes suggests that current conduction is dominated by hole‐carrier tunneling at room temperature, which is directly related to the activation energy of Mg dopants. At high Mg concentrations, the dispersion of Mg acceptor energy levels leads to drastically reduced activation energy for a portion of Mg dopants, evidenced by the small tunneling energy of 67 meV, which explains the efficient current conduction and the very small turn‐on voltage (≈5 V) for the diodes made of nanoscale AlN. This work shows that nanostructures can overcome the dopability challenges of ultrawide‐bandgap semiconductors and significantly increase the efficiency of devices.
-
Abstract Machine learning (ML) has been applied to space weather problems with increasing frequency in recent years, driven by an influx of in-situ measurements and a desire to improve modeling and forecasting capabilities throughout the field. Space weather originates from solar perturbations and is comprised of the resulting complex variations they cause within the numerous systems between the Sun and Earth. These systems are often tightly coupled and not well understood. This creates a need for skillful models with knowledge about the confidence of their predictions. One example of such a dynamical system highly impacted by space weather is the thermosphere, the neutral region of Earth’s upper atmosphere. Our inability to forecast it has severe repercussions in the context of satellite drag and computation of probability of collision between two space objects in low Earth orbit (LEO) for decision making in space operations. Even with (assumed) perfect forecast of model drivers, our incomplete knowledge of the system results in often inaccurate thermospheric neutral mass density predictions. Continuing efforts are being made to improve model accuracy, but density models rarely provide estimates of confidence in predictions. In this work, we propose two techniques to develop nonlinear ML regression models to predict thermospheric density while providing robust and reliable uncertainty estimates: Monte Carlo (MC) dropout and direct prediction of the probability distribution, both using the negative logarithm of predictive density (NLPD) loss function. We show the performance capabilities for models trained on both local and global datasets. We show that the NLPD loss provides similar results for both techniques but the direct probability distribution prediction method has a much lower computational cost. 
For the global model regressed on the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, we achieve errors of approximately 11% on independent test data with well-calibrated uncertainty estimates. Using an in-situ CHAllenging Minisatellite Payload (CHAMP) density dataset, models developed using both techniques provide test error on the order of 13%. The CHAMP models—on validation and test data—are within 2% of perfect calibration for the twenty prediction intervals tested. We show that this model can also be used to obtain global density predictions with uncertainties at a given epoch.
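As a rough illustration of the two quantities this abstract relies on, the sketch below computes an NLPD loss and an empirical prediction-interval coverage. It assumes a Gaussian predictive distribution with a predicted mean and standard deviation per sample, which the abstract does not state; the function names and sample values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_nlpd(y_true, mean, std):
    """Mean negative log predictive density, ASSUMING Gaussian predictions."""
    total = 0.0
    for y, m, s in zip(y_true, mean, std):
        # -log N(y | m, s^2) = 0.5*log(2*pi*s^2) + (y - m)^2 / (2*s^2)
        total += 0.5 * math.log(2.0 * math.pi * s * s) + (y - m) ** 2 / (2.0 * s * s)
    return total / len(y_true)


def interval_coverage(y_true, mean, std, z):
    """Fraction of observations that fall inside mean +/- z*std."""
    hits = sum(1 for y, m, s in zip(y_true, mean, std) if abs(y - m) <= z * s)
    return hits / len(y_true)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    y = [1.0, 2.0, 3.5]
    mu = [1.1, 1.8, 3.0]
    sigma = [0.2, 0.3, 0.4]
    print("NLPD:", gaussian_nlpd(y, mu, sigma))
    print("Coverage at z=1.96:", interval_coverage(y, mu, sigma, 1.96))
```

Under this assumption, a model is well calibrated when the observed coverage for each interval is close to its nominal level, for example close to 0.95 at z = 1.96.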
-
Abstract Background The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms—two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, there have been few approaches developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR.
Results We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentration of two metabolites ranged from 32 to 88% for noiseless data, 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and 6.6–27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification.
Conclusions SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.
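For readers unfamiliar with the metric reported in the Results, the following minimal sketch shows how a positive predictive value is computed from predicted and true labels. The function name and the example labels are hypothetical; this is not SCOUR's code, only the standard definition PPV = TP / (TP + FP).

```python
def positive_predictive_value(predicted, actual):
    """Return TP / (TP + FP), or None when nothing was predicted positive."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if true_pos + false_pos == 0:
        return None  # PPV is undefined without any positive predictions
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical labels: True means "this metabolite regulates the reaction".
    predicted = [True, True, False, True]
    actual = [True, False, True, True]
    print(positive_predictive_value(predicted, actual))
```

A low PPV can still be useful for triage: it is the fraction of flagged interactions expected to hold up when tested in the lab.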
-
Abstract In this study we present AI Prediction of Equatorial Plasma Bubbles (APE), a machine learning model that can accurately predict the Ionospheric Bubble Index (IBI) on the Swarm spacecraft. IBI is a correlation (R²) between perturbations in plasma density and the magnetic field, whose source can be Equatorial Plasma Bubbles (EPBs). EPBs have been studied for a number of years, but their day‐to‐day variability has made predicting them a considerable challenge. We build an ensemble machine learning model to predict IBI. We use data from 2014 to 2022 at a resolution of 1 s, and transform it from a time‐series into a 6‐dimensional space with a corresponding EPB R² (0–1) acting as the label. APE performs well across all metrics, exhibiting a skill, association and root mean squared error score of 0.96, 0.98 and 0.08 respectively. The model performs best post‐sunset, in the American/Atlantic sector, around the equinoxes, and when solar activity is high. This is promising because EPBs are most likely to occur during these periods. Shapley values reveal that F10.7 is the most important feature in driving the predictions, whereas latitude is the least. The analysis also examines the relationship between the features, which reveals new insights into EPB climatology. Finally, the selection of the features means that APE could be expanded to forecasting EPBs following additional investigations into their onset.