
Title: Empirical modeling of dopability in diamond-like semiconductors

Carrier concentration optimization has been an enduring challenge when developing newly discovered semiconductors for applications (e.g., thermoelectrics, transparent conductors, photovoltaics). This barrier has been particularly pernicious in the realm of high-throughput property prediction, where the carrier concentration is often assumed to be a free parameter and the limits are not predicted due to the high computational cost. In this work, we explore the application of machine learning for high-throughput carrier concentration range prediction. Bounding the model within diamond-like semiconductors, the learning set was developed from experimental carrier concentration data on 127 compounds ranging from unary to quaternary. The data were analyzed using various statistical and machine learning methods. Accurate predictions of carrier concentration ranges in diamond-like semiconductors are made within approximately one order of magnitude on average across both p- and n-type dopability. The model fit to empirical data is analyzed to understand what drives trends in carrier concentration and compared with previous computational efforts. Finally, dopability predictions from this model are combined with high-throughput quality factor predictions to identify promising thermoelectric materials.

Award ID(s):
1729594 1729487
Publication Date:
Journal Name:
npj Computational Materials
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Machine learning (ML) has been applied to space weather problems with increasing frequency in recent years, driven by an influx of in-situ measurements and a desire to improve modeling and forecasting capabilities throughout the field. Space weather originates from solar perturbations and is comprised of the resulting complex variations they cause within the numerous systems between the Sun and Earth. These systems are often tightly coupled and not well understood. This creates a need for skillful models with knowledge about the confidence of their predictions. One example of such a dynamical system highly impacted by space weather is the thermosphere, the neutral region of Earth’s upper atmosphere. Our inability to forecast it has severe repercussions in the context of satellite drag and computation of probability of collision between two space objects in low Earth orbit (LEO) for decision making in space operations. Even with (assumed) perfect forecast of model drivers, our incomplete knowledge of the system results in often inaccurate thermospheric neutral mass density predictions. Continuing efforts are being made to improve model accuracy, but density models rarely provide estimates of confidence in predictions. In this work, we propose two techniques to develop nonlinear ML regression models to predict thermospheric density while providing robust and reliable uncertainty estimates: Monte Carlo (MC) dropout and direct prediction of the probability distribution, both using the negative logarithm of predictive density (NLPD) loss function. We show the performance capabilities for models trained on both local and global datasets. We show that the NLPD loss provides similar results for both techniques but the direct probability distribution prediction method has a much lower computational cost. For the global model regressed on the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, we achieve errors of approximately 11% on independent test data with well-calibrated uncertainty estimates. Using an in-situ CHAllenging Minisatellite Payload (CHAMP) density dataset, models developed using both techniques provide test error on the order of 13%. The CHAMP models, on validation and test data, are within 2% of perfect calibration for the twenty prediction intervals tested. We show that this model can also be used to obtain global density predictions with uncertainties at a given epoch.

  2. Diamond-like semiconductors (DLS) have emerged as candidates for thermoelectric energy conversion. Towards understanding and optimizing performance, we present a comprehensive investigation of the electronic properties of two DLS phases, quaternary Cu₂HgGeTe₄ and the related ordered vacancy compound Hg₂GeTe₄, including thermodynamic stability, defect chemistry, and transport properties. To establish the thermodynamic link between the related but distinct phases, the stability region for both is visualized in chemical potential space. In spite of their similar structure and bonding, we show that the two materials exhibit reciprocal behaviors for dopability. Cu₂HgGeTe₄ is degenerately p-type in all environments despite its wide stability region, due to the presence of the low-energy acceptor defects V_Cu and Cu_Hg, and is resistant to extrinsic n-type doping. Meanwhile, Hg₂GeTe₄ has a narrow stability region and intrinsic behavior due to the relatively high formation energy of native defects, but presents an opportunity for bipolar doping. While these two compounds have similar structure, bonding, and chemical constituents, the reciprocal nature of their dopability emerges from significant differences in band edge positions. A Brouwer band diagram approach is utilized to visualize the role of native defects on carrier concentrations, dopability, and transport properties. This study elucidates the doping asymmetry between two solid-solution forming DLS phases, Cu₂HgGeTe₄ and Hg₂GeTe₄, by revealing the defect chemistry of each compound, and suggests design strategies for defect engineering of DLS phases.
  3. Abstract Background

    The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms—two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, there have been few approaches developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR.


    We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentration of two metabolites ranged from 32 to 88% for noiseless data, 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and 6.6–27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification.


    SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.

  4. Abstract

    The discovery and development of ultra-wide bandgap (UWBG) semiconductors is crucial to accelerate the adoption of renewable power sources. This necessitates a UWBG semiconductor that exhibits robust doping with high carrier mobility over a wide range of carrier concentrations. Here we demonstrate that epitaxial thin films of the perovskite oxide NdₓSr₁₋ₓSnO₃ (SSO) do exactly this. Nd is used as a donor to successfully modulate the carrier concentration over nearly two orders of magnitude, from 3.7 × 10¹⁸ cm⁻³ to 2.0 × 10²⁰ cm⁻³. Despite being grown on lattice-mismatched substrates and thus having relatively high structural disorder, SSO films exhibited the highest room-temperature mobility, ~70 cm² V⁻¹ s⁻¹, among all known UWBG semiconductors in the range of carrier concentrations studied. The phonon-limited mobility is calculated from first principles and supplemented with a model to treat ionized impurity and Kondo scattering. This produces excellent agreement with experiment over a wide range of temperatures and carrier concentrations, and predicts the room-temperature phonon-limited mobility to be 76–99 cm² V⁻¹ s⁻¹ depending on carrier concentration. This work establishes a perovskite oxide as an emerging UWBG semiconductor candidate with potential for applications in power electronics.

  5. Abstract

    Due to climate change and rapid urbanization, the Urban Heat Island (UHI) effect, in which metropolitan areas are significantly hotter than their surroundings, has caused negative impacts on urban communities. Temporal granularity is often limited in UHI studies based on satellite remote sensing data, which typically covers a particular urban area only once every several days. This low temporal frequency has restricted the development of models for predicting UHI. To resolve this limitation, this study has developed a cyber-based geographic information science and systems (cyberGIS) framework encompassing multiple machine learning models for predicting UHI with high-frequency urban sensor network data combined with remote sensing data focused on Chicago, Illinois, from 2018 to 2020. Enabled by rapid advances in urban sensor network technologies and high-performance computing, this framework is designed to predict UHI in Chicago with fine spatiotemporal granularity based on environmental data collected with the Array of Things (AoT) urban sensor network and Landsat-8 remote sensing imagery. Our computational experiments revealed that a random forest regression (RFR) model outperforms other models, with a mean absolute error of 0.45 degrees Celsius in 2020 and 0.8 degrees Celsius in 2018 and 2019. Humidity, distance to geographic center, and PM2.5 concentration are identified as important factors contributing to the model performance. Furthermore, we estimate UHI in Chicago with 10-min temporal frequency and 1-km spatial resolution on the hottest day in 2018. It is demonstrated that the RFR model can accurately predict UHI at fine spatiotemporal scales with high-frequency urban sensor network data integrated with satellite remote sensing data.
