Title: Adaptive Exploration and Optimization of Materials Crystal Structures
A central problem in materials science is determining whether a hypothetical material is stable without synthesizing it, which is mathematically equivalent to a global optimization problem on a highly nonlinear and multimodal potential energy surface (PES). This optimization problem poses several outstanding challenges: the PES is exceedingly high-dimensional, and it must be constructed with a reliable, sophisticated, parameter-free, and therefore very expensive computational method, of which density functional theory (DFT) is an example. DFT is a quantum mechanics-based method that can predict, among other things, the total potential energy of a given configuration of atoms. In this work, we propose a novel expansion-exploration-exploitation framework to find the global minimum of the PES. Starting from a few atomic configurations, this “known” space is expanded to construct a large candidate set. The expansion begins in a nonadaptive manner, in which new configurations are added without considering their potential energy. A novel feature of this step is that it tends to generate a space-filling design without knowledge of the boundaries of the domain space. If needed, the nonadaptive expansion is followed by an adaptive expansion, in which “promising regions” of the domain space (those with low-energy configurations) are expanded further. Once a candidate set of configurations is obtained, it is simultaneously explored and exploited using Bayesian optimization to find the global minimum. The methodology is demonstrated on the problem of finding the most stable crystal structure of aluminum.

History: Kwok Tsui served as the senior editor for this article.

Funding: The authors acknowledge U.S. National Science Foundation Grant DMREF-1921873 and XSEDE through Grant DMR170031.

Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://codeocean.com/capsule/3366149/tree and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2023.0028 ).
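
The last step described in the abstract, Bayesian optimization over a finite candidate set, can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the capsule cited above). The squared-exponential kernel, the expected-improvement acquisition rule, the one-dimensional candidate grid, and the `toy_energy` function standing in for a DFT calculation are all assumptions chosen for illustration.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    v = np.linalg.solve(chol, k_on)
    mean = k_on.T @ alpha
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over the current best value (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def toy_energy(x):
    """Hypothetical multimodal stand-in for an expensive energy calculation."""
    return np.sin(5.0 * x[:, 0]) + 0.5 * (x[:, 0] - 0.6) ** 2


def minimize_over_candidates(candidates, energy_fn, n_init=3, n_iter=15, seed=0):
    """Pick which candidates to evaluate, one at a time, by maximizing EI."""
    rng = np.random.default_rng(seed)
    evaluated = list(rng.choice(len(candidates), size=n_init, replace=False))
    energies = list(energy_fn(candidates[evaluated]))
    for _ in range(n_iter):
        y = np.array(energies)
        y_scaled = (y - y.mean()) / (y.std() + 1e-12)  # standardize for the GP
        mean, std = gp_posterior(candidates[evaluated], y_scaled, candidates)
        ei = expected_improvement(mean, std, y_scaled.min())
        ei[evaluated] = -np.inf  # never re-evaluate a known configuration
        nxt = int(np.argmax(ei))
        evaluated.append(nxt)
        energies.append(float(energy_fn(candidates[[nxt]])[0]))
    best = int(np.argmin(energies))
    return candidates[evaluated[best]], float(energies[best]), len(evaluated)


# Usage: 121 candidate "configurations" on a 1-D grid, 18 energy evaluations.
candidates = np.linspace(0.0, 3.0, 121).reshape(-1, 1)
best_x, best_e, n_eval = minimize_over_candidates(candidates, toy_energy)
print(best_x, best_e, n_eval)
```

The point of the sketch is the budget: only `n_init + n_iter` candidates are ever passed to the expensive energy function, while the surrogate model ranks all remaining candidates at negligible cost.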
Award ID(s):
1921873
NSF-PAR ID:
10443720
Journal Name:
INFORMS Journal on Data Science
ISSN:
2694-4022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the sequential anomaly detection problem in the one-class setting when only the anomalous sequences are available and propose an adversarial sequential detector by solving a minimax problem to find an optimal detector against the worst-case sequences from a generator. The generator captures the dependence in sequential events using the marked point process model. The detector sequentially evaluates the likelihood of a test sequence and compares it with a time-varying threshold, also learned from data through the minimax problem. We demonstrate our proposed method’s good performance using numerical experiments on simulations and proprietary large-scale credit card fraud data sets. The proposed method can generally apply to detecting anomalous sequences. History: W. Nick Street served as the senior editor for this article. Funding: This work is partially supported by the National Science Foundation [Grants CAREER CCF-1650913, DMS-1938106, and DMS-1830210] and grant support from Macy’s Technology. Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://doi.org/10.24433/CO.2329910.v1 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2023.0026 ). 
  2. Refractory high entropy alloys (RHEAs) have gained significant attention in recent years as potential replacements for Ni-based superalloys in gas turbine applications. Improving their properties, such as their high-temperature yield strength, is crucial to their success. Unfortunately, exploring this vast chemical space using exclusively experimental approaches is impractical due to the considerable cost of the synthesis, processing, and testing of candidate alloys, particularly at operation-relevant temperatures. On the other hand, the lack of reasonably accurate predictive property models, especially for high-temperature properties, makes traditional Integrated Computational Materials Engineering (ICME) methods inadequate. In this paper, we address this challenge by combining machine-learning models, easy-to-implement physics-based models, and inexpensive proxy experimental data to develop robust and fast-acting models using the concept of Bayesian updating. The framework combines data from one of the most comprehensive databases on RHEAs (Borg et al., 2020) with one of the most widely used physics-based strength models for BCC-based RHEAs (Maresca and Curtin, 2020) into a compact predictive model that is significantly more accurate than the state-of-the-art. This model is cross-validated, tested for physics-informed extrapolation, and rigorously benchmarked against standard Gaussian process regressors (GPRs) in a toy Bayesian optimization problem. Such a model can be used as a tool within ICME frameworks to screen for RHEAs with superior high-temperature properties. The code associated with this work is available at: https://codeocean.com/capsule/7849853/tree/v2. 
  3. Parameter calibration aims to estimate unobservable parameters used in a computer model by using physical process responses and computer model outputs. In the literature, existing studies calibrate all parameters simultaneously using an entire data set. However, in certain applications, some parameters are associated with only a subset of data. For example, in the building energy simulation, cooling (heating) season parameters should be calibrated using data collected during the cooling (heating) season only. This study provides a new multiblock calibration approach that considers such heterogeneity. Unlike existing studies that build emulators for the computer model response, such as the widely used Bayesian calibration approach, we consider multiple loss functions to be minimized, each for a block of parameters that use the corresponding data set, and estimate the parameters using a nonlinear optimization technique. We present the convergence properties under certain conditions and quantify the parameter estimation uncertainties. The superiority of our approach is demonstrated through numerical studies and a real-world building energy simulation case study.

    History: Bianca Maria Colosimo served as the senior editor for this article.

    Funding: This work was partially supported by the National Science Foundation [Grants CMMI-1662553, CMMI-2226348, and CBET-1804321].

    Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://codeocean.com/capsule/8623151/tree/v1 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2023.0029 ).

     
  4. Over the last two decades, robust optimization has emerged as a popular means to address decision-making problems affected by uncertainty. This includes single-stage and multi-stage problems involving real-valued and/or binary decisions and affected by exogenous (decision-independent) and/or endogenous (decision-dependent) uncertain parameters. Robust optimization techniques rely on duality theory potentially augmented with approximations to transform a (semi-)infinite optimization problem to a finite program, the robust counterpart. Whereas writing down the model for a robust optimization problem is usually a simple task, obtaining the robust counterpart requires expertise. To date, very few solutions are available that can facilitate the modeling and solution of such problems. This has been a major impediment to their being put to practical use. In this paper, we propose ROC++, an open-source C++ based platform for automatic robust optimization, applicable to a wide array of single-stage and multi-stage robust problems with both exogenous and endogenous uncertain parameters, that is easy to both use and extend. It also applies to certain classes of stochastic programs involving continuously distributed uncertain parameters and endogenous uncertainty. Our platform naturally extends existing off-the-shelf deterministic optimization platforms and offers ROPy, a Python interface in the form of a callable library, and the ROB file format for storing and sharing robust problems. We showcase the modeling power of ROC++ on several decision-making problems of practical interest. Our platform can help streamline the modeling and solution of stochastic and robust optimization problems for both researchers and practitioners. It comes with detailed documentation to facilitate its use and expansion. The latest version of ROC++ can be downloaded from https://sites.google.com/usc.edu/robust-opt-cpp/ . 
Summary of Contribution: The paper “ROC++: Robust Optimization in C++” proposes a new open-source C++ based platform for modeling, automatically reformulating, and solving robust optimization problems. ROC++ can address both single-stage and multi-stage problems involving exogenous and/or endogenous uncertain parameters and real- and/or binary-valued adaptive variables. The ROC++ modeling language is similar to the one provided for the deterministic case by state-of-the-art deterministic optimization solvers. ROC++ comes with detailed documentation to facilitate its use and expansion. It also offers ROPy, a Python interface in the form of a callable library. The latest version of ROC++ can be downloaded from https://sites.google.com/usc.edu/robust-opt-cpp/ . History: Accepted by Ted Ralphs, Area Editor for Software Tools. Funding: This material is based upon work supported by the National Science Foundation under Grant No. 1763108. This support is gratefully acknowledged. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplementary Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.1209 ) or is available from the IJOC GitHub software repository ( https://github.com/INFORMSJoC ) at ( https://dx.doi.org/10.5281/zenodo.6360996 ). 
  5. The goal of molecular crystal structure prediction (CSP) is to find all the plausible polymorphs for a given molecule. This requires performing global optimization over a high-dimensional search space. Genetic algorithms (GAs) perform global optimization by starting from an initial population of structures and generating new candidate structures by breeding the fittest structures in the population. Typically, the fitness function is based on relative lattice energies, such that structures with lower energies have a higher probability of being selected for mating. GAs may be adapted to perform multi-modal optimization by using evolutionary niching methods that support the formation of several stable subpopulations and suppress the over-sampling of densely populated regions. Evolutionary niching is implemented in the GAtor molecular crystal structure prediction code by using techniques from machine learning to dynamically cluster the population into niches of structural similarity. A cluster-based fitness function is constructed such that structures in less populated clusters have a higher probability of being selected for breeding. Here, the effects of evolutionary niching are investigated for the crystal structure prediction of 1,3-dibromo-2-chloro-5-fluorobenzene. Using the cluster-based fitness function increases the success rate of generating the experimental structure and additional low-energy structures with similar packing motifs. 
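
The cluster-based fitness idea in item 5 (structures in less populated clusters are more likely to be selected for breeding) can be sketched as follows. This is not the formula used in GAtor. Dividing an energy-based fitness by the cluster size is a standard fitness-sharing heuristic assumed here for illustration, and the function names, the `floor` parameter, and the integer cluster labels are hypothetical.

```python
import numpy as np


def energy_fitness(energies):
    """Scale energies to [0, 1] so that the lowest energy gets fitness 1."""
    energies = np.asarray(energies, dtype=float)
    span = energies.max() - energies.min()
    if span == 0.0:
        return np.ones_like(energies)
    return (energies.max() - energies) / span


def cluster_based_selection_probs(energies, cluster_labels, floor=0.05):
    """Selection probabilities that penalize members of crowded clusters.

    cluster_labels are assumed to be non-negative integers, one per structure.
    floor keeps the highest-energy structure from having zero probability.
    """
    labels = np.asarray(cluster_labels, dtype=int)
    cluster_sizes = np.bincount(labels)[labels]  # size of each structure's cluster
    shared = (energy_fitness(energies) + floor) / cluster_sizes
    return shared / shared.sum()


# Usage: structures 1 and 3 have the same energy, but structure 3 is alone in
# its cluster, so it receives a higher selection probability.
probs = cluster_based_selection_probs(
    energies=[-1.0, -0.9, -0.8, -0.9],
    cluster_labels=[0, 0, 0, 1],
)
print(probs)
```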