
Title: Plausible Screening Using Functional Properties for Simulations with Large Solution Spaces
When working with models that allow for many candidate solutions, simulation practitioners can benefit from screening out unacceptable solutions in a statistically controlled way. However, for large solution spaces, estimating the performance of all solutions through simulation can prove impractical. We propose a statistical framework for screening solutions even when only a relatively small subset of them is simulated. Our framework derives its superiority over exhaustive screening approaches by leveraging available properties of the function that describes the performance of solutions. The framework is designed to work with a wide variety of available functional information and provides guarantees on both the confidence and consistency of the resulting screening inference. We provide explicit formulations for the properties of convexity and Lipschitz continuity and show through numerical examples that our procedures can efficiently screen out many unacceptable solutions.
Award ID(s):
1854562 1953111
Journal Name:
Operations Research
Medium: X
Sponsoring Org:
National Science Foundation
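
As a rough, illustrative sketch of the Lipschitz case mentioned in the abstract above (not the authors' actual procedure), suppose a solution is "unacceptable" when its true performance exceeds a known threshold, simulation estimates are treated as exact, and a Lipschitz constant L is available. Under those simplifying assumptions, a candidate can be screened out whenever every Lipschitz function consistent with the simulated points would already violate the threshold at that candidate:

```python
import numpy as np

# Minimal, hypothetical sketch (not the paper's procedure): screening with a
# Lipschitz property. Assumes minimization, noiseless simulation estimates,
# and a known Lipschitz constant L -- simplifications relative to the paper,
# which accounts for statistical error and gives confidence guarantees.

def lipschitz_screen(sim_x, sim_y, candidates, threshold, L):
    """Return True for candidates that remain plausible (not screened out).

    A candidate x is screened out when every Lipschitz-L function that
    interpolates the simulated pairs (sim_x, sim_y) must exceed `threshold`
    at x, i.e. when the implied lower bound on its performance is already
    unacceptable.
    """
    sim_x = np.asarray(sim_x, dtype=float)
    sim_y = np.asarray(sim_y, dtype=float)
    keep = []
    for x in np.asarray(candidates, dtype=float):
        dists = np.linalg.norm(sim_x - x, axis=1)
        lower_bound = np.max(sim_y - L * dists)  # tightest bound from any simulated point
        keep.append(lower_bound <= threshold)
    return np.array(keep)

# Toy usage: 1000 candidates in 2D, only 50 of them simulated.
rng = np.random.default_rng(0)
candidates = rng.uniform(-2, 2, size=(1000, 2))
sim_x = candidates[:50]
sim_y = np.sum(sim_x**2, axis=1)                 # stand-in "true" performance
plausible = lipschitz_screen(sim_x, sim_y, candidates, threshold=1.0, L=6.0)
print(f"{np.count_nonzero(~plausible)} of {len(candidates)} candidates screened out")
```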
More Like this
  1.
    The world faces an increasing need to phase out harmful chemicals and design sustainable alternatives across various consumer products and industrial applications. Alternatives assessment is an emerging field focused on identifying viable solutions to substitute harmful chemicals. However, current methods fail to consider trade-offs from human and ecosystem exposures, and from impacts associated with chemical supply chains and product life cycles. To close this gap, we propose a life-cycle-based alternatives assessment (LCAA) framework for consistently integrating quantitative exposure and life cycle impact performance in the substitution process. We start with a pre-screening based on function-related decision rules, followed by three progressive tiers: (1) rapid risk screening of alternatives for the consumer use stage, (2) assessment of chemical supply chain impacts for selected alternatives with substantially different synthesis routes, and (3) assessment of product life cycle impacts for alternatives with substantially different product life cycles. Each tier focuses on relevant impacts and uses streamlined assessment methods. While the initial risk screening will be sufficient for evaluating chemicals with similar supply chains, each additional tier helps further restrict the number of viable solutions while avoiding unacceptable trade-offs. We test our LCAA framework in a proof-of-concept case study identifying suitable alternatives to a harmful plasticizer in household flooring. Results show that the use stage dominates human health impacts across alternatives, indicating that a rapid risk screening is sufficient unless very different supply chains or a broader set of alternative materials or technologies are considered. Combined with currently used indicators of technical and economic performance, our LCAA framework is suitable for informing function-based substitution at the level of chemicals, materials, and product applications to foster green and sustainable chemistry solutions.
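
As a purely illustrative sketch (the tier names and predicates below are placeholders, not the LCAA implementation), the progressive structure described above can be viewed as a chain of increasingly expensive filters, where each tier runs only on the alternatives that survived the previous one:

```python
# Illustrative-only sketch of a tiered screening chain in the spirit of the
# LCAA framework described above; tier names and predicates are assumptions.

def tiered_screen(alternatives, tiers):
    """tiers: ordered (name, predicate) pairs; predicate returns True to keep."""
    survivors = list(alternatives)
    for name, keep in tiers:
        survivors = [a for a in survivors if keep(a)]
        print(f"{name}: {len(survivors)} alternatives remain")
        if not survivors:
            break
    return survivors

# Hypothetical usage: cheap use-stage risk screening first, supply-chain and
# product-life-cycle assessments only for the remaining alternatives.
# tiers = [("use-stage risk", passes_risk_screen),
#          ("supply chain", acceptable_supply_chain_impacts),
#          ("product life cycle", acceptable_life_cycle_impacts)]
```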
  2. We propose a framework and specific algorithms for screening a large (perhaps countably infinite) space of feasible solutions to generate a subset containing the optimal solution with high confidence. We attain this goal even when only a small fraction of the feasible solutions are simulated. To accomplish it, we exploit structural information about the space of functions within which the true objective function lies, and then assess how compatible optimality at each feasible solution is with the observed simulation outputs and the assumed function space. The result is a set of plausible optima. This approach can be viewed as a way to avoid slow simulation by leveraging fast optimization. Explicit formulations of the general approach are provided when the space of functions is either Lipschitz or convex. We establish both small- and large-sample properties of the approach, and provide two numerical examples.
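
As a minimal sketch of the Lipschitz case (assuming noiseless outputs and a known Lipschitz constant, whereas the paper itself accounts for simulation error), a feasible solution can be ruled out as a plausible optimum when the best value any consistent Lipschitz function could assign it still exceeds the smallest value observed at a simulated solution:

```python
import numpy as np

# Hypothetical sketch (not the authors' algorithm) of a "plausible optima"
# test under a Lipschitz assumption, ignoring simulation noise: a candidate
# stays plausible only if some Lipschitz-L function that matches the observed
# outputs could make it at least as good as the best simulated solution.

def plausible_optima(sim_x, sim_y, candidates, L):
    sim_x = np.asarray(sim_x, dtype=float)
    sim_y = np.asarray(sim_y, dtype=float)
    best_observed = sim_y.min()
    plausible = []
    for x in np.asarray(candidates, dtype=float):
        dists = np.linalg.norm(sim_x - x, axis=1)
        lower_bound = np.max(sim_y - L * dists)  # best value x could possibly attain
        plausible.append(lower_bound <= best_observed)
    return np.array(plausible)
```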
  3. Skolnick, Jeffrey (Ed.)
    Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”: their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy that reduces the impact of inaccurate predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step that uses gene families in the test data different from those in the training and development data sets to facilitate model deployment in real-world scenarios. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art machine learning and protein-ligand docking techniques when applied to dark gene families, and demonstrated its generalization power for target identification and compound screening under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for multi-target compound screening, the performance of PortalCG surpassed rational design by medicinal chemists. Our results also suggest that a differentiable sequence-structure-function deep learning framework, in which protein structural information serves as an intermediate layer, can be superior to conventional methodology in which predicted protein structures are used for compound screening. We applied PortalCG to two case studies that exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggest that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
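
As one small, hypothetical illustration of the "stress model selection" idea described above, a gene-family-disjoint split ensures that no family used for training or development ever appears in the held-out test set; the function and field names below are placeholders, not PortalCG's code:

```python
import random

# Hypothetical sketch of a gene-family-disjoint ("out-of-cluster") data split,
# in the spirit of PortalCG's stress model selection step. Field and function
# names are illustrative placeholders, not the PortalCG codebase.

def family_disjoint_split(records, test_fraction=0.2, seed=0):
    """records: iterable of dicts with at least a 'family' key."""
    records = list(records)
    families = sorted({r["family"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(families)
    n_test = max(1, int(round(len(families) * test_fraction)))
    test_families = set(families[:n_test])
    train = [r for r in records if r["family"] not in test_families]
    test = [r for r in records if r["family"] in test_families]
    return train, test
```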
    Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern when optimizing the code more aggressively to gain performance. Efficient tools that help locate the program units (files and functions) within which most of the variability occurs are badly needed, both to plan for code ports and to root-cause errors due to variability when they happen in the field. In this work, we offer an enhanced version of the open-source testing framework FLiT to serve these roles. Key new features of FLiT include a suite of bisection algorithms that help locate the root causes of variability. Another added feature allows an analysis of the tradeoffs between performance and the degree of variability. Our new contributions also include a collection of case studies. Results on the MFEM finite-element library include variability/performance tradeoffs and the identification of a hitherto unknown, abnormal level of result variability even under mild compiler optimizations. Results from studying the Laghos proxy application include the identification of significantly divergent floating-point results and successful root-causing down to the problematic function in as few as 14 program executions. Finally, in an evaluation of 4,376 controlled injections of floating-point perturbations on the LULESH proxy application, we show that the FLiT framework achieves 100% precision and recall in discovering the file and function locations of the injections, within an average of only 15 program executions.
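
As a rough sketch of the kind of bisection search such a tool performs (the callback below is an assumption for illustration, not FLiT's actual interface), one can repeatedly rebuild the application with only half of the suspect program units compiled under the aggressive flags and keep the half that reproduces the divergent result:

```python
# Hypothetical sketch of bisection over program units (files or functions) to
# localize result variability; assumes exactly one culprit and an assumed
# callback `reproduces_variability(units)` that rebuilds the application with
# only `units` compiled under the aggressive flags and reruns the test. This
# is an illustration, not FLiT's actual API.

def bisect_variability(units, reproduces_variability):
    units = list(units)
    while len(units) > 1:
        mid = len(units) // 2
        first_half = units[:mid]
        units = first_half if reproduces_variability(first_half) else units[mid:]
    return units[0]  # the program unit responsible for the divergence
```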
    We study two-stage stochastic optimization problems with random recourse, where the coefficients of the adaptive decisions involve uncertain parameters. To deal with the infinite-dimensional recourse decisions, we propose a scalable approximation scheme via piecewise linear and piecewise quadratic decision rules. We develop a data-driven distributionally robust framework with two layers of robustness to address distributional uncertainty. We also establish out-of-sample performance guarantees for the proposed scheme. The resulting optimization problem can be reformulated, using known techniques, as an exact copositive program that admits semidefinite programming approximations. We design an iterative decomposition algorithm, which converges under some regularity conditions, to reduce the runtime needed to solve this program. Through numerical examples for various known operations management applications, we demonstrate that our method produces significantly better solutions than the traditional sample-average approximation scheme, especially when the data are limited. For problem instances in which only the recourse cost coefficients are random, our method exhibits slightly inferior out-of-sample performance but shorter runtimes compared with a competing approach.
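
As a minimal sketch in assumed notation (not the paper's exact formulation), restricting the recourse decision to a piecewise linear rule in the uncertain parameter replaces an infinite-dimensional optimization over functions with a finite set of coefficients:

```latex
% Assumed, simplified notation -- not the paper's exact formulation.
% A scalar recourse decision y(\xi) restricted to a piecewise linear rule
% with fixed breakpoints b_1 < \dots < b_K in a scalar uncertainty \xi:
\[
  y(\xi) \;=\; y_0 + y_1\,\xi + \sum_{k=1}^{K} z_k \max\{\xi - b_k,\, 0\},
\]
% so optimizing over the function y(\cdot) reduces to choosing the finitely
% many coefficients (y_0, y_1, z_1, \dots, z_K).
```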