Systems with both quantitative and qualitative responses are widely encountered in many applications. Design-of-experiments methods are needed when experiments are conducted to study such systems. Classic experimental design methods are unsuitable here because they often focus only on one type of response. In this paper, we develop a Bayesian D-optimal design method for experiments with one continuous and one binary response. Both noninformative and conjugate informative prior distributions on the unknown parameters are considered. The proposed design criterion has meaningful interpretations in terms of D-optimality for the models of both types of responses. An efficient point-exchange search algorithm is developed to construct local D-optimal designs for given parameter values. Global D-optimal designs are obtained by accumulating the frequencies of the design points in local D-optimal designs, with the parameters sampled from the prior distributions. The performance of the proposed methods is evaluated through two examples.
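The global-design recipe sketched in this abstract (sample parameters from the prior, build a local D-optimal design by point exchange, then accumulate the frequencies of selected design points) can be illustrated with a toy example. The Python sketch below is a minimal illustration, not the paper's code: it assumes a single design factor, uses only a logistic model for the binary response, and all function and variable names are placeholders.

```python
# Illustrative sketch (not the paper's code): global D-optimal design by
# accumulating point frequencies from local D-optimal designs, with the
# unknown parameters sampled from a prior distribution.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def info_matrix(design, theta):
    """Approximate Fisher information for a logistic (binary) response;
    the continuous response would contribute a second, analogous block."""
    X = np.column_stack([np.ones(len(design)), design])  # intercept + factor
    p = 1.0 / (1.0 + np.exp(-X @ theta))                  # success probability
    W = np.diag(p * (1 - p))                              # logistic weights
    return X.T @ W @ X

def local_d_optimal(candidates, theta, n_run=8, n_pass=10):
    """Greedy point exchange: swap one run at a time if it raises det(I)."""
    design = list(rng.choice(candidates, size=n_run))
    best = np.linalg.det(info_matrix(design, theta))
    for _ in range(n_pass):
        improved = False
        for i in range(n_run):
            for c in candidates:
                trial = design.copy()
                trial[i] = c
                d = np.linalg.det(info_matrix(trial, theta))
                if d > best + 1e-12:
                    design, best, improved = trial, d, True
        if not improved:
            break
    return design

# Candidate design points and a normal prior on (intercept, slope).
candidates = np.linspace(-1, 1, 21)
freq = Counter()
for _ in range(100):                            # prior draws
    theta = rng.normal([0.0, 1.0], [0.5, 0.5])  # sampled parameter values
    for x in local_d_optimal(candidates, theta):
        freq[round(float(x), 3)] += 1

# Global design: the most frequently selected candidate points.
print(freq.most_common(8))
```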
MINE: a new way to design genetics experiments for discovery
The Maximally Informative Next Experiment, or MINE, is a new experimental design approach for studies, such as those in omics, in which the number of effects or parameters p greatly exceeds the number of samples n (p > n). Classical experimental design presumes n > p for inference about parameters, and applying it when p > n can lead to over-fitting. To address the p > n setting, MINE is an ensemble method: it makes predictions about future experiments from an ensemble of models consistent with the available data and selects the most informative next experiment. Its advantages are the ability to explore the data for new relationships when p > n and to adaptively replace one large classic experiment with a series of smaller, more tractable experiments as discoveries are made. MINE is therefore model-guided and adaptive over time in a large omics study. Here, MINE is illustrated on two distinct multi-year experiments: one involving genetic networks in Neurospora crassa, and a second involving a Genome Wide Association Study (GWAS) in Sorghum bicolor as a comparison to classic experimental design in an agricultural setting.
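The ensemble idea behind MINE, as described in this abstract, can be sketched in a few lines: fit many models consistent with the current data and propose the candidate experiment on which their predictions disagree most. The Python sketch below is a hypothetical illustration under those assumptions; the bootstrap-plus-ridge ensemble and the variance criterion are stand-ins, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): choose the "maximally
# informative next experiment" as the candidate on which an ensemble of
# models consistent with current data disagrees the most.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.utils import resample

rng = np.random.default_rng(1)

# Toy data: n samples, p >> n features (the p > n regime discussed above).
n, p = 20, 200
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.1, size=n)

# Ensemble of models "consistent with available data" via bootstrap refits.
ensemble = []
for _ in range(50):
    Xb, yb = resample(X, y, random_state=int(rng.integers(1 << 31)))
    ensemble.append(Ridge(alpha=1.0).fit(Xb, yb))

# Candidate future experiments (conditions that could be run next).
candidates = rng.normal(size=(100, p))
preds = np.stack([m.predict(candidates) for m in ensemble])  # (models, candidates)

# Pick the candidate with maximal predictive disagreement across the ensemble.
next_idx = int(np.argmax(preds.std(axis=0)))
print("most informative next experiment:", next_idx)
```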
- Award ID(s): 2041546
- PAR ID: 10579041
- Editor(s): Ma, S
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Briefings in bioinformatics
- ISSN: 1467-5463
- Subject(s) / Keyword(s): ensemble methods; MINE; mixed linear models; genetic networks; biological clock
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract “Evolve and resequence” (E&R) studies combine experimental evolution and whole‐genome sequencing to interrogate the genetics underlying adaptation. Due to ease of handling, E&R work with asexual organisms such as bacteria can employ optimized experimental design, with large experiments and many generations of selection. By contrast, E&R experiments with sexually reproducing organisms are more difficult to implement, and design parameters vary dramatically among studies. Thus, efforts have been made to assess how these differences, such as the number of independent replicates or the size of experimental populations, impact inference. We add to this work by investigating the role of time sampling, that is, the number of discrete time points at which sequence data are collected from evolving populations. Using data from an E&R experiment with outcrossing Saccharomyces cerevisiae in which populations were sequenced 17 times over ~540 generations, we address the following questions: (a) Do more time points improve the ability to identify candidate regions underlying selection? And (b) does high‐resolution sampling provide unique insight into evolutionary processes driving adaptation? We find that while time sampling does not improve the ability to identify candidate regions, high‐resolution sampling does provide valuable opportunities to characterize evolutionary dynamics. Increased time sampling reveals three distinct trajectories for adaptive alleles: one consistent with classic population genetic theory (i.e., models assuming constant selection coefficients), and two where trajectories suggest more context‐dependent responses (i.e., models involving dynamic selection coefficients). We conclude that while time sampling has limited impact on candidate region identification, sampling eight or more time points has clear benefits for studying complex evolutionary dynamics.
-
Abstract How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments, using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations leads to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
-
Abstract Mathematical models are increasingly being developed and calibrated in tandem with data collection, empowering scientists to intervene in real time based on quantitative model predictions. Well-designed experiments can help augment the predictive power of a mathematical model, but the question of when to collect data to maximize their utility for a model is non-trivial. Here we define data as model-informative if they result in a unique parametrization, assessed through the lens of practical identifiability. The framework we propose identifies an optimal experimental design (how much data to collect and when to collect it) that ensures parameter identifiability (permitting confidence in model predictions), while minimizing experimental time and costs. We demonstrate the power of the method by applying it to a modified version of a classic site-of-action pharmacokinetic/pharmacodynamic model that describes distribution of a drug into the tumor microenvironment (TME), where its efficacy is dependent on the level of target occupancy in the TME. In this context, we identify a minimal set of time points at which data need to be collected that robustly ensures practical identifiability of model parameters. The proposed methodology can be applied broadly to any mathematical model, allowing for the identification of a minimally sufficient experimental design that collects the most informative data.
-
Next-generation scientific applications in various fields are experiencing a rapid transition from traditional experiment-based methodologies to large-scale computation-intensive simulations featuring complex numerical modeling with a large number of tunable parameters. Such model-based simulations generate colossal amounts of data, which are then processed and analyzed against experimental or observation data for parameter calibration and model validation. The sheer volume and complexity of such data, the large model-parameter space, and the intensive computation make it practically infeasible for domain experts to manually configure and tune hyperparameters for accurate modeling in complex and distributed computing environments. This calls for an online computational steering service to enable real-time multi-user interaction and automatic parameter tuning. Towards this goal, we design and develop a generic steering framework based on Bayesian Optimization (BO) and conduct theoretical performance analysis of the steering service. We present a case study with the Weather Research and Forecast (WRF) model, which illustrates the performance superiority of the BO-based tuning over other heuristic methods and manual settings of domain experts using regret analysis.
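The Bayesian-optimization tuning loop described in the last abstract above (fit a surrogate to past runs, maximize an acquisition function, then run the next simulation) can be sketched as follows. This is a minimal, generic Python illustration with a placeholder objective and a Gaussian-process surrogate from scikit-learn; it is not the WRF steering framework's code, and all names are assumptions.

```python
# Illustrative sketch (not the framework's code): Bayesian-optimization
# parameter tuning with a Gaussian-process surrogate and expected improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(2)

def objective(x):
    """Placeholder for the expensive simulation-vs-observation misfit."""
    return np.sin(3 * x) + 0.5 * x**2

def expected_improvement(x_cand, gp, y_best):
    """EI acquisition for minimization of the misfit."""
    mu, sigma = gp.predict(x_cand.reshape(-1, 1), return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Initial design, then iterate: fit surrogate, maximize EI, run simulation.
X = rng.uniform(-2, 2, size=4)
y = objective(X)
for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X.reshape(-1, 1), y)
    cand = np.linspace(-2, 2, 200)
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))

print("best parameter:", X[np.argmin(y)], "misfit:", y.min())
```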