Abstract
Motivation: The scale and scope of comparative trait data are expanding at unprecedented rates, and recent advances in evolutionary modeling and simulation sometimes struggle to match this pace. In this context, well-organized and flexible applications for conducting large-scale simulations of evolution hold promise for understanding models of trait evolution and, more importantly, for assessing our ability to confidently estimate them from real trait data sampled from nature.
Results: We introduce TraitTrainR, an R package designed to facilitate efficient, large-scale simulations under complex models of continuous trait evolution. TraitTrainR supports several output formats and popular trait data transformations, accommodates multi-trait evolution, and offers flexibility in defining the input parameter space and in model stacking. Moreover, TraitTrainR can incorporate measurement error, allowing investigation of its potential impacts on evolutionary inference. We envision a wealth of applications of TraitTrainR, and we demonstrate one such example by examining the problem of evolutionary model selection in three empirical phylogenetic case studies. Collectively, these demonstrations of applying TraitTrainR to problems in model selection underscore its utility and broader promise for addressing key questions, including those related to experimental design and statistical power, in comparative biology.
Availability and implementation: TraitTrainR is developed in R 4.4.0 and is freely available at https://github.com/radamsRHA/TraitTrainR/, which includes detailed documentation, quick-start guides, and a step-by-step tutorial.
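The simulation workflow that TraitTrainR automates can be pictured with a small, language-neutral sketch. The Python function below is purely illustrative and is not part of TraitTrainR (which is an R package): it simulates a single continuous trait under Brownian motion along a user-supplied set of preorder tree edges and optionally adds Gaussian measurement error, one of the features the abstract highlights. The edge format, function name, and default values are assumptions made for this sketch.

```python
# Minimal sketch of continuous-trait simulation with measurement error.
# NOT the TraitTrainR API; it only illustrates the kind of Brownian-motion
# simulation such a tool performs.
import numpy as np

def simulate_bm(edges, root_state=0.0, sigma2=1.0, meas_error_sd=0.0, seed=None):
    """Simulate one continuous trait under Brownian motion on a rooted tree.

    edges: list of (parent, child, branch_length) in preorder
           (each parent appears before it is used as a parent).
    Returns a dict mapping node name -> trait value, with optional
    Gaussian measurement error added to every value.
    """
    rng = np.random.default_rng(seed)
    states = {edges[0][0]: root_state}          # root value
    for parent, child, bl in edges:
        # Brownian motion: child ~ Normal(parent, sigma^2 * branch length)
        states[child] = rng.normal(states[parent], np.sqrt(sigma2 * bl))
    if meas_error_sd > 0:
        states = {n: v + rng.normal(0.0, meas_error_sd) for n, v in states.items()}
    return states

# Example: a four-taxon tree given as preorder edges
edges = [("root", "A", 1.0), ("root", "n1", 0.5),
         ("n1", "B", 0.5), ("n1", "C", 0.5)]
print(simulate_bm(edges, sigma2=0.2, meas_error_sd=0.05, seed=1))
```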
ISRES+: an improved evolutionary strategy for function minimization to estimate the free parameters of systems biology models
Abstract
Motivation: Mathematical models in systems biology help generate hypotheses, guide experimental design, and infer the dynamics of gene regulatory networks. These models are characterized by phenomenological or mechanistic parameters, which are typically hard to measure. Therefore, efficient parameter estimation is central to model development. Global optimization techniques, such as evolutionary algorithms (EAs), are applied to estimate model parameters by inverse modeling, i.e. calibrating models by minimizing a function that evaluates a measure of the error between model predictions and experimental data. EAs estimate model parameters by generating a large population of individuals using strategies like recombination and mutation over multiple "generations" and retaining the "fittest individuals." Typically, only a few individuals from each generation are used to create new individuals in the next generation. Improved Evolutionary Strategy by Stochastic Ranking (ISRES), proposed by Runarsson and Yao, is one such EA that is widely used in systems biology to estimate parameters. ISRES uses information from at most a pair of individuals in any generation to create a new population that minimizes the error. In this article, we propose an efficient evolutionary strategy, ISRES+, which builds on ISRES by combining information from all individuals across the population and across all generations to develop a better understanding of the fitness landscape.
Results: ISRES+ uses the additional information generated by the algorithm during evolution to approximate the local neighborhood around the best-fit individual using linear least-squares fits in one and two dimensions, enabling efficient parameter estimation. ISRES+ outperforms ISRES and yields fitter individuals with a tighter distribution over multiple runs, such that a typical run of ISRES+ estimates parameters with a higher goodness-of-fit than ISRES.
Availability and implementation: Algorithm and implementation: GitHub: https://github.com/gtreeves/isres-plus-bandodkar-2022.
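To make the core idea concrete, the following Python sketch shows a toy (mu, lambda) evolution strategy that, in the spirit of ISRES+, keeps an archive of all evaluated individuals and uses a linear least-squares fit around the current best individual to propose one extra candidate per generation. It is a simplified illustration under those assumptions, not the authors' algorithm or implementation (which also uses stochastic ranking for constraints and two-dimensional fits); all names and constants are hypothetical.

```python
# Toy evolution strategy augmented with a linear least-squares surrogate,
# sketched after the idea described in the ISRES+ abstract.
import numpy as np

def es_with_ls_surrogate(f, lb, ub, mu=10, lam=60, gens=200, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    pop = rng.uniform(lb, ub, size=(lam, dim))
    sigma = 0.3 * (ub - lb)
    archive_x, archive_f = [], []              # information kept across generations
    for _ in range(gens):
        fit = np.array([f(x) for x in pop])
        archive_x.append(pop.copy()); archive_f.append(fit.copy())
        parents = pop[np.argsort(fit)[:mu]]
        best = parents[0]
        # Linear least-squares fit f(x) ~ c + g.(x - best) over recent archive
        # points, giving a crude local gradient estimate around the best individual.
        X = np.vstack(archive_x)[-5 * lam:] - best
        y = np.concatenate(archive_f)[-5 * lam:]
        A = np.hstack([np.ones((len(X), 1)), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        surrogate_child = np.clip(best - 0.1 * sigma * np.sign(coef[1:]), lb, ub)
        # Standard ES offspring: recombination (parent mean) + Gaussian mutation
        children = rng.normal(parents.mean(axis=0), sigma, size=(lam - 1, dim))
        pop = np.vstack([np.clip(children, lb, ub), surrogate_child])
        sigma *= 0.99                          # simple step-size decay
    fit = np.array([f(x) for x in pop])
    return pop[np.argmin(fit)], fit.min()

# Example: recover parameters of a toy model by minimizing squared error
best_x, best_f = es_with_ls_surrogate(lambda x: np.sum((x - 0.7) ** 2),
                                      lb=[0, 0, 0], ub=[1, 1, 1])
print(best_x, best_f)
```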
- Award ID(s): 2105619
- PAR ID: 10479661
- Editor(s): Wren, Jonathan
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Bioinformatics
- Volume: 39
- Issue: 7
- ISSN: 1367-4811
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract
Summary: Molecular mechanisms of biological functions and disease processes are exceptionally complex, and our ability to interrogate and understand relationships is becoming increasingly dependent on the use of computational modeling. We have developed "BioModME," a standalone R-based web application package providing an intuitive and comprehensive graphical user interface to help investigators build, solve, visualize, and analyze computational models of complex biological systems. Important features of the application package include multi-region system modeling, custom reaction rate laws and equations, unit conversion, model parameter estimation utilizing experimental data, and import and export of model information in the Systems Biology Markup Language format. Users can also export models to MATLAB, R, and Python and export the equations to LaTeX and Mathematical Markup Language formats. Other important features include an online model development platform, a multi-modality visualization tool, and efficient numerical solvers for differential-algebraic equations and optimization. (A minimal ODE sketch illustrating this kind of model appears after this list.)
Availability and implementation: All relevant software information, including documentation and tutorials, can be found at https://mcw.marquette.edu/biomedical-engineering/computational-systems-biology-lab/biomodme.php. Deployed software can be accessed at https://biomodme.ctsi.mcw.edu/. Source code is freely available for download at https://github.com/MCWComputationalBiologyLab/BioModME.
-
Abstract
The estimation of demographic parameters is a key component of evolutionary demography and conservation biology. Capture–mark–recapture methods have served as a fundamental tool for estimating demographic parameters. The accurate estimation of demographic parameters in capture–mark–recapture studies depends on accurate modeling of the observation process. Classic capture–mark–recapture models typically model the observation process as a Bernoulli or categorical trial with detection probability conditional on a marked individual's availability for detection (e.g., alive, or alive and present in a study area). Alternatives to this approach are underused but may have great utility in capture–recapture studies. In this paper, we explore a simple concept: in the same way that counts contain more information about abundance than simple detection/non-detection data, the number of encounters of individuals during observation occasions contains more information about the observation process than detection/non-detection data for individuals during the same occasion. Rather than using Bernoulli or categorical distributions to estimate detection probability, we demonstrate the application of zero-inflated Poisson and gamma-Poisson distributions. The use of count distributions allows for inference on availability for encounter, as well as a wide variety of parameterizations for heterogeneity in the observation process. We demonstrate that this approach can accurately recover demographic and observation parameters in the presence of individual heterogeneity in detection probability and discuss some potential future extensions of this method. (An illustrative zero-inflated Poisson sketch appears after this list.)
-
Abstract
Motivation: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation data have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation, where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens.
Results: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moments estimator that has a runtime complexity of O(NMB) for N individuals and M SNPs, where B is a parameter that controls the number of random matrix-vector multiplications. Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log3 N, log3 M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. (A minimal randomized method-of-moments sketch appears after this list.)
Availability and implementation: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
-
Abstract
Purpose: To improve the performance of neural networks for parameter estimation in quantitative MRI, in particular when the noise propagation varies throughout the space of biophysical parameters.
Theory and Methods: A theoretically well-founded loss function is proposed that normalizes the squared error of each estimate with the respective Cramér–Rao bound (CRB), a theoretical lower bound for the variance of an unbiased estimator. This avoids a dominance of hard-to-estimate parameters and areas in parameter space, which are often of little interest. The normalization with the corresponding CRB balances the large errors of fundamentally more noisy estimates and the small errors of fundamentally less noisy estimates, allowing the network to better learn to estimate the latter. Further, the proposed loss function provides an absolute evaluation metric for performance: a network has an average loss of 1 if it is a maximally efficient unbiased estimator, which can be considered the ideal performance. The performance gain with the proposed loss function is demonstrated on an eight-parameter magnetization transfer model that is fitted to phantom and in vivo data.
Results: Networks trained with the proposed loss function perform close to optimal, that is, their loss converges to approximately 1, and their performance is superior to networks trained with the standard mean-squared error (MSE). The proposed loss function reduces the bias of the estimates compared with the MSE loss and improves the match of the noise variance to the CRB. This performance gain translates to in vivo maps that align better with the literature.
Conclusion: Normalizing the squared error with the CRB during the training of neural networks improves their performance in estimating biophysical parameters. (An illustrative sketch of this loss appears after this list.)
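For the BioModME entry above, the following sketch shows the kind of mass-action ODE model such a tool lets a user assemble and solve. BioModME itself is an R-based web application with a graphical interface, so this Python snippet is only an illustration of the underlying model structure; the reaction, rate constants, and function name are invented for the example.

```python
# Illustrative mass-action ODE model: a reversible binding reaction A + B <-> C.
import numpy as np
from scipy.integrate import solve_ivp

def binding_model(t, y, kf, kr):
    A, B, C = y
    v = kf * A * B - kr * C          # net rate of A + B <-> C
    return [-v, -v, v]

sol = solve_ivp(binding_model, t_span=(0, 10), y0=[1.0, 0.5, 0.0],
                args=(2.0, 0.1), dense_output=True)
print(sol.y[:, -1])                  # concentrations of A, B, C at t = 10
```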
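For the capture–mark–recapture entry above, the sketch below simulates encounter counts under a zero-inflated Poisson observation model (zero inflation standing in for unavailability) and recovers the availability and encounter-rate parameters by maximum likelihood. It illustrates the count-based idea described in the abstract under simplifying assumptions (a single occasion, no individual heterogeneity); the parameterization and names are not taken from the paper.

```python
# Zero-inflated Poisson encounter counts: simulate, then fit by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

rng = np.random.default_rng(42)
psi_true, lam_true = 0.7, 1.5                    # availability, encounter rate
available = rng.random(500) < psi_true
counts = np.where(available, rng.poisson(lam_true, 500), 0)

def zip_negloglik(params, y):
    psi, lam = params
    p0 = (1 - psi) + psi * np.exp(-lam)          # P(Y = 0)
    ll_zero = np.log(p0) * np.sum(y == 0)
    ll_pos = np.sum(np.log(psi) + poisson.logpmf(y[y > 0], lam))
    return -(ll_zero + ll_pos)

fit = minimize(zip_negloglik, x0=[0.5, 1.0], args=(counts,),
               bounds=[(1e-3, 1 - 1e-3), (1e-3, None)])
print(fit.x)   # estimates of (availability, encounter rate)
```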
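For the heritability entry above, the following sketch illustrates the randomized method-of-moments idea: the trace of the squared kinship matrix K = XX'/M is estimated with B random matrix-vector products so that K is never formed explicitly, and a 2x2 moment system is solved for the variance components. It is a simplified illustration (standardized genotypes, no covariates, Gaussian probe vectors) and not the RHE-reg implementation.

```python
# Randomized method-of-moments heritability estimate on a toy dataset.
import numpy as np

def randomized_h2(X, y, B=10, seed=0):
    rng = np.random.default_rng(seed)
    N, M = X.shape
    X = (X - X.mean(0)) / X.std(0)               # standardize each SNP
    y = y - y.mean()
    # tr(K) = N for standardized genotypes; estimate tr(K^2) stochastically:
    # tr(K^2) = E[ ||K z||^2 ] for probe vectors z with identity covariance.
    tr_K2 = 0.0
    for _ in range(B):
        z = rng.standard_normal(N)
        Kz = X @ (X.T @ z) / M                   # O(NM) per random vector
        tr_K2 += Kz @ Kz
    tr_K2 /= B
    yKy = (X.T @ y) @ (X.T @ y) / M              # y' K y without forming K
    # Method-of-moments equations for (sigma_g^2, sigma_e^2)
    A = np.array([[tr_K2, N], [N, N]])
    b = np.array([yKy, y @ y])
    sg2, se2 = np.linalg.solve(A, b)
    return sg2 / (sg2 + se2)                     # SNP heritability estimate

# Toy data: 2000 individuals, 500 SNPs, true h^2 = 0.5
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(2000, 500)).astype(float)
beta = rng.standard_normal(500) * np.sqrt(0.5 / 500)
Xs = (X - X.mean(0)) / X.std(0)
y = Xs @ beta + rng.standard_normal(2000) * np.sqrt(0.5)
print(randomized_h2(X, y))
```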
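For the last entry, the CRB-normalized loss can be written in a few lines: each parameter's squared error is divided by its Cramér–Rao bound, so a loss near 1 indicates a maximally efficient unbiased estimator. The sketch below uses NumPy with made-up shapes and values purely for illustration; in practice the CRB would be precomputed from the signal model and noise level for each training sample.

```python
# CRB-normalized squared-error loss versus plain MSE.
import numpy as np

def crb_normalized_loss(theta_hat, theta_true, crb):
    """theta_hat, theta_true, crb: arrays of shape (batch, n_params).
    crb[i, j] is the Cramer-Rao bound of parameter j for sample i."""
    return np.mean((theta_hat - theta_true) ** 2 / crb)

theta_true = np.array([[1.0, 0.05]])
theta_hat = np.array([[1.1, 0.07]])
crb = np.array([[0.01, 0.0004]])
print(crb_normalized_loss(theta_hat, theta_true, crb))   # ~1 means near-optimal
print(np.mean((theta_hat - theta_true) ** 2))            # plain MSE, dominated by the largest errors
```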