Disease modelers have modeled the progression of diseases for several decades using tools such as Markov models and microsimulation. However, they face a serious challenge: many of the models they create are not reproducible. Moreover, there is no established practice that ensures reproducible models, since modelers rely on loose guidelines that change periodically rather than on well-defined, machine-readable standards. The Systems Biology Markup Language (SBML) is one such standard, enabling the exchange of models among different software tools. Recently, the SBML Arrays package was developed, extending the standard to support simulation of populations. This paper demonstrates, through several abstract examples, how microsimulation disease models can be encoded using the SBML Arrays package, enabling reproducible disease modeling.
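To make the modeling style concrete, below is a minimal sketch (not taken from the paper) of a Markov-model microsimulation of disease progression in Python. The three states, the transition probabilities, and the cohort size are illustrative placeholders; the paper's point is that a model encoded only as ad hoc code like this is hard to reproduce, whereas an SBML Arrays encoding is machine-readable and exchangeable.

```python
import numpy as np

# Illustrative three-state progression model: 0 = Healthy, 1 = Sick, 2 = Dead.
# The per-cycle transition probabilities below are made-up placeholders.
TRANSITIONS = np.array([
    [0.95, 0.04, 0.01],   # from Healthy
    [0.10, 0.80, 0.10],   # from Sick
    [0.00, 0.00, 1.00],   # Dead is absorbing
])

def microsimulate(n_individuals=10_000, n_cycles=20, seed=0):
    """Track each simulated individual separately (microsimulation)."""
    rng = np.random.default_rng(seed)
    states = np.zeros(n_individuals, dtype=int)          # everyone starts Healthy
    history = np.empty((n_cycles + 1, 3), dtype=int)
    history[0] = np.bincount(states, minlength=3)
    for t in range(1, n_cycles + 1):
        # Sample each individual's next state from its row of the transition matrix.
        probs = TRANSITIONS[states]
        states = (rng.random((n_individuals, 1)) > probs.cumsum(axis=1)).sum(axis=1)
        history[t] = np.bincount(states, minlength=3)
    return history

if __name__ == "__main__":
    counts = microsimulate()
    print("Healthy/Sick/Dead after 20 cycles:", counts[-1])
```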
BioSimulators: a central registry of simulation engines and services for recommending specific tools
Abstract: Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.
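As an illustration of the consistent interfaces described above, here is a hedged sketch of running a COMBINE archive through a BioSimulators containerized simulator from Python. The image name (`ghcr.io/biosimulators/tellurium`), the `-i`/`-o` flags, and the file paths are assumptions based on the BioSimulators conventions rather than details taken from this article; consult https://biosimulators.org for the exact interface of a specific tool version.

```python
import subprocess
from pathlib import Path

# Assumed image name and flags: the BioSimulators conventions describe a common
# command-line interface (-i <COMBINE archive>, -o <output directory>) that each
# registered container is expected to implement. Verify against biosimulators.org.
IMAGE = "ghcr.io/biosimulators/tellurium"       # hypothetical choice of simulator
ARCHIVE = Path("project.omex").resolve()        # a COMBINE/OMEX archive containing SED-ML
OUT_DIR = Path("results").resolve()
OUT_DIR.mkdir(exist_ok=True)

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{ARCHIVE.parent}:/root/in",
        "-v", f"{OUT_DIR}:/root/out",
        IMAGE,
        "-i", f"/root/in/{ARCHIVE.name}",
        "-o", "/root/out",
    ],
    check=True,
)
print("Simulation outputs written to", OUT_DIR)
```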
- Award ID(s):
- 1933453
- PAR ID:
- 10358721
- Author(s) / Creator(s):
- Date Published:
- Journal Name:
- Nucleic Acids Research
- Volume:
- 50
- Issue:
- W1
- ISSN:
- 0305-1048
- Page Range / eLocation ID:
- W108 to W114
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Computational simulation experiments increasingly inform modern biological research, and bring with them the need to provide ways to annotate, archive, share and reproduce the experiments performed. These simulations increasingly require extensive collaboration among modelers, experimentalists, and engineers. The Minimum Information About a Simulation Experiment (MIASE) guidelines outline the information needed to share simulation experiments. SED-ML is a computer-readable format for the information outlined by MIASE, created as a community project and supported by many investigators and software tools. The first versions of SED-ML focused on deterministic and stochastic simulations of models. Level 1 Version 4 of SED-ML substantially expands these capabilities to cover additional types of models, model languages, parameter estimations, simulations and analyses of models, and analyses and visualizations of simulation results. To facilitate consistent practices across the community, Level 1 Version 4 also more clearly describes the use of SED-ML constructs, and includes numerous concrete validation rules. SED-ML is supported by a growing ecosystem of investigators, model languages, and software tools, including eight languages for constraint-based, kinetic, qualitative, rule-based, and spatial models, over 20 simulation tools, visual editors, model repositories, and validators. Additional information about SED-ML is available at https://sed-ml.org/. (A minimal sketch of a SED-ML document appears after this list.)
-
To maximize indoor daylight, design projects commonly use commercial optimization tools to find optimum window configurations. However, experiments show that such tools either fail to find the optimal solution or are very slow to compute under certain conditions. This paper presents a comparative analysis between a gradient-free optimization technique, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and the widely used Genetic Algorithm (GA)-based tool Galapagos, for optimizing window parameters to improve indoor daylight in six locations across different latitudes. A novel combination of the daylight metrics sDA and ASE is proposed for single-objective optimization comparison. Results indicate that GA in Galapagos takes progressively more time to converge, from 11 minutes at the southernmost to 11 hours at the northernmost latitudes, while the runtime for CMA-ES is consistently around 2 hours. On average, CMA-ES is 1.5 times faster than Galapagos while consistently producing optimal solutions. This paper can help researchers select appropriate optimization algorithms for daylight simulation based on latitude, runtime, and solution quality. (A brief CMA-ES usage sketch appears after this list.)
-
Halo models provide a simple and computationally inexpensive way to investigate the connection between galaxies and their dark matter haloes. However, these models rely on the assumption that the role of baryons can easily be parametrized in the modelling procedure. We aim to examine the ability of halo occupation distribution (HOD) modelling to reproduce the galaxy clustering found in two different hydrodynamic simulations, Illustris and EAGLE. For each simulation, we measure several galaxy clustering statistics on two different luminosity threshold samples. We then apply a simple five parameter HOD, which was fit to each simulation separately, to the corresponding dark matter-only simulations, and measure the same clustering statistics. We find that the halo mass function is shifted to lower masses in the hydrodynamic simulations, resulting in a galaxy number density that is too high when an HOD is applied to the dark matter-only simulation. However, the exact way in which baryons alter the mass function is remarkably different in the two simulations. After applying a correction to the halo mass function in each simulation, the HOD is able to accurately reproduce all clustering statistics for the high luminosity sample of galaxies. For the low luminosity sample, we find evidence that in addition to correcting the halo mass function, including spatial, velocity, and assembly bias parameters in the HOD is necessary to accurately reproduce clustering statistics. (A sketch of a standard five-parameter HOD appears after this list.)
-
Volunteer computing (VC) uses consumer digital electronics products, such as PCs, mobile devices, and game consoles, for high-throughput scientific computing. Device owners participate in VC by installing a program which, in the background, downloads and executes jobs from servers operated by science projects. Most VC projects use BOINC, an open-source middleware system for VC. BOINC allows scientists to create and operate VC projects and enables volunteers to participate in these projects. Volunteers install a single application (the BOINC client) and then choose projects to support. We have developed a BOINC project, nanoHUB@home, to make use of VC in support of the nanoHUB science gateway. VC has greatly expanded the computational resources available for nanoHUB simulations. We are using VC to support “speculative exploration”, a model of computing that explores the input parameters of online simulation tools published through the nanoHUB gateway, pre-computing results that have not been requested by users. These results are stored in a cache, and when a user launches an interactive simulation our system first checks the cache. If the result is already available, it is returned to the user immediately, leaving the computational resources free and avoiding re-computation of existing results. The cache is also useful for machine learning (ML) studies, building surrogate models for nanoHUB simulation tools that allow us to quickly estimate results before running an expensive simulation. VC resources also allow us to support uncertainty quantification (UQ) in nanoHUB simulation tools, to go beyond simulations and deliver real-world predictions. Models are typically simulated with precise input values, but real-world experiments involve imprecise values for device measurements, material properties, and stimuli. The imprecise values can be expressed as a probability distribution of values, such as a Gaussian distribution with a mean and standard deviation, or an actual distribution measured from experiments. Stochastic collocation methods can be used to predict the resulting outputs given a series of probability distributions for inputs. These computations require hundreds or thousands of simulation runs for each prediction. This workload is well-suited to VC, since the runs are completely separate, but the results of all runs are combined in a statistical analysis. (A brief stochastic collocation sketch appears after this list.)
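For the SED-ML entry above, the following is a minimal, hand-written sketch of a Level 1 Version 4 document describing one time-course simulation of an SBML model. The element and attribute names are based on a reading of the published L1V4 specification rather than on this article, and data generators and outputs are omitted, so treat the snippet as illustrative rather than a complete, validated experiment description.

```python
from textwrap import dedent

# Illustrative SED-ML Level 1 Version 4 fragment: one SBML model, one uniform
# time course run with CVODE (KISAO:0000019), and one task binding them.
# Data generators and outputs are omitted for brevity.
SEDML = dedent("""\
    <?xml version="1.0" encoding="UTF-8"?>
    <sedML xmlns="http://sed-ml.org/sed-ml/level1/version4" level="1" version="4">
      <listOfModels>
        <model id="model1" language="urn:sedml:language:sbml" source="model.xml"/>
      </listOfModels>
      <listOfSimulations>
        <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                           outputEndTime="100" numberOfSteps="1000">
          <algorithm kisaoID="KISAO:0000019"/>
        </uniformTimeCourse>
      </listOfSimulations>
      <listOfTasks>
        <task id="task1" modelReference="model1" simulationReference="sim1"/>
      </listOfTasks>
    </sedML>
    """)

with open("experiment.sedml", "w", encoding="utf-8") as fh:
    fh.write(SEDML)
```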
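For the daylight-optimization entry above, here is a hedged sketch of driving CMA-ES with the widely used pycma package. The three window parameters, their bounds, and the quadratic stand-in objective are invented placeholders; in the study itself the objective combines the sDA and ASE daylight metrics, which would require coupling the optimizer to a daylight simulation engine.

```python
import cma  # pip install cma  (pycma, Hansen's reference implementation)

def daylight_objective(x):
    """Placeholder for the real sDA/ASE-based objective (to be minimized).

    x = [window_width_m, window_height_m, sill_height_m]; a smooth toy function
    stands in for a call to a daylight simulation.
    """
    target = [2.0, 1.5, 0.9]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

# Initial guess, step size, and box constraints are illustrative.
es = cma.CMAEvolutionStrategy(
    [1.0, 1.0, 1.0], 0.5,
    {"bounds": [[0.5, 0.5, 0.3], [4.0, 3.0, 1.5]], "maxiter": 50},
)
while not es.stop():
    candidates = es.ask()                      # sample a population of candidate windows
    es.tell(candidates, [daylight_objective(c) for c in candidates])
print("best window parameters:", es.result.xbest)
```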
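For the halo occupation distribution entry above, the sketch below evaluates the standard five-parameter HOD form (in the style of Zheng et al. 2005) commonly used in such studies. The abstract does not give the exact parameterization or fitted values, so both the functional form and the numbers here are assumptions for illustration.

```python
import numpy as np
from scipy.special import erf

def mean_occupation(M, log_Mmin=12.0, sigma_logM=0.2, log_M0=11.5, log_M1=13.0, alpha=1.0):
    """Standard 5-parameter HOD: mean central and satellite counts vs. halo mass M [Msun/h]."""
    n_cen = 0.5 * (1.0 + erf((np.log10(M) - log_Mmin) / sigma_logM))
    M0, M1 = 10.0 ** log_M0, 10.0 ** log_M1
    # Satellites only populate haloes above M0; centrals modulate the satellite count.
    n_sat = n_cen * (np.clip(M - M0, 0.0, None) / M1) ** alpha
    return n_cen, n_sat

if __name__ == "__main__":
    masses = np.logspace(11, 15, 5)            # halo masses in Msun/h (illustrative)
    n_cen, n_sat = mean_occupation(masses)
    for M, c, s in zip(masses, n_cen, n_sat):
        print(f"M = {M:.2e}  <Ncen> = {c:.3f}  <Nsat> = {s:.3f}")
```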
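For the volunteer-computing entry above, the following sketch illustrates the stochastic collocation idea it mentions: propagating a Gaussian-distributed input through a model with Gauss-Hermite quadrature, so the output mean and standard deviation come from a small, fixed set of independent runs (exactly the embarrassingly parallel workload suited to BOINC). The toy model and the input distribution are placeholders, not a nanoHUB tool.

```python
import numpy as np

def model(x):
    """Placeholder for one expensive simulation run with input parameter x."""
    return np.exp(-0.5 * x) + 0.1 * x ** 2

def collocate_gaussian(model, mu, sigma, n_points=9):
    """Estimate mean/std of model(X) for X ~ N(mu, sigma) via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)   # weight exp(-t^2)
    xs = mu + np.sqrt(2.0) * sigma * nodes         # map quadrature nodes to physical inputs
    ys = np.array([model(x) for x in xs])          # independent runs: ideal for VC
    mean = np.sum(weights * ys) / np.sqrt(np.pi)
    second = np.sum(weights * ys ** 2) / np.sqrt(np.pi)
    return mean, np.sqrt(max(second - mean ** 2, 0.0))

if __name__ == "__main__":
    m, s = collocate_gaussian(model, mu=1.0, sigma=0.2)
    print(f"output mean = {m:.4f}, output std = {s:.4f}")
```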

