
This content will become publicly available on July 18, 2024

Title: LinguaPhylo: A probabilistic model specification language for reproducible phylogenetic analyses

Phylogenetic models have become increasingly complex, and phylogenetic data sets have expanded in both size and richness. However, current inference tools lack a model specification language that can concisely describe a complete phylogenetic analysis while remaining independent of implementation details. We introduce a new lightweight and concise model specification language, ‘LPhy’, which is designed to be both human and machine-readable. A graphical user interface accompanies ‘LPhy’, allowing users to build models, simulate data, and create natural language narratives describing the models. These narratives can serve as the foundation for manuscript method sections. Additionally, we present a command-line interface for converting LPhy-specified models into analysis specification files (in XML format) compatible with the BEAST2 software platform. Collectively, these tools aim to enhance the clarity of descriptions and reporting of probabilistic models in phylogenetic studies, ultimately promoting reproducibility of results.
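To make this concrete, a model in such a language might read roughly as follows. This is an illustrative, LPhy-like sketch only: the distribution names, argument names, and exact syntax here are assumptions and have not been checked against the actual LPhy grammar.

```
λ ~ LogNormal(meanlog=3.0, sdlog=1.0);          // prior on the birth rate
ψ ~ Yule(lambda=λ, n=16);                       // tree prior over 16 taxa
D ~ PhyloCTMC(tree=ψ, L=200, Q=jukesCantor());  // sequence likelihood
```

Read top to bottom, such a script is the generative model itself, independent of any inference engine; a converter tool can then translate the specification into a BEAST2 XML analysis file.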

Author(s) / Creator(s): ; ; ; Kumar, Sudhir
Journal Name: PLOS Computational Biology
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Importance

    The study highlights the potential of large language models, specifically GPT-3.5 and GPT-4, in processing complex clinical data and extracting meaningful information with minimal training data. By developing and refining prompt-based strategies, we can significantly enhance the models’ performance, making them viable tools for clinical NER tasks and possibly reducing the reliance on extensive annotated datasets.


    Objective

    This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance.

    Materials and Methods

    We evaluated these models on 2 clinical NER tasks: (1) to extract medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) to identify nervous system disorder-related adverse events from safety reports in the vaccine adverse event reporting system (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT.
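The four-component prompt framework described above can be sketched as a simple prompt assembler. This is a minimal illustration of the idea, not the authors' actual code; the function and parameter names here are assumptions.

```python
def build_prompt(task_description, format_spec,
                 guideline_excerpts=None, error_notes=None, few_shot=None):
    """Assemble a clinical-NER prompt from the four components described
    in the text: (1) baseline task description and format specification,
    (2) annotation-guideline excerpts, (3) error-analysis instructions,
    and (4) annotated few-shot examples. Names are illustrative only."""
    parts = [task_description, format_spec]          # (1) baseline prompt
    if guideline_excerpts:                           # (2) annotation guidelines
        parts.append("Annotation guidelines:\n" + guideline_excerpts)
    if error_notes:                                  # (3) error-analysis instructions
        parts.append("Common errors to avoid:\n" + error_notes)
    if few_shot:                                     # (4) few-shot examples
        parts += [f"Example:\n{text}\nEntities: {labels}"
                  for text, labels in few_shot]
    return "\n\n".join(parts)
```

Each optional component is appended only when supplied, so the same assembler covers all four prompt configurations compared in the evaluation.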


    Results

    Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804 on MTSamples and 0.301 and 0.593 on VAERS. Additional prompt components consistently improved model performance. With all 4 components, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794 and 0.861 on MTSamples and 0.676 and 0.736 on VAERS, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 on MTSamples and 0.802 on VAERS), they are promising given that few training samples are needed.
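The "relaxed" F1 scores reported here are typically computed with overlap-based span matching rather than exact boundary matching. A minimal sketch of one common definition follows; the paper's exact matching criterion may differ, and the span representation here is an assumption.

```python
def relaxed_f1(gold, pred):
    """Relaxed span-level F1: a span counts as matched if it overlaps a
    span of the same entity type on the other side. Spans are
    (type, start, end) with end exclusive. One common 'relaxed'
    definition; illustrative only."""
    def match(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction with slightly shifted boundaries (e.g. gold "chest pain", predicted "severe chest pain") still counts as a true positive, which is why relaxed scores run higher than strict ones.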


    Discussion

    The study’s findings suggest a promising direction in leveraging LLMs for clinical NER tasks. However, while the performance of GPT models improved with task-specific prompts, there is a need for further development and refinement. LLMs like GPT-4 show potential to approach the performance of state-of-the-art models like BioClinicalBERT, but they still require careful prompt engineering and an understanding of task-specific knowledge. The study also underscores the importance of evaluation schemas that accurately reflect the capabilities and performance of LLMs in clinical settings.


    Conclusion

    While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances GPT models' feasibility for potential clinical applications.

  2. Abstract

    Motivation

    This article presents libRoadRunner 2.0, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using the systems biology markup language (SBML).


    libRoadRunner is a self-contained library, able to run either as a component inside other tools via its C++, C and Python APIs, or interactively through its Python or Julia interface. libRoadRunner uses a custom just-in-time (JIT) compiler built on the widely used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a large variety of processors, making it fast enough to simulate extremely large models or repeated runs in reasonable timeframes. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and non-linear algebraic equations) as well as several SBML extensions such as hierarchical composition and probability distributions. It offers multiple deterministic and stochastic integrators, as well as tools for steady-state, sensitivity, stability and structural analyses.

    Availability and implementation

    libRoadRunner binary distributions for Windows, Mac OS and Linux, Julia and Python bindings, source code and documentation are all available at, and Python bindings are also available via pip. The source code can be compiled for the supported systems as well as in principle any system supported by LLVM-13, such as ARM-based computers like the Raspberry Pi. The library is licensed under the Apache License Version 2.0.

  3. Abstract

    The human embryo is a complex structure that emerges and develops as a result of cell-level decisions guided by both intrinsic genetic programs and cell–cell interactions. Given the limited accessibility and associated ethical constraints of human embryonic tissue samples, researchers have turned to human stem cells to generate embryo models for studying specific embryogenic developmental steps. However, studying complex self-organizing developmental events with embryo models requires computational and imaging tools that characterize cell dynamics at the single-cell level. In this work, we obtained live-cell imaging data from a human pluripotent stem cell (hPSC)-based epiblast model that recapitulates the lumenal epiblast cyst formation soon after implantation of the human blastocyst. By processing the imaging data with a Python pipeline that combines cell tracking and event recognition using a CNN-LSTM machine learning model, we obtained detailed temporal information on changes in cell state and neighborhood during the dynamic growth and morphogenesis of lumenal hPSC cysts. This tool, combined with reporter lines for cell types of interest, will drive future mechanistic studies of hPSC fate specification in embryo models and advance our understanding of how cell-level decisions lead to global organization and emergent phenomena.

    Insight, innovation, integration: Human pluripotent stem cells (hPSCs) have been successfully used to model and understand cellular events that take place during human embryogenesis. Understanding how cell–cell and cell–environment interactions guide cell actions within an hPSC-based embryo model is a key step in elucidating the mechanisms driving system-level embryonic patterning and growth. In this work, we present a robust video-analysis pipeline that uses machine learning methods to fully characterize the self-organization of hPSCs into lumenal cysts, mimicking the lumenal epiblast cyst formation soon after implantation of the human blastocyst. This pipeline will be a useful tool for understanding the cellular mechanisms underlying key embryogenic events in embryo models.
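The cell-tracking half of such a pipeline reduces to linking detected cell centroids between consecutive frames. A minimal greedy nearest-neighbour sketch follows; it is not the authors' pipeline (which pairs tracking with a CNN-LSTM event recognizer), and real trackers typically use stronger assignment methods such as Hungarian matching.

```python
import math

def link_frames(prev, curr, max_dist=20.0):
    """Greedy frame-to-frame linking of cell centroids: each centroid in
    the previous frame is matched to the closest unclaimed centroid in
    the current frame within max_dist (pixels). Returns a dict mapping
    previous-frame index -> current-frame index. Illustrative sketch only."""
    links, taken = {}, set()
    for i, p in enumerate(prev):
        best, best_d = None, max_dist
        for j, c in enumerate(curr):
            if j in taken:
                continue
            d = math.dist(p, c)   # Euclidean distance between centroids
            if d < best_d:
                best, best_d = j, d
        if best is not None:      # unmatched cells model division/death/exit
            links[i] = best
            taken.add(best)
    return links
```

Chaining these per-frame links across a movie yields the cell trajectories over which state- and neighborhood-change events can then be detected.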
  4. Abstract

    Motivation

    Developing biochemical models in systems biology is a complex, knowledge-intensive activity. Some modelers (especially novices) benefit from model development tools with a graphical user interface. However, as with the development of complex software, text-based representations of models provide many benefits for advanced model development. At present, the tools for text-based model development are limited, typically just a textual editor that provides features such as copy, paste, find, and replace. Since these tools are not “model aware,” they do not provide features for: (i) model building such as autocompletion of species names; (ii) model analysis such as hover messages that provide information about chemical species; and (iii) model translation to convert between model representations. We refer to these as BAT features.
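At their core, the first two BAT features rest on a symbol table extracted from the model. The sketch below illustrates that idea in a few lines of Python; the symbol names and the shape of the table are hypothetical, not VSCode-Antimony's internals.

```python
# Hypothetical symbol table a model-aware editor might extract from an
# Antimony file: name -> kind. Names here are invented for illustration.
SYMBOLS = {"ATP": "species", "ADP": "species", "k_cat": "parameter"}

def autocomplete(prefix):
    """(i) Model building: complete species/parameter names by prefix."""
    return sorted(name for name in SYMBOLS if name.startswith(prefix))

def hover(name):
    """(ii) Model analysis: build a hover message from the symbol table,
    or None if the name is unknown."""
    kind = SYMBOLS.get(name)
    return f"{name}: {kind}" if kind else None
```

Feature (iii), translation, would similarly walk the parsed model and emit the other representation (Antimony text or SBML XML) from the same internal structures.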


    We present VSCode-Antimony, a tool for building, analyzing, and translating models written in the Antimony modeling language, a human readable representation of Systems Biology Markup Language (SBML) models. VSCode-Antimony is a source editor, a tool with language-aware features. For example, there is autocompletion of variable names to assist with model building, hover messages that aid in model analysis, and translation between XML and Antimony representations of SBML models. These features result from making VSCode-Antimony model-aware by incorporating several sophisticated capabilities: analysis of the Antimony grammar (e.g. to identify model symbols and their types); a query system for accessing knowledge sources for chemical species and reactions; and automatic conversion between different model representations (e.g. between Antimony and SBML).

    Availability and implementation

    VSCode-Antimony is available as an open-source extension in the VSCode Marketplace. Source code can be found at

  5. With the rapid growth of large language models, big data, and malicious online attacks, it has become increasingly important to have tools for anomaly detection that can distinguish machine from human, fair from unfair, and dangerous from safe. Prior work has shown that two-distribution (specified complexity) hypothesis tests are useful tools for such tasks, aiding in detecting bias in datasets and providing artificial agents with the ability to recognize artifacts that are likely to have been designed by humans and pose a threat. However, existing work on two-distribution hypothesis tests requires exact values for the specification function, which can often be costly or impossible to compute. In this work, we prove novel finite-sample bounds that allow for two-distribution hypothesis tests with only estimates of required quantities, such as specification function values. Significantly, the resulting bounds do not require knowledge of the true distribution, distinguishing them from traditional p-values. We apply our bounds to detect student cheating on multiple-choice tests, as an example where the exact specification function is unknown. We additionally apply our results to detect representational bias in machine-learning datasets and provide artificial agents with intention perception, showing that our results are consistent with prior work despite only requiring a finite sample of the space. Finally, we discuss additional applications and provide guidance for those applying these bounds to their own work. 
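The exact-value form of a two-distribution (specified complexity) test can be sketched in a few lines. This is a schematic of the general technique only, under the assumption that the test statistic is r * p(x) / nu(x) for a chance distribution p, a specification function nu, and a normalizing constant r; the paper's contribution is finite-sample bounds for the case where p and nu are only *estimated*, which this sketch does not implement.

```python
import math

def specified_complexity(p, nu, r):
    """Exact-value two-distribution test statistic (assumed form):
    kappa = r * p(x) / nu(x), where p(x) is the probability of x under
    the chance hypothesis, nu(x) a nonnegative specification value, and
    r a normalizing constant. Small kappa (equivalently, large
    -log2(kappa), the specified complexity) is evidence against the
    chance hypothesis. Illustrative sketch only."""
    kappa = r * p / nu
    return -math.log2(kappa), kappa

def reject_chance(p, nu, r, alpha=0.01):
    """Reject the chance hypothesis at level alpha when kappa <= alpha."""
    _, kappa = specified_complexity(p, nu, r)
    return kappa <= alpha
```

Unlike a traditional p-value, this decision rule needs no knowledge of the true data-generating distribution, only the hypothesized chance distribution and the specification function; the paper's bounds extend the same rule to settings, like the cheating-detection example, where even those can only be estimated.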