skip to main content


Title: GFF Utilities: GffRead and GffCompare
Summary: GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license  ( https://github.com/gpertea/gffread , https://github.com/gpertea/gffcompare ).  more » « less
Award ID(s):
1759518
NSF-PAR ID:
10155721
Author(s) / Creator(s):
;
Date Published:
Journal Name:
F1000Research
Volume:
9
ISSN:
2046-1402
Page Range / eLocation ID:
304
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example. We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.

     
    more » « less
  2. Robinson, Peter (Ed.)
    Abstract Motivation Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. Results We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. Availability and implementation ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. Over the last two decades, robust optimization has emerged as a popular means to address decision-making problems affected by uncertainty. This includes single-stage and multi-stage problems involving real-valued and/or binary decisions and affected by exogenous (decision-independent) and/or endogenous (decision-dependent) uncertain parameters. Robust optimization techniques rely on duality theory potentially augmented with approximations to transform a (semi-)infinite optimization problem to a finite program, the robust counterpart. Whereas writing down the model for a robust optimization problem is usually a simple task, obtaining the robust counterpart requires expertise. To date, very few solutions are available that can facilitate the modeling and solution of such problems. This has been a major impediment to their being put to practical use. In this paper, we propose ROC++, an open-source C++ based platform for automatic robust optimization, applicable to a wide array of single-stage and multi-stage robust problems with both exogenous and endogenous uncertain parameters, that is easy to both use and extend. It also applies to certain classes of stochastic programs involving continuously distributed uncertain parameters and endogenous uncertainty. Our platform naturally extends existing off-the-shelf deterministic optimization platforms and offers ROPy, a Python interface in the form of a callable library, and the ROB file format for storing and sharing robust problems. We showcase the modeling power of ROC++ on several decision-making problems of practical interest. Our platform can help streamline the modeling and solution of stochastic and robust optimization problems for both researchers and practitioners. It comes with detailed documentation to facilitate its use and expansion. The latest version of ROC++ can be downloaded from https://sites.google.com/usc.edu/robust-opt-cpp/ . Summary of Contribution: The paper “ROC++: Robust Optimization in C++” proposes a new open-source C++ based platform for modeling, automatically reformulating, and solving robust optimization problems. ROC++ can address both single-stage and multi-stage problems involving exogenous and/or endogenous uncertain parameters and real- and/or binary-valued adaptive variables. The ROC++ modeling language is similar to the one provided for the deterministic case by state-of-the-art deterministic optimization solvers. ROC++ comes with detailed documentation to facilitate its use and expansion. It also offers ROPy, a Python interface in the form of a callable library. The latest version of ROC++ can be downloaded from https://sites.google.com/usc.edu/robust-opt-cpp/ . History: Accepted by Ted Ralphs, Area Editor for Software Tools. Funding: This material is based upon work supported by the National Science Foundation under Grant No. 1763108. This support is gratefully acknowledged. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplementary Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.1209 ) or is available from the IJOC GitHub software repository ( https://github.com/INFORMSJoC ) at ( https://dx.doi.org/10.5281/zenodo.6360996 ). 
    more » « less
  4. Abstract

    The performance of computational methods and software to identify differentially expressed features in single‐cell RNA‐sequencing (scRNA‐seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on their own assumptions to model scRNA‐seq expression features. To model the technological variability in cross‐platform scRNA‐seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA‐seq expression profiles across experimental platforms induced by platform‐ and gene‐specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero‐inflated Tweedie model that allows zero probability mass to exceed a traditional Tweedie distribution to model zero‐inflated scRNA‐seq data with excessive zero counts. Using both synthetic and published plate‐ and droplet‐based scRNA‐seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state‐of‐the‐art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open‐source software (R/Bioconductor package) is available athttps://github.com/himelmallick/Tweedieverse.

     
    more » « less
  5. Abstract Motivation

    MicroRNAs (miRNAs) are small RNA molecules (∼22 nucleotide long) involved in post-transcriptional gene regulation. Advances in high-throughput sequencing technologies led to the discovery of isomiRs, which are miRNA sequence variants. While many miRNA-seq analysis tools exist, the diversity of output formats hinders accurate comparisons between tools and precludes data sharing and the development of common downstream analysis methods.

    Results

    To overcome this situation, we present here a community-based project, miRNA Transcriptomic Open Project (miRTOP) working towards the optimization of miRNA analyses. The aim of miRTOP is to promote the development of downstream isomiR analysis tools that are compatible with existing detection and quantification tools. Based on the existing GFF3 format, we first created a new standard format, mirGFF3, for the output of miRNA/isomiR detection and quantification results from small RNA-seq data. Additionally, we developed a command line Python tool, mirtop, to create and manage the mirGFF3 format. Currently, mirtop can convert into mirGFF3 the outputs of commonly used pipelines, such as seqbuster, isomiR-SEA, sRNAbench, Prost! as well as BAM files. Some tools have also incorporated the mirGFF3 format directly into their code, such as, miRge2.0, IsoMIRmap and OptimiR. Its open architecture enables any tool or pipeline to output or convert results into mirGFF3. Collectively, this isomiR categorization system, along with the accompanying mirGFF3 and mirtop API, provide a comprehensive solution for the standardization of miRNA and isomiR annotation, enabling data sharing, reporting, comparative analyses and benchmarking, while promoting the development of common miRNA methods focusing on downstream steps of miRNA detection, annotation and quantification.

    Availability and implementation

    https://github.com/miRTop/mirGFF3/ and https://github.com/miRTop/mirtop.

    Contact

    desvignes@uoneuro.uoregon.edu or lpantano@iscb.org

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less