skip to main content

Search for: All records

Creators/Authors contains: "Ma, Terry"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Motivation

    Metagenomics studies microbial genomes in an ecosystem such as the gastrointestinal tract of a human. Identification of novel microbial species and quantification of their distributional variations among different samples that are sequenced using next-generation-sequencing technology hold the key to the success of most metagenomic studies. To achieve these goals, we propose a simple yet powerful metagenomic binning method, MetaBMF. The method does not require prior knowledge of reference genomes and produces highly accurate results, even at a strain level. Thus, it can be broadly used to identify disease-related microbial organisms that are not well-studied.


    Mathematically, we count the number of mapped reads on each assembled genomic fragment cross different samples as our input matrix and propose a scalable stratified angle regression algorithm to factorize this count matrix into a product of a binary matrix and a nonnegative matrix. The binary matrix can be used to separate microbial species and the nonnegative matrix quantifies the species distributions in different samples. In simulation and empirical studies, we demonstrate that MetaBMF has a high binning accuracy. It can not only bin DNA fragments accurately at a species level but also at a strain level. As shown in our example, we can accuratelymore »identify the Shiga-toxigenic Escherichia coli O104: H4 strain which led to the 2011 German E.coli outbreak. Our efforts in these areas should lead to (i) fundamental advances in metagenomic binning, (ii) development and refinement of technology for the rapid identification and quantification of microbial distributions and (iii) finding of potential probiotics or reliable pathogenic bacterial strains.

    Availability and implementation

    The software is available at

    « less