
Creators/Authors contains: "Mukherjee, Sumit"


  1. Abstract

    Riboswitches are conserved structural ribonucleic acid (RNA) sensors that regulate a large number of genes and operons, mainly in bacteria. More than 50 bacterial riboswitch classes have been discovered to date, but only the thiamine pyrophosphate riboswitch class has been detected in a few eukaryotes such as fungi, plants and algae. One of the most important challenges in riboswitch research is to discover existing riboswitch classes in eukaryotes and to understand the evolution of bacterial riboswitches. However, traditional search methods have failed to detect any eukaryotic riboswitches beyond this one class, or any distant structural homologs of riboswitches. We developed a novel approach based on inverse RNA folding that attempts to find sequences matching the shape of the target structure, with minimal sequence conservation restricted to the key nucleotides that interact directly with the ligand. Then, to support our matched candidates, we expanded the results into a covariance model representing similar sequences that preserve the structure. Our method transforms a structure-based search into a sequence-based search that considers the conservation of secondary structure shape and ligand-binding residues. This method enables us to identify a potential structural candidate in fungi that could be a distant homolog of bacterial purine riboswitches. Further, phylogenomic analysis and the evolutionary distribution of this structural candidate indicate that its most likely point of origin in these organisms is associated with the loss of traditional purine riboswitches. The computational approach could be applicable to other domains and problems in RNA research.

     
  2. null (Ed.)
  3. AI for good (AI4G) projects involve developing and applying artificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collaboration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Based on our experiences, we detail the different aspects of this type of collaboration broken down into four high-level categories: communication, data, modeling, and impact, and distill eleven takeaways to guide such projects in the future. We briefly describe two case studies to illustrate how some of these takeaways were applied in practice during our past collaborations.
  4. null (Ed.)
  5. Abstract Motivation

    Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge.

    Results

    We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells.

    Availability and implementation

    Source code is available at https://github.com/yjzhang/uncurl_python.

    Supplementary information

    Supplementary data are available at Bioinformatics online.
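
The factorization idea named in the Motivation paragraph can be illustrated with a minimal sketch. This is not the UNCURL implementation (see the repository linked above for that); it is a generic non-negative matrix factorization with the standard multiplicative updates for the Poisson (generalized KL) divergence, swapped in for illustration. The function names `poisson_nmf` and `poisson_divergence`, the genes-by-cells layout of `X`, and all parameter defaults are assumptions made for this example.

```python
import numpy as np

def poisson_divergence(X, W, H, eps=1e-10):
    """Generalized KL divergence between counts X and the low-rank fit W @ H."""
    L = W @ H + eps
    # Entries with X == 0 contribute only the "+ L" term (0 * log(.) is taken as 0).
    ratio = np.where(X > 0, X / L, 1.0)
    return float(np.sum(X * np.log(ratio) - X + L))

def poisson_nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative (genes x cells) count matrix as X ~ W @ H.

    Hypothetical helper for illustration only: W is (genes x k) and H is
    (k x cells), both kept non-negative by multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        # Update H with W held fixed.
        R = X / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        # Update W with H held fixed.
        R = X / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Toy usage on simulated sparse counts (not real scRNA-seq data).
X = np.random.default_rng(1).poisson(0.5, size=(50, 80)).astype(float)
W, H = poisson_nmf(X, k=4)
denoised = W @ H  # low-rank estimate that downstream tools could consume
```

The low-rank product `W @ H` is dense even when `X` is very sparse, which is the sense in which a factorization of this kind can serve as a preprocessing step for clustering or visualization.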

     
  6. Let $T(K_{1,r}, G_n)$ be the number of monochromatic copies of the $r$-star $K_{1,r}$ in a uniformly random coloring of the vertices of the graph $G_n$. In this paper we provide a complete characterization of the limiting distribution of $T(K_{1,r}, G_n)$, in the regime where $\mathbb{E}[T(K_{1,r}, G_n)]$ is bounded, for any growing sequence of graphs $G_n$. The asymptotic distribution is a sum of mutually independent components, each term of which is a polynomial of a single Poisson random variable of degree at most $r$. Conversely, any limiting distribution of $T(K_{1,r}, G_n)$ has a representation of this form. Examples and connections to the birthday problem are discussed.
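
The statistic in this abstract can be computed directly on small graphs. The sketch below is an illustration, not code from the paper: the adjacency-dictionary representation, the function names, and the number of colors `c` are assumptions. It relies on the fact that, for r >= 2, each copy of the r-star is determined by a center vertex and a set of r of its neighbors, so the monochromatic count is a sum of binomial coefficients over centers.

```python
import random
from math import comb

def count_mono_stars(adj, colors, r):
    """Count monochromatic copies of the r-star K_{1,r} in a colored graph.

    adj maps each vertex to an iterable of its neighbors; colors maps each
    vertex to its color. Assumes r >= 2, so that every star has a unique center.
    """
    if r < 2:
        raise ValueError("this sketch assumes r >= 2")
    total = 0
    for v, nbrs in adj.items():
        # Neighbors sharing the center's color can serve as leaves.
        same = sum(1 for u in nbrs if colors[u] == colors[v])
        total += comb(same, r)
    return total

def random_coloring(vertices, c, rng):
    """Color each vertex independently and uniformly with one of c colors."""
    return {v: rng.randrange(c) for v in vertices}

# Toy usage: a star with center 0 and three leaves, colored uniformly at random.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
rng = random.Random(0)
samples = [count_mono_stars(adj, random_coloring(adj, 2, rng), r=2)
           for _ in range(1000)]
```

Repeating the draw, as in `samples`, gives an empirical distribution of the count for one fixed graph; the limiting behavior described in the abstract concerns growing sequences of graphs.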

     