skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Editors contains: "Xu, Jinbo"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Xu, Jinbo (Ed.)
    Abstract Motivation Expanding our knowledge of small molecules beyond what is known in nature or designed in wet laboratories promises to significantly advance cheminformatics, drug discovery, biotechnology and material science. In silico molecular design remains challenging, primarily due to the complexity of the chemical space and the non-trivial relationship between chemical structures and biological properties. Deep generative models that learn directly from data are intriguing, but they have yet to demonstrate interpretability in the learned representation, so we can learn more about the relationship between the chemical and biological space. In this article, we advance research on disentangled representation learning for small molecule generation. We build on recent work by us and others on deep graph generative frameworks, which capture atomic interactions via a graph-based representation of a small molecule. The methodological novelty is how we leverage the concept of disentanglement in the graph variational autoencoder framework both to generate biologically relevant small molecules and to enhance model interpretability. Results Extensive qualitative and quantitative experimental evaluation in comparison with state-of-the-art models demonstrate the superiority of our disentanglement framework. We believe this work is an important step to address key challenges in small molecule generation with deep generative frameworks. Availability and implementation Training and generated data are made available at https://ieee-dataport.org/documents/dataset-disentangled-representation-learning-interpretable-molecule-generation. All code is made available at https://anonymous.4open.science/r/D-MolVAE-2799/. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  2. Xu, Jinbo (Ed.)
    Abstract Motivation Deep learning has revolutionized protein tertiary structure prediction recently. The cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to lack of advanced deep learning methods in the field. Because interchain residue–residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue–residue contacts in homodimers from residue–residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue–residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features. Results Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset and CASP-CAPRI dataset), the precision of DRCon for top L/5 interchain contact predictions (L: length of monomer in a homodimer) is 43.46%, 47.10% and 33.50% respectively at 6 Å contact threshold, which is substantially better than DeepHomo and DNCON2_inter and similar to Glinter. Moreover, our experiments demonstrate that using predicted tertiary structure or intrachain contacts of monomers in the unbound state as input, DRCon still performs well, even though its accuracy is lower than using true tertiary structures in the bound state are used as input. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers. Availability and implementation The source code of DRCon is available at https://github.com/jianlin-cheng/DRCon. The datasets are available at https://zenodo.org/record/5998532#.YgF70vXMKsB. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. Xu, Jinbo (Ed.)
    Abstract Motivation Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution in generated data as compared with real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms. Results In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaption to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with ‘warp’ modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms the existing alternative approaches in cross-domain subtomogram classification in extensive evaluation studies demonstrated herein using both simulated and experimental data. Availabilityand implementation https://github.com/xulabs/aitom. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  4. Xu, Jinbo (Ed.)
    Abstract Motivation Motions of transmembrane receptors on cancer cell surfaces can reveal biophysical features of the cancer cells, thus providing a method for characterizing cancer cell phenotypes. While conventional analysis of receptor motions in the cell membrane mostly relies on the mean-squared displacement plots, much information is lost when producing these plots from the trajectories. Here we employ deep learning to classify breast cancer cell types based on the trajectories of epidermal growth factor receptor (EGFR). Our model is an artificial neural network trained on the EGFR motions acquired from six breast cancer cell lines of varying invasiveness and receptor status: MCF7 (hormone receptor positive), BT474 (HER2-positive), SKBR3 (HER2-positive), MDA-MB-468 (triple negative, TN), MDA-MB-231 (TN) and BT549 (TN). Results The model successfully classified the trajectories within individual cell lines with 83% accuracy and predicted receptor status with 85% accuracy. To further validate the method, epithelial–mesenchymal transition (EMT) was induced in benign MCF10A cells, noninvasive MCF7 cancer cells and highly invasive MDA-MB-231 cancer cells, and EGFR trajectories from these cells were tested. As expected, after EMT induction, both MCF10A and MCF7 cells showed higher rates of classification as TN cells, but not the MDA-MB-231 cells. Whereas deep learning-based cancer cell classifications are primarily based on the optical transmission images of cell morphology and the fluorescence images of cell organelles or cytoskeletal structures, here we demonstrated an alternative way to classify cancer cells using a dynamic, biophysical feature that is readily accessible. Availability and implementation A python implementation of deep learning-based classification can be found at https://github.com/soonwoohong/Deep-learning-for-EGFR-trajectory-classification. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  5. Xu, Jinbo (Ed.)
    Abstract Motivation Cryo-Electron Tomography (cryo-ET) is a 3D bioimaging tool that visualizes the structural and spatial organization of macromolecules at a near-native state in single cells, which has broad applications in life science. However, the systematic structural recognition and recovery of macromolecules captured by cryo-ET are difficult due to high structural complexity and imaging limits. Deep learning-based subtomogram classification has played critical roles for such tasks. As supervised approaches, however, their performance relies on sufficient and laborious annotation on a large training dataset. Results To alleviate this major labeling burden, we proposed a Hybrid Active Learning (HAL) framework for querying subtomograms for labeling from a large unlabeled subtomogram pool. Firstly, HAL adopts uncertainty sampling to select the subtomograms that have the most uncertain predictions. This strategy enforces the model to be aware of the inductive bias during classification and subtomogram selection, which satisfies the discriminativeness principle in AL literature. Moreover, to mitigate the sampling bias caused by such strategy, a discriminator is introduced to judge if a certain subtomogram is labeled or unlabeled and subsequently the model queries the subtomogram that have higher probabilities to be unlabeled. Such query strategy encourages to match the data distribution between the labeled and unlabeled subtomogram samples, which essentially encodes the representativeness criterion into the subtomogram selection process. Additionally, HAL introduces a subset sampling strategy to improve the diversity of the query set, so that the information overlap is decreased between the queried batches and the algorithmic efficiency is improved. Our experiments on subtomogram classification tasks using both simulated and real data demonstrate that we can achieve comparable testing performance (on average only 3% accuracy drop) by using less than 30% of the labeled subtomograms, which shows a very promising result for subtomogram classification task with limited labeling resources. Availability and implementation https://github.com/xulabs/aitom. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  6. Xu, Jinbo (Ed.)
    Abstract Summary Spectroscopic single-molecule localization microscopy (sSMLM) simultaneously captures the spatial locations and full spectra of stochastically emitting fluorescent single molecules. It provides an optical platform to develop new multimolecular and functional imaging capabilities. While several open-source software suites provide subdiffraction localization of fluorescent molecules, software suites for spectroscopic analysis of sSMLM data remain unavailable. RainbowSTORM is an open-source ImageJ/FIJI plug-in for end-to-end spectroscopic analysis and visualization for sSMLM images. RainbowSTORM allows users to calibrate, preview and quantitatively analyze emission spectra acquired using different reported sSMLM system designs and fluorescent labels. Availability and implementation RainbowSTORM is a java plug-in for ImageJ (https://imagej.net)/FIJI (http://fiji.sc) freely available through: https://github.com/FOIL-NU/RainbowSTORM. RainbowSTORM has been tested with Windows and Mac operating systems and ImageJ/FIJI version 1.52. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  7. Xu, Jinbo (Ed.)
    Abstract Motivation Developing methods to efficiently analyze 3D point cloud data of plant architectures remain challenging for many phenotyping applications. Here, we describe a tool that tackles four core phenotyping tasks: classification of cloud points into stem and lamina points, graph skeletonization of the stem points, segmentation of individual lamina and whole leaf labeling. These four tasks are critical for numerous downstream phenotyping goals, such as quantifying plant biomass, performing morphological analyses of plant shapes and uncovering genotype to phenotype relationships. The Plant 3D tool provides an intuitive graphical user interface, a fast 3D rendering engine for visualizing plants with millions of cloud points, and several graph-theoretic and machine-learning algorithms for 3D architecture analyses. Availability and implementation P3D is open-source and implemented in C++. Source code and Windows installer are freely available at https://github.com/iziamtso/P3D/. Contact iziamtso@ucsd.edu or navlakha@cshl.edu Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less