Creators/Authors contains: "Zang, Yong"

  1. Boolean matrix factorization (BMF) has been widely utilized in fields such as recommendation systems, graph learning, text mining, and -omics data analysis. Traditional BMF methods decompose a binary matrix into the Boolean product of two lower-rank Boolean matrices plus homoscedastic random errors. However, real-world binary data typically involves biases arising from heterogeneous row- and column-wise signal distributions. Such biases can lead to suboptimal fitting and unexplainable predictions if not accounted for. In this study, we reconceptualize the binary data generation as the Boolean sum of three components: a binary pattern matrix, a background bias matrix influenced by heterogeneous row or column distributions, and random flipping errors. We introduce a novel Disentangled Representation Learning for Binary matrices (DRLB) method, which employs a dual auto-encoder network to reveal the true patterns. DRLB can be seamlessly integrated with existing BMF techniques to facilitate bias-aware BMF. Our experiments with both synthetic and real-world datasets show that DRLB significantly enhances the precision of traditional BMF methods while offering high scalability. Moreover, the bias matrix detected by DRLB accurately reflects the inherent biases in synthetic data, and the patterns identified in the bias-corrected real-world data exhibit enhanced interpretability. 
    Free, publicly-accessible full text available April 26, 2025
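
The generative view described in item 1 (a Boolean product of two low-rank factors, Boolean-summed with a background bias matrix and then perturbed by random flips) can be sketched in a few lines of Python. This is a minimal illustration, not the DRLB method itself: the function names, the choice of a bias probability equal to the product of a row rate and a column rate, and the modeling of flipping errors as an XOR with Bernoulli noise are all assumptions made for the example.

```python
import random


def boolean_product(a, b):
    """Boolean product: out[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    inner, cols = len(b), len(b[0])
    return [
        [int(any(row[k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for row in a
    ]


def boolean_sum(x, y):
    """Element-wise OR of two binary matrices of the same shape."""
    return [[int(p or q) for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def sample_bias(row_rates, col_rates, rng):
    """Hypothetical bias model: entry (i, j) is 1 with probability row_rates[i] * col_rates[j]."""
    return [[int(rng.random() < r * c) for c in col_rates] for r in row_rates]


def flip(x, rate, rng):
    """Assumed error model: flip each entry independently with probability `rate`."""
    return [[v ^ int(rng.random() < rate) for v in row] for row in x]


def generate(a, b, row_rates, col_rates, flip_rate, seed=0):
    """Pattern (Boolean product) OR bias, followed by random flips."""
    rng = random.Random(seed)
    pattern = boolean_product(a, b)
    bias = sample_bias(row_rates, col_rates, rng)
    return flip(boolean_sum(pattern, bias), flip_rate, rng)


if __name__ == "__main__":
    a = [[1, 0], [0, 1], [1, 1]]      # 3 x 2 Boolean factor
    b = [[1, 1, 0], [0, 1, 1]]        # 2 x 3 Boolean factor
    print(boolean_product(a, b))      # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
    print(generate(a, b, [0.2, 0.5, 0.9], [0.3, 0.3, 0.8], flip_rate=0.05))
```

With all bias rates and the flip rate set to zero, `generate` returns the plain Boolean product, which is the setting traditional BMF assumes.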
  2. Abstract

    Quantitative assessment of the single-cell fluxome is critical for understanding metabolic heterogeneity in diseases. Unfortunately, laboratory-based single-cell fluxomics is currently impractical, and current computational tools for flux estimation are not designed for single-cell-level prediction. Given the well-established link between transcriptomic and metabolomic profiles, leveraging single-cell transcriptomics data to predict the single-cell fluxome is not only feasible but also an urgent task. In this study, we present FLUXestimator, an online platform for predicting the metabolic fluxome and its variations using single-cell or general transcriptomics data with large sample sizes. The FLUXestimator webserver implements a recently developed unsupervised approach called single-cell flux estimation analysis (scFEA), which uses a new neural network architecture to estimate reaction rates from transcriptomics data. To the best of our knowledge, FLUXestimator is the first web-based tool dedicated to predicting cell-/sample-wise metabolic flux and metabolite variations using transcriptomics data of human, mouse and 15 other common experimental organisms. The FLUXestimator webserver is available at http://scFLUX.org/, and stand-alone tools for local use are available at https://github.com/changwn/scFEA. Our tool provides a new avenue for studying metabolic heterogeneity in diseases and has the potential to facilitate the development of new therapeutic strategies.

     
  3. An immunotherapy trial often uses the phase I/II design to identify the optimal biological dose, which monitors the efficacy and toxicity outcomes simultaneously in a single trial. The progression-free survival rate is often used as the efficacy outcome in phase I/II immunotherapy trials. As a result, patients developing disease progression in phase I/II immunotherapy trials are generally seriously ill and are often treated off the trial for ethical reasons. Consequently, the occurrence of disease progression will terminate the toxicity event but not vice versa, so the issue of semi-competing risks arises. Moreover, this issue can become more intractable with late-onset outcomes, which arise when a relatively long follow-up time is required to ascertain progression-free survival. This paper proposes a novel Bayesian adaptive phase I/II design accounting for semi-competing risks outcomes for immunotherapy trials, referred to as the dose-finding design accounting for semi-competing risks outcomes for immunotherapy trials (SCI) design. To tackle the issue of semi-competing risks in the presence of late-onset outcomes, we reconstruct the likelihood function based on each patient's actual follow-up time and develop a data augmentation method to efficiently draw posterior samples from a series of Beta-binomial distributions. We propose a concise curve-free dose-finding algorithm to adaptively identify the optimal biological dose using accumulated data without making any parametric dose–response assumptions. Numerical studies show that the proposed SCI design yields good operating characteristics in dose selection, patient allocation, and trial duration. 
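
As background for the Beta-binomial sampling mentioned in item 3, the sketch below shows only the standard conjugate update: a Beta(a, b) prior on an event probability combined with y events among n patients gives a Beta(a + y, b + n - y) posterior, from which draws are easy to take. It does not reproduce the SCI design's data augmentation step or its follow-up-time-based likelihood. The function names, the uniform prior, and the example counts are all hypothetical.

```python
import random


def beta_posterior(prior_a, prior_b, events, n):
    """Conjugate update: Beta(prior_a, prior_b) prior with `events` out of `n` binomial outcomes."""
    if not 0 <= events <= n:
        raise ValueError("events must be between 0 and n")
    return prior_a + events, prior_b + (n - events)


def draw_posterior(prior_a, prior_b, events, n, draws, seed=0):
    """Draw samples of the event probability from the Beta posterior."""
    rng = random.Random(seed)
    a, b = beta_posterior(prior_a, prior_b, events, n)
    return [rng.betavariate(a, b) for _ in range(draws)]


if __name__ == "__main__":
    # Hypothetical dose level: 3 events among 10 patients, uniform Beta(1, 1) prior.
    samples = draw_posterior(1.0, 1.0, events=3, n=10, draws=5000)
    print(sum(samples) / len(samples))  # close to the posterior mean 4 / 12
```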
  4. Abstract Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional genetic manifestations and the clinical presentations, while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to deal with the challenges in studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm, which offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which subsets of features are explanatory to the response variables, and it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated the superior performance of CSMR over the others, where CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical representations of the disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of a disease and could be of special relevance in the growing field of personalized medicine. 
  5. In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints, to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This provides an advantage in interpretation, with a few stationary sub-processes identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression and spatial segmentation methods. 
  6. The metabolic heterogeneity and metabolic interplay between cells are known as significant contributors to disease treatment resistance. However, with the lack of a mature high-throughput single-cell metabolomics technology, we have yet to establish a systematic understanding of the intra-tissue metabolic heterogeneity and cooperative mechanisms. To mitigate this knowledge gap, we developed a novel computational method, namely, single-cell flux estimation analysis (scFEA), to infer the cell-wise fluxome from single-cell RNA-sequencing (scRNA-seq) data. scFEA is empowered by a systematically reconstructed human metabolic map as a factor graph, a novel probabilistic model to leverage the flux balance constraints on scRNA-seq data, and a novel graph neural network–based optimization solver. The intricate information cascade from transcriptome to metabolome was captured using multilayer neural networks to recapitulate the nonlinear dependency between enzymatic gene expressions and reaction rates. We experimentally validated scFEA by generating an scRNA-seq data set with matched metabolomics data on cells of perturbed oxygen and genetic conditions. Application of scFEA to this data set showed the consistency between predicted flux and the observed variation of metabolite abundance in the matched metabolomics data. We also applied scFEA to five publicly available scRNA-seq and spatial transcriptomics data sets and identified context- and cell group–specific metabolic variations. The cell-wise fluxome predicted by scFEA empowers a series of downstream analyses including identification of metabolic modules or cell groups that share common metabolic variations, sensitivity evaluation of enzymes with regard to their impact on the whole metabolic flux, and inference of cell–tissue and cell–cell metabolic communications. 
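
The flux balance constraint mentioned in item 6 can be illustrated with a small sketch: for each intermediate metabolite, the flux producing it should equal the flux consuming it, so the stoichiometric matrix times the flux vector should be close to zero. The code below computes that imbalance and a squared-error penalty for one cell. It is only an illustration of the constraint; the function names and the toy three-reaction chain are hypothetical, and scFEA's actual loss function and neural network solver are not reproduced here.

```python
def net_production(stoich, flux):
    """Net production of each metabolite: row i of `stoich` dotted with the flux vector.

    stoich[i][j] is +1 if reaction j produces metabolite i, -1 if it consumes it, 0 otherwise.
    """
    return [sum(s * v for s, v in zip(row, flux)) for row in stoich]


def imbalance_penalty(stoich, flux):
    """Sum of squared net production; zero when every metabolite is balanced."""
    return sum(x * x for x in net_production(stoich, flux))


if __name__ == "__main__":
    # Toy chain with three reactions (R1 -> M1 -> R2 -> M2 -> R3); rows are metabolites M1 and M2.
    stoich = [
        [1, -1, 0],   # M1: produced by R1, consumed by R2
        [0, 1, -1],   # M2: produced by R2, consumed by R3
    ]
    print(imbalance_penalty(stoich, [2.0, 2.0, 2.0]))  # 0.0, balanced
    print(imbalance_penalty(stoich, [3.0, 1.0, 1.0]))  # 4.0, M1 accumulates
```

In a learning setting, a penalty of this kind would typically be one term of the objective, summed over cells, so that predicted fluxes stay near steady state.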
  7. Abstract Deconvolution of mouse transcriptomic data is challenged by the fact that mouse models carry various genetic and physiological perturbations, making it questionable to assume fixed cell types and cell type marker genes for different data set scenarios. We developed a Semi-Supervised Mouse data Deconvolution (SSMD) method to study the mouse tissue microenvironment. SSMD features (i) a novel nonparametric method to discover data set-specific cell type signature genes; (ii) a community detection approach for fixing cell types and their marker genes; and (iii) a constrained matrix decomposition method to solve cell type relative proportions that is robust to diverse experimental platforms. In summary, SSMD addresses several key challenges in the deconvolution of mouse tissue data, including: (i) varied cell types and marker genes caused by highly divergent genotypic and phenotypic conditions of mouse experiments; (ii) diverse experimental platforms of mouse transcriptomics data; and (iii) small sample sizes and limited training data sources. It can estimate the proportions of 35 cell types in blood, inflammatory, central nervous or hematopoietic systems. In silico and experimental validation of SSMD demonstrated its high sensitivity and accuracy in identifying (sub) cell types and predicting cell proportions compared with state-of-the-art methods. A user-friendly R package and a web server of SSMD are released via https://github.com/xiaoyulu95/SSMD. 