NSF PAR Search | NSF Public Access Repository


Ambiguities in neural-network-based hyperedge prediction

https://doi.org/10.1007/s41468-024-00172-x

Wan, Changlin; Zhang, Muhan; Dang, Pengtao; Hao, Wei; Cao, Sha; Li, Pan; Zhang, Chi (October 2024, Journal of Applied and Computational Topology)

Full Text Available
Generalized Matrix Local Low Rank Representation by Random Projection and Submatrix Propagation

https://doi.org/10.1145/3580305.3599361

Dang, Pengtao; Zhu, Haiqi; Guo, Tingbo; Wan, Changlin; Zhao, Tong; Salama, Paul; Wang, Yijie; Cao, Sha; Zhang, Chi (August 2023, ACM)

Matrix low rank approximation is an effective method to reduce or eliminate the statistical redundancy of its components. Compared with traditional global low rank methods such as singular value decomposition (SVD), local low rank approximation methods are better suited to uncovering interpretable data structures when clear duality exists between the rows and columns of the matrix. Local low rank approximation is equivalent to low rank submatrix detection. Unfortunately, existing local low rank approximation methods can detect only submatrices of specific mean structure, and so may miss a substantial number of true and interesting patterns. In this work, we develop a novel matrix computational framework called RPSP (Random Probing based submatrix Propagation) that provides an effective solution for the general matrix local low rank representation problem. RPSP detects local low rank patterns that grow from small submatrices with the low rank property, which are determined by a random projection approach. RPSP is supported by theories of random projection. Experiments on synthetic data demonstrate that RPSP outperforms all state-of-the-art methods, with the capacity to robustly and correctly identify the low rank matrices when the pattern has a mean similar to that of the background, the background noise is heteroscedastic, and multiple patterns are present in the data. On real-world datasets, RPSP also demonstrates its effectiveness in identifying interpretable local low rank matrices.
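The intuition behind using random projection to probe for low rank submatrices can be illustrated with a minimal sketch. This is not the RPSP algorithm itself: it only shows, under a noise-free assumption, that a planted low rank block keeps its low rank after a Gaussian random projection while a background block of the same size does not. The function names `numerical_rank` and `projected_rank`, the block positions, and the tolerance are all hypothetical choices made for the illustration.

```python
import numpy as np

def numerical_rank(block, tol=1e-8):
    """Count singular values larger than tol times the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    if s[0] == 0:
        return 0
    return int(np.sum(s > tol * s[0]))

def projected_rank(block, k, rng, tol=1e-8):
    """Numerical rank of a block after projecting its columns to k dimensions."""
    # Gaussian random projection matrix (an assumed, standard choice).
    proj = rng.standard_normal((block.shape[1], k))
    return numerical_rank(block @ proj, tol)

rng = np.random.default_rng(0)

# Full rank Gaussian background.
matrix = rng.standard_normal((60, 40))

# Plant a noise-free rank-2 pattern in rows 0..19 and columns 0..14.
matrix[:20, :15] = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))

planted_rank = projected_rank(matrix[:20, :15], k=8, rng=rng)
background_rank = projected_rank(matrix[30:50, 20:35], k=8, rng=rng)

print("planted block rank after projection:", planted_rank)
print("background block rank after projection:", background_rank)
```

In noisy data the trailing singular values would not vanish exactly, so a practical detector would need a statistical threshold rather than the fixed tolerance used here.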
Full Text Available
Bias Aware Probabilistic Boolean Matrix Factorization

Wan, Changlin; Dang, Pengtao; Zhao, Tong; Zang, Yong; Zhang, Chi; Cao, Sha (July 2022, Uncertainty in artificial intelligence)

Full Text Available
Pipeline for characterizing alternative mechanisms (PCAM) based on bi-clustering to study colorectal cancer heterogeneity

https://doi.org/10.1016/j.csbj.2023.03.028

Cao, Sha; Chang, Wennan; Wan, Changlin; Lu, Xiaoyu; Dang, Pengtao; Zhou, Xinyu; Zhu, Haiqi; Chen, Jian; Li, Bo; Zang, Yong; et al (January 2023, Computational and Structural Biotechnology Journal)

Full Text Available
PLUS: Predicting cancer metastasis potential based on positive and unlabeled learning

https://doi.org/10.1371/journal.pcbi.1009956

Zhou, Junyi; Lu, Xiaoyu; Chang, Wennan; Wan, Changlin; Lu, Xiongbin; Zhang, Chi; Cao, Sha (March 2022, PLOS Computational Biology)
Liu, Jie (Ed.)
Metastatic cancer accounts for over 90% of all cancer deaths, and evaluations of metastasis potential are vital for minimizing metastasis-associated mortality and achieving optimal clinical decision-making. Computational assessment of metastasis potential based on large-scale transcriptomic cancer data is challenging because metastasis events are not always clinically detectable. The under-diagnosis of metastasis events results in biased classification labels, and classification tools using biased labels may lead to inaccurate estimations of metastasis potential. This issue is further complicated by the unknown metastasis prevalence at the population level, the small number of confirmed metastasis cases, and the high dimensionality of the candidate molecular features. Our proposed algorithm, called Positive and unlabeled Learning from Unbalanced cases and Sparse structures (PLUS), is the first to use a positive and unlabeled learning framework to account for the under-detection of metastasis events in building a classifier. PLUS is specifically tailored to studying metastasis: it handles both the unbalanced instance allocation and the unknown metastasis prevalence, which are not considered by other methods. PLUS achieves superior performance on synthetic datasets compared with other state-of-the-art methods. Application of PLUS to The Cancer Genome Atlas Pan-Cancer gene expression data generated metastasis potential predictions that show good agreement with the clinical follow-up data, in addition to predictive genes that have been validated by independent single-cell RNA-sequencing datasets.
Full Text Available
Supervised clustering of high-dimensional data using regularized mixture modeling

https://doi.org/10.1093/bib/bbaa291

Chang, Wennan; Wan, Changlin; Zang, Yong; Zhang, Chi; Cao, Sha (July 2021, Briefings in Bioinformatics)

Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional genetic manifestations and the clinical presentations, while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to deal with the challenges in studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm, which offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which subsets of features are explanatory to the response variables, and it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated the superior performance of CSMR over the others, where CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical representations of the disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of a disease and could be of special relevance in the growing field of personalized medicine.
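The classification expectation maximization (CEM) idea that the abstract builds on can be sketched for a plain mixture of linear regressions. This is only an illustration under simplifying assumptions, not the CSMR method: it uses hard assignments with equal component variances and equal mixing proportions, ordinary least squares in place of the penalized (sparse) fit, and a hypothetical function name `cem_mixture_regression` with synthetic data.

```python
import numpy as np

def cem_mixture_regression(x, y, n_components, n_iter, rng):
    """Hard-assignment CEM for a mixture of linear regressions (unpenalized sketch)."""
    n = len(y)
    design = np.column_stack([np.ones(n), x])  # intercept plus features
    labels = rng.integers(n_components, size=n)  # random initial partition
    coefs = np.zeros((n_components, design.shape[1]))
    losses = []
    for _ in range(n_iter):
        # M-step: refit each component on the samples currently assigned to it.
        for k in range(n_components):
            members = labels == k
            if members.sum() >= design.shape[1]:
                coefs[k], *_ = np.linalg.lstsq(design[members], y[members], rcond=None)
        # C-step: reassign each sample to the component with the smallest squared residual.
        residuals = (y[:, None] - design @ coefs.T) ** 2
        labels = residuals.argmin(axis=1)
        losses.append(float(residuals.min(axis=1).sum()))
    return coefs, labels, losses

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
noise = 0.1 * rng.standard_normal(200)

# Two hidden subgroups with different response-predictor relationships.
y = np.where(np.arange(200) < 100, 2.0 * x + 1.0, -2.0 * x - 1.0) + noise

coefs, labels, losses = cem_mixture_regression(x, y, n_components=2, n_iter=20, rng=rng)
print("fitted coefficients (intercept, slope) per component:")
print(coefs)
```

Each iteration cannot increase the total squared residual, but the procedure can still stop in a local optimum, so several random restarts would usually be tried in practice.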
Full Text Available
Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence

https://doi.org/10.1109/ICDM51629.2021.00013

Chang, Wennan; Dang, Pengtao; Wan, Changlin; Lu, Xiaoyu; Fang, Yue; Zhao, Tong; Zang, Yong; Li, Bo; Zhang, Chi; Cao, Sha (December 2021, 2021 IEEE International Conference on Data Mining (ICDM))

In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints, to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This provides an advantage in interpretation, since a few stationary sub-processes are identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression and spatial segmentation methods.
Full Text Available
Denoising Individual Bias for Fairer Binary Submatrix Detection

https://doi.org/10.1145/3340531.3412156

Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi (October 2020, Proceedings of the 29th ACM International Conference on Information & Knowledge Management)

Full Text Available
Geometric All-Way Boolean Tensor Decomposition

Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi (October 2020, Advances in neural information processing systems)

Full Text Available
A data denoising approach to optimize functional clustering of single cell RNA-sequencing data

https://doi.org/10.1109/BIBM49941.2020.9313483

Wan, Changlin; Jia, Dongya; Zhao, Yue; Chang, Wennan; Cao, Sha; Wang, Xiao; Zhang, Chi (December 2020, Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020)
Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust methods to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells’ activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large scale scRNA-seq data. Application of CIBS to two scRNA-seq datasets collected from the cancer tumor micro-environment successfully identified subgroups of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which were not revealed by the existing cell clustering analysis tools. The identified cell groups were significantly associated with the clinically confirmed lymph-node invasion and metastasis events across different patients. Index Terms: Cell clustering analysis, Data denoising, Boolean matrix factorization, Cancer microenvironment, Metastasis.
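For readers unfamiliar with Boolean matrix factorization, the following minimal sketch shows the Boolean matrix product and the reconstruction error that such a factorization tries to keep small. It does not implement PFAST or CIBS; the helper names `boolean_product` and `reconstruction_error` and the toy matrices are hypothetical, with rows standing in for cells and columns for genes.

```python
import numpy as np

def boolean_product(a, b):
    """Boolean matrix product: entry (i, j) is 1 if a[i, k] and b[k, j] are both 1 for some k."""
    return (a.astype(int) @ b.astype(int) > 0).astype(int)

def reconstruction_error(x, a, b):
    """Number of entries where x differs from the Boolean product of a and b."""
    return int(np.sum(x != boolean_product(a, b)))

# Cell-by-module membership (3 cells, 2 modules).
a = np.array([[1, 0],
              [1, 1],
              [0, 1]])

# Module-by-gene membership (2 modules, 4 genes).
b = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0]])

# A binary expression matrix generated exactly by the two overlapping modules.
x = boolean_product(a, b)

# Flip one entry to mimic noise in the discretized expression status.
x_noisy = x.copy()
x_noisy[0, 3] = 1

print(x)
print("error on clean data:", reconstruction_error(x, a, b))
print("error on noisy data:", reconstruction_error(x_noisy, a, b))
```

Unlike the ordinary matrix product, overlapping modules do not add up: the second cell belongs to both modules, yet its entry for the shared gene stays 1 rather than becoming 2.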
Full Text Available
