

Search results for all records where Award ID contains 2013905


  1. Abstract

    One key challenge in single-cell data clustering is combining clustering results of data sets acquired from multiple sources. We propose to represent the clustering result of each data set by a Gaussian mixture model (GMM) and produce an integrated result based on the notion of the Wasserstein barycenter. However, the exact barycenter of GMMs, a distribution on the same sample space, is computationally infeasible to solve. Importantly, the barycenter of GMMs may not itself be a GMM with a reasonable number of components. We thus propose to approximate the Wasserstein metric by the minimized aggregated Wasserstein (MAW) distance and develop a new algorithm for computing the barycenter of GMMs under MAW. Recent theoretical advances further justify using the MAW distance as an approximation of the Wasserstein metric between GMMs. We also prove that the MAW barycenter of GMMs has the same expectation as the Wasserstein barycenter. Our proposed algorithm for clustering integration scales well with the data dimension and the number of mixture components, with complexity independent of data size. We demonstrate that the new method achieves better clustering results on several single-cell RNA-seq data sets than some other popular methods. (A minimal code sketch of the MAW distance appears after this list.)

  2. Free, publicly-accessible full text available August 1, 2024
  3. Free, publicly-accessible full text available June 1, 2024
  4. Alber, Mark (Ed.)
    Multi-view data can be generated from diverse sources, by different technologies, and in multiple modalities. In various fields, integrating information from multi-view data has pushed the frontier of discovery. In this paper, we develop a new approach for multi-view clustering that overcomes the limitations of existing methods, such as the need to pool data across views, restrictions on the clustering algorithms allowed within each view, and the disregard of complementary information between views. Our new method, called CPS-merge analysis, merges clusters formed by the Cartesian product of single-view cluster labels, guided by the principle of maximizing clustering stability as evaluated by CPS analysis. In addition, we introduce measures to quantify the contribution of each view to the formation of any cluster. CPS-merge analysis can be easily incorporated into an existing clustering pipeline because it requires only single-view cluster labels rather than the original data, so advanced single-view clustering algorithms can be applied readily. Importantly, our approach accounts for both consensus and complementary effects between views, whereas existing ensemble methods focus on finding a consensus among multiple clustering results, implicitly treating results from different views as variations of one clustering structure. Through experiments on single-cell data sets, we demonstrate that our approach frequently outperforms other state-of-the-art methods. (A greedy sketch of the product-and-merge step appears after this list.)
  5. The architectures of many neural networks rely heavily on the underlying grid associated with the variables, for instance, the lattice of pixels in an image. For general biomedical data without a grid structure, the multi-layer perceptron (MLP) and deep belief network (DBN) are often used. However, these networks treat variables homogeneously in the network structure, making it difficult to assess their individual importance. In this paper, we propose a novel neural network called Variable-block tree Net (VtNet), whose architecture is determined by an underlying tree in which each node corresponds to a subset of variables. The tree is learned from the data to best capture the causal relationships among the variables. VtNet contains a long short-term memory (LSTM)-like cell for every tree node. The input and forget gates of each cell control the information flow through the node and are used to define a significance score for the variables. To validate this score, VtNet is trained using smaller trees with low-score variables removed. Hypothesis tests show that variables with higher scores influence classification more strongly. We also compare with the variable importance score of Random Forest from the standpoint of variable selection. Our experiments demonstrate that VtNet is highly competitive in classification accuracy and can often improve accuracy by removing variables with low significance scores. (A rough sketch of the node cell appears after this list.)
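The MAW distance referenced in the first abstract has a simple two-level structure: the ground cost is the closed-form 2-Wasserstein distance between Gaussian components, and the mixing weights are matched by a small discrete optimal-transport problem. The sketch below illustrates one common convention (squared W2 as the ground cost); it is not the authors' implementation, and the names gaussian_w2_sq and maw_distance are placeholders. It uses the POT library's ot.emd solver.

    import numpy as np
    from scipy.linalg import sqrtm
    import ot  # POT: Python Optimal Transport

    def gaussian_w2_sq(m1, S1, m2, S2):
        # Closed-form squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
        # ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2})
        r = sqrtm(S2)
        cross = np.real(sqrtm(r @ S1 @ r))  # real part guards against tiny imaginary noise
        return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))

    def maw_distance(w1, means1, covs1, w2, means2, covs2):
        # Pairwise ground costs between the components of the two mixtures
        C = np.array([[gaussian_w2_sq(means1[i], covs1[i], means2[j], covs2[j])
                       for j in range(len(w2))] for i in range(len(w1))])
        # Optimal coupling of the mixing weights: a small linear program
        P = ot.emd(np.asarray(w1, float), np.asarray(w2, float), C)
        return float(np.sum(P * C))

Note that the problem size depends only on the numbers of components and the data dimension, never on the number of cells, which is consistent with the abstract's claim that the complexity is independent of data size.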
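For CPS-merge analysis (item 4), the first step, forming clusters from the Cartesian product of single-view labels, is mechanical; the merging step is guided by the clustering-stability score of CPS analysis, which is abstracted here as a user-supplied function. The sketch below is an assumed greedy variant, not the published algorithm; product_labels, greedy_merge, and stability are hypothetical names.

    import numpy as np
    from itertools import combinations

    def product_labels(view_labels):
        # Composite label per observation: the tuple of its single-view cluster ids
        tuples = list(zip(*view_labels))
        ids = {t: k for k, t in enumerate(sorted(set(tuples)))}
        return np.array([ids[t] for t in tuples])

    def greedy_merge(labels, stability, n_clusters):
        # Repeatedly merge the pair of clusters whose union scores best under the
        # user-supplied stability criterion (a stand-in for CPS analysis)
        labels = labels.copy()
        while len(np.unique(labels)) > n_clusters:
            best_labels, best_score = None, -np.inf
            for a, b in combinations(np.unique(labels), 2):
                trial = np.where(labels == b, a, labels)
                score = stability(trial)
                if score > best_score:
                    best_labels, best_score = trial, score
            labels = best_labels
        return labels

Because only label vectors are consumed, any single-view method (k-means, Louvain, model-based clustering) can feed into this step, which matches the abstract's point that the original data are not needed.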
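The VtNet abstract (item 5) describes an LSTM-like cell at each tree node, with input and forget gates controlling information flow and supplying variable significance scores. The PyTorch cell below is one rough reading of that description, not the paper's architecture; VtNodeCell and its gating details are assumptions.

    import torch
    import torch.nn as nn

    class VtNodeCell(nn.Module):
        # One node of the variable-block tree: receives its own block of
        # variables plus the hidden states of its children. The input gate
        # controls the node's own variables and the forget gate controls the
        # children's summary; mean gate activation serves as a crude score.
        def __init__(self, block_dim, hidden_dim, n_children):
            super().__init__()
            self.hidden_dim = hidden_dim
            in_dim = block_dim + max(n_children, 1) * hidden_dim
            self.input_gate = nn.Linear(in_dim, hidden_dim)
            self.forget_gate = nn.Linear(in_dim, hidden_dim)
            self.candidate = nn.Linear(in_dim, hidden_dim)

        def forward(self, x_block, child_states=None):
            if not child_states:  # leaf node: pad with a zero child state
                child_states = [x_block.new_zeros(x_block.shape[0], self.hidden_dim)]
            z = torch.cat([x_block] + child_states, dim=-1)
            i = torch.sigmoid(self.input_gate(z))   # gates the node's own variables
            f = torch.sigmoid(self.forget_gate(z))  # gates the children's signal
            kids = torch.stack(child_states).mean(dim=0)
            h = i * torch.tanh(self.candidate(z)) + f * kids
            return h, i.mean().item()               # state and a per-node score

In this reading, the root's hidden state would feed a classifier head, and the paper's validation procedure, retraining on smaller trees with low-score variables removed, would amount to pruning nodes whose mean input-gate activation is small.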