This content will become publicly available on May 1, 2025

Title: MUSTANG: Multi-sample spatial transcriptomics data analysis with cross-sample transcriptional similarity guidance
Award ID(s):
2212419
PAR ID:
10529194
Publisher / Repository:
Cell
Journal Name:
Patterns
Volume:
5
Issue:
5
ISSN:
2666-3899
Page Range / eLocation ID:
100986
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The assumption that training and testing samples are generated from the same distribution does not always hold for real-world machine-learning applications. The procedure of tackling this discrepancy between the training (source) and testing (target) domains is known as domain adaptation. We propose an unsupervised version of domain adaptation that considers the presence of only unlabelled data in the target domain. Our approach centres on finding correspondences between samples of each domain. The correspondences are obtained by treating the source and target samples as graphs and using a convex criterion to match them. The criteria used are first-order and second-order similarities between the graphs as well as a class-based regularization. We have also developed a computationally efficient routine for the convex optimization, thus allowing the proposed method to be used widely. To verify the effectiveness of the proposed method, computer simulations were conducted on synthetic, image classification and sentiment classification datasets. Results validated that the proposed local sample-to-sample matching method outperforms traditional moment-matching methods and is competitive with current local domain-adaptation methods.
  2. It is important to collect credible training samples $(x,y)$ for building data-intensive learning systems (e.g., a deep learning system). Asking people to report a complex distribution $p(x)$, though theoretically viable, is challenging in practice. This is primarily due to the cognitive load required for human agents to form the report of this highly complicated information. While classical elicitation mechanisms apply to eliciting a complex and generative (and continuous) distribution $p(x)$, we are interested in eliciting samples $x_i \sim p(x)$ from agents directly. We term this problem sample elicitation. This paper introduces a deep learning aided method to incentivize credible sample contributions from self-interested and rational agents. We show that with an accurate estimation of a certain $f$-divergence function we can achieve approximate incentive compatibility in eliciting truthful samples. We then present an efficient estimator with theoretical guarantees via studying the variational forms of the $f$-divergence function. We also show a connection between this sample elicitation problem and $f$-GAN, and how this connection can help reconstruct an estimator of the distribution based on collected samples. Experiments on synthetic data, MNIST, and CIFAR-10 datasets demonstrate that our mechanism elicits truthful samples. Our implementation is available at https://github.com/weijiaheng/Credible-sample-elicitation.git.
  3. We present a novel inference approach that we call sample out-of-sample inference. The approach can be used widely, ranging from semisupervised learning to stress testing, and it is fundamental in the application of data-driven distributionally robust optimization. Our method enables measuring the impact of plausible out-of-sample scenarios in a given performance measure of interest, such as a financial loss. The methodology is inspired by empirical likelihood (EL), but we optimize the empirical Wasserstein distance (instead of the empirical likelihood) induced by observations. From a methodological standpoint, our analysis of the asymptotic behavior of the induced Wasserstein-distance profile function shows dramatic qualitative differences relative to EL. For instance, in contrast to EL, which typically yields chi-squared weak convergence limits, our asymptotic distributions are often not chi-squared. Also, the rates of convergence that we obtain depend on the dimension in a nontrivial way but remain controlled as the dimension increases.
  4. Kelvin probe force microscopy (KPFM) experiments are used to image capacitance and surface potential in a wide variety of samples. The widely used KPFM frequency-shift equation rests on assumptions that are questionable in samples having an appreciable impedance or whose properties evolve on a fast timescale. We present new equations describing the cantilever frequency and dissipation in a KPFM experiment carried out on a sample with an appreciable stationary or time-dependent impedance, such as a photovoltaic film, a battery material, or a mixed electronic-ionic conductor.