MUSTANG: Multi-sample spatial transcriptomics data analysis with cross-sample transcriptional similarity guidance

Niyakan, Seyednami; Sheng, Jianting; Cao, Yuliang; Zhang, Xiang; Xu, Zhan; Wu, Ling; Wong, Stephen TC; Qian, Xiaoning

doi:10.1016/j.patter.2024.100986

The assumption that training and testing samples are generated from the same distribution does not always hold for real-world machine-learning applications. The procedure of tackling this discrepancy between the training (source) and testing (target) domains is known as domain adaptation. We propose an unsupervised version of domain adaptation that considers the presence of only unlabelled data in the target domain. Our approach centres on finding correspondences between samples of each domain. The correspondences are obtained by treating the source and target samples as graphs and using a convex criterion to match them. The criteria used are first-order and second-order similarities between the graphs as well as a class-based regularization. We have also developed a computationally efficient routine for the convex optimization, thus allowing the proposed method to be used widely. To verify the effectiveness of the proposed method, computer simulations were conducted on synthetic, image classification and sentiment classification datasets. Results validated that the proposed local sample-to-sample matching method outperforms traditional moment-matching methods and is competitive with respect to current local domain-adaptation methods.
