Single-cell RNA-sequencing (scRNA-seq) has been widely used in disease studies, where sample batches are collected from donors under different conditions, including demographic groups, disease stages, and drug treatments. The differences among sample batches in such a study are a mixture of technical confounders caused by batch effect and biological variation caused by condition effect. However, current batch effect removal methods often eliminate both the technical batch effect and the meaningful condition effect, while perturbation prediction methods focus solely on the condition effect, resulting in inaccurate gene expression predictions due to unaccounted-for batch effect. Here we introduce scDisInFact, a deep learning framework that models both batch effect and condition effect in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effect from batch effect, enabling it to perform three tasks simultaneously: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluate scDisInFact on both simulated and real datasets and compare its performance with baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating and predicting multi-batch, multi-condition scRNA-seq data.
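A toy, single-gene illustration of the problem described above, using hypothetical effect sizes and an unbalanced design (batch 0 is mostly control, batch 1 mostly treated). It is not scDisInFact's model, which is a deep learning framework; it only shows why batch and condition effects must be modeled jointly: ignoring the batch inflates the condition contrast, naive per-batch centering shrinks it, and a linear fit with separate batch and condition terms recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # cells per batch (hypothetical)
# Hypothetical unbalanced design: ~20% treated cells in batch 0, ~80% in batch 1.
batch = np.repeat([0, 1], n)
condition = np.concatenate([rng.random(n) < 0.2, rng.random(n) < 0.8]).astype(int)
BATCH_EFFECT, CONDITION_EFFECT = 2.0, 1.5  # hypothetical additive shifts
x = 5.0 + BATCH_EFFECT * batch + CONDITION_EFFECT * condition + rng.normal(0, 0.1, 2 * n)

def condition_gap(values):
    """Difference in mean expression between treated and control cells."""
    return values[condition == 1].mean() - values[condition == 0].mean()

# 1) No correction: the batch shift leaks into the condition contrast.
raw_gap = condition_gap(x)

# 2) Naive batch removal: subtract each batch's mean, which also removes
#    part of the condition effect because conditions are unevenly spread.
centered = x.copy()
for b in (0, 1):
    centered[batch == b] -= x[batch == b].mean()
centered_gap = condition_gap(centered)

# 3) Model both factors jointly: least squares on [intercept, batch, condition].
design = np.column_stack([np.ones_like(x), batch, condition])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
joint_gap = coef[2]

print(f"raw={raw_gap:.2f} centered={centered_gap:.2f} joint={joint_gap:.2f}")
```

Only the joint fit lands near `CONDITION_EFFECT`; the other two estimates are biased in opposite directions.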
Free, publicly-accessible full text available December 1, 2025.
Free, publicly-accessible full text available September 14, 2025
Abstract The receiver function (RF) is a widely used crustal imaging technique. In principle, it assumes relatively noise-free traces that can be used to target receiver-side structures after source deconvolution. In practice, however, mode conversions and reflections may be severely degraded by noise, hampering robust estimation of crustal parameters. In this study, we use a sparsity-promoting Radon transform to decompose the observed RF traces into their wavefield contributions, that is, direct conversions, multiples, and incoherent noise. By applying a crustal mask to the Radon-transformed RF, we obtain noise-free RF traces containing only Moho conversions and reflections. We demonstrate, using a synthetic experiment and a real-data example from the Sierra Nevada, that our approach can effectively denoise the RFs and extract the underlying Moho signals. This greatly improves the robustness of crustal structure recovery, as exemplified by subsequent H−κ stacking. We further demonstrate, using a station on loose sediments in the Upper Mississippi embayment, that combining our approach with frequency-domain filtering can significantly improve crustal imaging in reverberant settings. In the presence of complex crustal structures, for example, a dipping Moho, intracrustal layers, and crustal anisotropy, we recommend caution when applying our approach, because the resulting Radon image may be more complicated and harder to interpret. We expect that our technique will enable high-resolution crustal imaging and inspire more applications of Radon transforms in seismic signal processing.
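The H−κ stacking step used above to evaluate the denoised RFs can be sketched as a grid search over crustal thickness H and Vp/Vs ratio κ, in the style of Zhu and Kanamori (2000): the stack adds RF amplitudes at the predicted Ps and PpPs times and subtracts the amplitude at the PpSs+PsPs time. This is a minimal sketch on a noise-free synthetic; the crustal model, ray parameter, pulse width, grids, and weights (0.7, 0.2, 0.1) are assumptions, and the Radon denoising itself is not shown.

```python
import numpy as np

def phase_times(H, kappa, vp, p):
    """Single-layer delay times (s) of Ps, PpPs and PpSs+PsPs relative to direct P.
    H: thickness (km), kappa = Vp/Vs, vp: P velocity (km/s), p: ray parameter (s/km)."""
    qs = np.sqrt((kappa / vp) ** 2 - p ** 2)  # vertical S slowness, using Vs = Vp / kappa
    qp = np.sqrt(1.0 / vp ** 2 - p ** 2)      # vertical P slowness
    return H * (qs - qp), H * (qs + qp), 2.0 * H * qs

def sample(rf, t, times):
    """Linearly interpolate the RF at grid-shaped arrays of times."""
    return np.interp(times.ravel(), t, rf).reshape(times.shape)

def hk_stack(rf, t, p, vp, H_grid, k_grid, weights=(0.7, 0.2, 0.1)):
    """s(H, kappa) = w1 r(t_Ps) + w2 r(t_PpPs) - w3 r(t_PpSs+PsPs) on an (H, kappa) grid."""
    HH, KK = np.meshgrid(H_grid, k_grid, indexing="ij")
    t1, t2, t3 = phase_times(HH, KK, vp, p)
    w1, w2, w3 = weights
    return w1 * sample(rf, t, t1) + w2 * sample(rf, t, t2) - w3 * sample(rf, t, t3)

# Hypothetical noise-free synthetic: 35 km crust, Vp/Vs = 1.75, Vp = 6.3 km/s, p = 0.06 s/km.
vp, p = 6.3, 0.06
t = np.arange(0.0, 30.0, 0.01)

def pulse(t0, amp, width=0.15):
    """Gaussian pulse centred at t0 seconds."""
    return amp * np.exp(-0.5 * ((t - t0) / width) ** 2)

t_ps, t_ppps, t_ppss = phase_times(35.0, 1.75, vp, p)
rf = pulse(0.0, 1.0) + pulse(t_ps, 0.3) + pulse(t_ppps, 0.15) + pulse(t_ppss, -0.1)

H_grid = np.arange(25.0, 45.01, 0.5)
k_grid = np.arange(1.60, 1.9001, 0.01)
s = hk_stack(rf, t, p, vp, H_grid, k_grid)
i, j = np.unravel_index(np.argmax(s), s.shape)
H_best, k_best = H_grid[i], k_grid[j]
```

On noisy RFs the stack maximum broadens and can shift, which is why denoising before stacking matters.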
Free, publicly-accessible full text available January 8, 2025.
Abstract Single cell profiling techniques, including multi-omics and spatial-omics technologies, allow researchers to study cell-cell variation within a cell population. These variations extend to biological networks within cells, in particular, gene regulatory networks (GRNs). GRNs rewire as cells evolve, and different cells can be governed by different GRNs. However, existing GRN inference methods usually infer a single GRN for a population of cells, without exploring cell-cell variation in regulatory mechanisms. Recently, jointly profiled single cell transcriptomics and chromatin accessibility data have been used to infer GRNs. Although methods based on such multi-omics data were shown to improve on the accuracy of methods using only single cell RNA-seq (scRNA-seq) data, they do not take full advantage of the single cell resolution of the chromatin accessibility data.
We propose CeSpGRN (Cell-Specific Gene Regulatory Network inference), which infers cell-specific GRNs from scRNA-seq, single cell multi-omics, or single cell spatial-omics data. CeSpGRN uses a Gaussian weighted kernel that allows the GRN of a given cell to be learned from the sequencing profiles of the cell itself and its neighboring cells in the developmental process. The kernel is constructed from the similarity of gene expression profiles or spatial locations between cells. When chromatin accessibility data are available, CeSpGRN constructs cell-specific prior networks, which are used to further improve inference accuracy. We applied CeSpGRN to various types of real-world datasets and inferred regulatory changes that were shown to be important in cell development. We also quantitatively measured the performance of CeSpGRN on simulated datasets and compared it with baseline methods. The results show that CeSpGRN performs better at reconstructing the GRN for each cell, as well as at detecting the regulatory interactions that differ between cells. CeSpGRN is available at https://github.com/PeterZZQ/CeSpGRN.
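The kernel idea above can be sketched as follows: each cell receives Gaussian weights over all cells based on distance in some coordinate space (expression embedding, pseudotime, or spatial location), and a network is estimated from the resulting weighted covariance. This is a minimal sketch under stated assumptions, not CeSpGRN's implementation: a ridge-regularized inverse covariance converted to partial correlations stands in for CeSpGRN's own estimator, and `kernel_weights`, `cell_specific_network`, the bandwidth, the ridge strength, and the simulated data are hypothetical.

```python
import numpy as np

def kernel_weights(coords, i, bandwidth):
    """Normalized Gaussian kernel weights of every cell relative to cell i."""
    d2 = np.sum((coords - coords[i]) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def cell_specific_network(X, coords, i, bandwidth=0.1, ridge=0.1):
    """Partial-correlation matrix for cell i from a kernel-weighted covariance.
    X: cells x genes; coords: cells x d (expression embedding, pseudotime or spatial location)."""
    w = kernel_weights(coords, i, bandwidth)
    Xc = X - w @ X                                           # center on the weighted mean
    cov = (Xc * w[:, None]).T @ Xc                           # weighted covariance (genes x genes)
    theta = np.linalg.inv(cov + ridge * np.eye(X.shape[1]))  # ridge-regularized precision matrix
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)                           # precision -> partial correlation
    np.fill_diagonal(pcor, 0.0)
    return pcor

# Hypothetical data: 3 genes; gene 0 drives gene 1 early in pseudotime and gene 2 late.
rng = np.random.default_rng(1)
n = 2000
pseudotime = np.linspace(0.0, 1.0, n)
X = rng.normal(size=(n, 3))
early = pseudotime < 0.5
X[early, 1] = X[early, 0] + 0.3 * rng.normal(size=early.sum())
X[~early, 2] = X[~early, 0] + 0.3 * rng.normal(size=(~early).sum())

G_first = cell_specific_network(X, pseudotime[:, None], i=0)
G_last = cell_specific_network(X, pseudotime[:, None], i=n - 1)
```

A single population-level network would blend the two regimes; the per-cell estimates recover the 0–1 edge for the first cell and the 0–2 edge for the last.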
SUMMARY Seismic interrogation of the upper mantle, from the base of the crust to the top of the mantle transition zone, has revealed discontinuities that vary in space, depth, lateral extent, and amplitude, and that lack a unified explanation for their origin. Improved constraints on the detectability and properties of mantle discontinuities can be obtained with P-to-S receiver functions (Ps-RFs), in which energy scatters from P to S as seismic waves propagate across the discontinuities of interest. However, because of interference from crustal multiples, uppermost mantle discontinuities are more commonly imaged with lower-resolution S-to-P receiver functions (Sp-RFs). In this study, we propose a new method called CRISP-RF (Clean Receiver-function Imaging using SParse Radon Filters), which incorporates ideas from compressive sensing and model-based image reconstruction. The central idea is to apply a sparse Radon transform that decomposes the Ps-RF into its underlying wavefield contributions, that is, direct conversions, multiples, and noise, based on their phase moveout and coherence. A masking filter is then designed and applied to create a multiple-free, denoised Ps-RF. We demonstrate, using a synthetic experiment, that our implementation of the Radon transform with sparsity-promoting regularization outperforms conventional least-squares methods and can effectively isolate direct Ps conversions. We further apply the CRISP-RF workflow to real data, including single-station data on cratons, common-conversion-point stacks at continental margins, and seismic data from ocean islands. Applying CRISP-RF to global data sets will advance our understanding of the enigmatic origins of upper mantle discontinuities such as the ubiquitous mid-lithospheric discontinuity and the elusive X-discontinuity.
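The decompose-then-mask idea can be sketched with a linear Radon (slant-stack) transform solved with an L1 penalty. This is a minimal sketch under assumptions, not CRISP-RF itself: it uses integer slopes in samples per trace, an 8-trace synthetic gather with one positive-moveout event (a stand-in for a phase to keep) and one negative-moveout event (a stand-in for a multiple), and ISTA (iterative soft thresholding) as the sparse solver. The function names, slopes, noise level, and `lam=1.0` are illustrative choices.

```python
import numpy as np

def shift(v, s):
    """Delay v by s samples (advance if s < 0), zero-filling with no wrap-around."""
    out = np.zeros_like(v)
    if s >= 0:
        out[s:] = v[: len(v) - s]
    else:
        out[:s] = v[-s:]
    return out

def radon_forward(m, q, x):
    """Linear Radon modeling: d[t, j] = sum_k m[t - q_k * x_j, k]."""
    d = np.zeros((m.shape[0], len(x)))
    for j, xj in enumerate(x):
        for k, qk in enumerate(q):
            d[:, j] += shift(m[:, k], int(qk * xj))
    return d

def radon_adjoint(d, q, x):
    """Adjoint (slant stack): m[tau, k] = sum_j d[tau + q_k * x_j, j]."""
    m = np.zeros((d.shape[0], len(q)))
    for k, qk in enumerate(q):
        for j, xj in enumerate(x):
            m[:, k] += shift(d[:, j], -int(qk * xj))
    return m

def sparse_radon(d, q, x, lam, n_iter=300):
    """ISTA for min_m 0.5 * ||d - L m||^2 + lam * ||m||_1."""
    # ||L||^2 <= (max column sum) * (max row sum) = len(x) * len(q), so this step is safe.
    step = 1.0 / (len(q) * len(x))
    m = np.zeros((d.shape[0], len(q)))
    for _ in range(n_iter):
        z = m + step * radon_adjoint(d - radon_forward(m, q, x), q, x)
        m = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return m

# Hypothetical gather: 8 traces, 200 samples, slopes q in samples per trace.
rng = np.random.default_rng(0)
nt, x, q = 200, np.arange(8), np.arange(-3, 4)
m_true = np.zeros((nt, len(q)))
m_true[50, 5] = 1.0   # q = +2: event to keep
m_true[120, 0] = 1.0  # q = -3: event to suppress
d = radon_forward(m_true, q, x) + 0.05 * rng.normal(size=(nt, len(x)))

m_hat = sparse_radon(d, q, x, lam=1.0)
mask = (q > 0).astype(float)                 # keep only positive-moveout Radon panels
d_clean = radon_forward(m_hat * mask, q, x)  # filtered gather
```

The L1 penalty concentrates each coherent event into a few Radon coefficients, so a simple mask in the Radon domain separates them; a damped least-squares solution would spread energy across neighbouring slopes and make the mask less clean.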
Abstract Large portions of the Earth are covered by oceans, sediments, and glaciers. High‐resolution body wave imaging in such environments often suffers from severe reverberations, that is, repeating echoes of the incoming scattered wavefield trapped in the reverberant layer, which make interpretation of lithospheric layering difficult. In this study, we propose a systematic data‐driven approach, using autocorrelation and homomorphic analysis, to solve the twin problem of detecting and eliminating reverberations without a priori knowledge of the elastic structure of the reverberant layers. We demonstrate, using synthetic experiments and data examples, that our approach can effectively identify the signature of reverberations even when the recording seismic array is deployed in complex settings, for example, using data from (a) a land station on the Songliao basin, (b) an ocean bottom station in the fore‐arc setting of the Alaska amphibious community seismic experiment, and (c) a station deployed on ice‐sediment strata in the glaciers of Antarctica. Reverberations are eliminated with a frequency-domain filter whose parameters are tuned automatically using the seismic data alone. On glaciers, where the reverberating sediment layer is sandwiched between the lithosphere and an overlying ice layer, homomorphic analysis is preferable for detecting the signature of reverberation. We expect that our technique will see wide application for high‐resolution body wave imaging across a wide variety of conditions.
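The detect-then-filter idea can be sketched for the simplest case: a single reverberating layer produces a spike train with amplitudes (−r0)^k every two-way time Δt, the normalized autocorrelation has its most negative value at lag Δt with amplitude close to −r0, and a frequency-domain filter H(ω) = 1 + r0·exp(−iωΔt) collapses the train back to one spike. This is a minimal sketch assuming that idealized model and that filter form (a resonance-removal filter commonly used for receiver functions); the function names and the values r0 = 0.6 and Δt = 25 samples are hypothetical, and the homomorphic (cepstral) branch is not shown.

```python
import numpy as np

def reverberating_trace(n, r0, lag):
    """Impulse response of a single-layer reverberation, 1 / (1 + r0 z^lag):
    spikes of amplitude (-r0)^k at samples k * lag."""
    x = np.zeros(n)
    for k in range((n - 1) // lag + 1):
        x[k * lag] = (-r0) ** k
    return x

def estimate_reverberation(x, min_lag=1):
    """Pick the most negative normalized autocorrelation value and its lag."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. len(x) - 1
    ac = ac / ac[0]
    lag = min_lag + int(np.argmin(ac[min_lag:]))
    return -ac[lag], lag

def resonance_filter(x, r0, lag):
    """Apply H(f) = 1 + r0 * exp(-2i*pi*f*lag) in the frequency domain (f in cycles/sample)."""
    n = len(x)
    nfft = 2 * n  # zero padding avoids circular wrap-around of the delayed term
    freqs = np.fft.rfftfreq(nfft)
    H = 1.0 + r0 * np.exp(-2j * np.pi * freqs * lag)
    return np.fft.irfft(np.fft.rfft(x, nfft) * H, nfft)[:n]

x = reverberating_trace(500, r0=0.6, lag=25)  # hypothetical reverberation parameters
r0_est, lag_est = estimate_reverberation(x)
y = resonance_filter(x, r0_est, lag_est)      # ideally a single spike at sample 0
```

Both filter parameters come from the trace itself, which mirrors the data-driven tuning described above; on real data the autocorrelation is noisier and a minimum lag is usually needed to skip the central peak's side lobes.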
Abstract Integrating scRNA-seq and scATAC-seq data obtained from different batches is challenging. Existing methods tend to use a pre-defined gene activity matrix to convert scATAC-seq data into scRNA-seq data. The pre-defined gene activity matrix is often of low quality and does not reflect the dataset-specific relationship between the two data modalities. We propose scDART, a deep learning framework that integrates scRNA-seq and scATAC-seq data and simultaneously learns the cross-modality relationship. Specifically, the design of scDART allows it to preserve cell trajectories in continuous cell populations, so it can be applied to trajectory inference on the integrated data.
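The fixed conversion that existing methods rely on, and that scDART replaces with a learned cross-modality relationship, amounts to a matrix product: a cells × peaks accessibility matrix times a pre-defined peaks × genes gene activity matrix gives pseudo gene expression. The toy values below are hypothetical; in practice the gene activity matrix is typically built from genomic proximity, for example marking peaks that overlap a gene body or promoter window.

```python
import numpy as np

# Hypothetical binary accessibility: 3 cells x 5 peaks.
atac = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
])
# Hypothetical pre-defined gene activity matrix: 5 peaks x 2 genes
# (1 if the peak is assigned to the gene; peak 2 is shared by both genes).
gene_activity = np.array([
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
])
pseudo_rna = atac @ gene_activity  # cells x genes "gene activity scores"
print(pseudo_rna)  # [[2 1] [2 2] [0 2]]
```

Because `gene_activity` is fixed before integration, any peak-to-gene relationship it misses cannot be recovered downstream, which is the limitation the abstract points out.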
Abstract Single cell data integration methods aim to integrate cells across data batches and modalities. Data integration tasks can be categorized into horizontal, vertical, diagonal, and mosaic integration, where mosaic integration is the most general and challenging case, with few methods developed for it. We propose scMoMaT, a method that integrates single cell multi-omics data under the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT also uncovers cluster-specific bio-markers across modalities. These multi-modal bio-markers are used to interpret the clusters and annotate them with cell types. Moreover, scMoMaT can integrate cell batches with unequal cell type compositions. Applying scMoMaT to multiple real and simulated datasets demonstrated these features and showed that scMoMaT outperforms existing methods. Specifically, we show that the integrated cell embedding combined with the learned bio-markers leads to cell type annotations of higher quality or resolution than the original annotations.
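Matrix tri-factorization approximates a data matrix as X ≈ U S Vᵀ, with U holding cell factors, V holding feature factors, and S linking the two; feature factors can then be read as cluster-associated markers. Below is a minimal single-matrix sketch using nonnegative factors and Lee–Seung-style multiplicative updates on the squared Frobenius loss. It is an assumption-laden illustration, not scMoMaT's algorithm: scMoMaT couples several matrices (batches × modalities) that share factors, which this sketch does not show, and `nmtf`, the ranks, and the synthetic data are hypothetical.

```python
import numpy as np

def nmtf(X, k_rows, k_cols, n_iter=500, seed=0, eps=1e-10):
    """Nonnegative tri-factorization X ~ U @ S @ V.T via multiplicative updates.
    U: cells x k_rows, S: k_rows x k_cols, V: features x k_cols. Returns loss history."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k_rows))
    S = rng.random((k_rows, k_cols))
    V = rng.random((m, k_cols))
    losses = [np.linalg.norm(X - U @ S @ V.T) ** 2]  # loss at the random initialization
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive gradient part).
        U *= (X @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ V.T @ V + eps)
        losses.append(np.linalg.norm(X - U @ S @ V.T) ** 2)
    return U, S, V, losses

# Hypothetical exactly factorizable data: 30 cells x 20 features.
data_rng = np.random.default_rng(42)
X = data_rng.random((30, 3)) @ data_rng.random((3, 2)) @ data_rng.random((20, 2)).T
U, S, V, losses = nmtf(X, k_rows=3, k_cols=2)
```

Multiplicative updates keep all factors nonnegative without projection, and the squared loss does not increase from one sweep to the next, which makes them a simple baseline solver for this kind of factorization.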
Yann, Ponty (Ed.)
Abstract
Motivation: The study of the evolutionary history of biological networks enables deep functional understanding of various bio-molecular processes. Network growth models, such as the Duplication–Mutation with Complementarity (DMC) model, provide a principled approach to characterizing the evolution of protein–protein interactions (PPIs) based on duplication and divergence. Current methods for model-based ancestral network reconstruction primarily use greedy heuristics and yield sub-optimal solutions.
Results: We present a new Integer Linear Programming (ILP) solution for maximum likelihood reconstruction of ancestral PPI networks using the DMC model. We prove the correctness of our ILP formulation, which is designed to find the optimal solution. It can also use efficient heuristics from general-purpose ILP solvers to obtain multiple optimal and near-optimal solutions that may be useful in many applications. Experiments on synthetic data show that our ILP obtains solutions with higher likelihood than those from previous methods and is robust to noise and model mismatch. We evaluate our algorithm on two real PPI networks, with proteins from the bZIP transcription factor family and the Commander complex. On both networks, solutions from our ILP have higher likelihood and are in better agreement with independent biological evidence from other studies.
Availability and implementation: A Python implementation is available at https://bitbucket.org/cdal/network-reconstruction.
Supplementary information: Supplementary data are available at Bioinformatics online.
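To make the DMC growth process concrete, the forward model can be sketched as repeated steps: duplicate a random node with all its edges; for each shared neighbor, with probability q_mod delete the edge from one of the two copies chosen at random; and with probability q_con connect the duplicate pair. Ancestral reconstruction runs this process in reverse, which is what the ILP optimizes. This is a minimal sketch of the forward model as it is commonly described, not the authors' implementation; `dmc_step`, `grow_dmc`, the seed graph, and the parameter values are hypothetical.

```python
import random

def dmc_step(adj, q_mod, q_con, rng):
    """One DMC growth step on an undirected graph stored as {node: set(neighbors)}."""
    u = rng.choice(sorted(adj))   # node to duplicate
    v = max(adj) + 1              # new node id (assumes integer ids)
    adj[v] = set(adj[u])          # duplication: copy all of u's edges
    for w in adj[v]:
        adj[w].add(v)
    # Mutation: for each shared neighbor, with prob. q_mod drop the edge from u or from v.
    for w in sorted(adj[u] & adj[v]):
        if rng.random() < q_mod:
            x = u if rng.random() < 0.5 else v
            adj[x].discard(w)
            adj[w].discard(x)
    # Complementarity: with prob. q_con connect the duplicated pair.
    if rng.random() < q_con:
        adj[u].add(v)
        adj[v].add(u)
    return u, v

def grow_dmc(n_final, q_mod, q_con, seed=0):
    """Grow a DMC network from a single edge; also return the duplication history."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    history = []
    while len(adj) < n_final:
        history.append(dmc_step(adj, q_mod, q_con, rng))
    return adj, history

adj, history = grow_dmc(50, q_mod=0.4, q_con=0.3)
```

The returned `history` records which node pair each step produced; reconstruction methods must infer this sequence from the final network alone.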