Abstract When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found by permutation analysis that two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Expanding the analysis to limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test, we found that every method except the Wilcoxon rank-sum test often fails to control the FDR. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, we recommend the Wilcoxon rank-sum test for population-level RNA-seq studies with large sample sizes.
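The permutation check described in this abstract can be illustrated with a minimal sketch. It assumes a genes-by-samples count matrix and a boolean condition vector; the function names (`benjamini_hochberg`, `wilcoxon_de`, `permuted_discoveries`) and the choice of Benjamini-Hochberg adjustment are illustrative assumptions, not the authors' pipeline. After the condition labels are shuffled, no gene is truly differentially expressed, so any gene still called at the target FDR is a false discovery.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (assumes no NaNs)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def wilcoxon_de(counts, labels, fdr=0.05):
    """Flag genes (rows of counts) called DE by a per-gene rank-sum test."""
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pvals = np.array([
        mannwhitneyu(row[labels], row[~labels], alternative="two-sided").pvalue
        for row in counts
    ])
    return benjamini_hochberg(pvals) <= fdr


def permuted_discoveries(counts, labels, n_perm=20, fdr=0.05, seed=0):
    """Count genes called DE after shuffling the condition labels.

    Shuffled labels carry no true signal, so every call is a false discovery.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    return [
        int(wilcoxon_de(counts, rng.permutation(labels), fdr).sum())
        for _ in range(n_perm)
    ]
```

A method that controls the FDR should report few or no discoveries on most permuted datasets; consistently large counts would indicate the kind of inflation the abstract reports for DESeq2 and edgeR.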
                    This content will become publicly available on December 1, 2025
                            
                            Response to "Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives" and "Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples"
                        
                    
    
            Abstract Two correspondences raised concerns or comments about our analyses of the exaggerated false positives found by differential expression (DE) methods. Here, we discuss the points they raise and explain why we agree or disagree with each. We add a new analysis confirming that, in two-condition DE analyses, the Wilcoxon rank-sum test remains more robust than the other five DE methods (DESeq2, edgeR, limma-voom, dearseq, and NOISeq) after accounting for normalization and winsorization, the data preprocessing steps discussed in the two correspondences.
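As a rough illustration of the winsorization step mentioned here, the sketch below caps each gene's largest values at a per-gene percentile before any testing. The function name `winsorize_rows`, the upper-tail-only capping, and the 95th-percentile default are assumptions chosen for illustration; the abstract does not state the settings used in the correspondence or the response.

```python
import numpy as np


def winsorize_rows(counts, upper_pct=95.0):
    """Cap each gene's values (one row per gene) at its own upper percentile.

    upper_pct is an illustrative default, not a value taken from the source.
    """
    counts = np.asarray(counts, dtype=float)
    # One cap per gene, kept as a column so it broadcasts across samples.
    caps = np.percentile(counts, upper_pct, axis=1, keepdims=True)
    return np.minimum(counts, caps)
```

Capping limits the influence of a few extreme samples on methods that model the mean, while rank-based tests such as the Wilcoxon rank-sum test are already less sensitive to such outliers.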
        
    
    
- PAR ID: 10601276
- Publisher / Repository: Springer Nature
- Date Published:
- Journal Name: Genome Biology
- Volume: 25
- Issue: 1
- ISSN: 1474-760X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Avidan, S. (Ed.) We address the problem of segmenting moving rigid objects based on two-view image correspondences under a perspective camera model. While this is a well understood problem, existing methods scale poorly with the number of correspondences. In this paper we propose a fast segmentation algorithm that scales linearly with the number of correspondences and show that on benchmark datasets it offers the best trade-off between error and computational time: it is at least one order of magnitude faster than the best method (with comparable or better accuracy), with the ratio growing up to three orders of magnitude for larger numbers of correspondences. We approach the problem from an algebraic perspective by exploiting the fact that all points belonging to a given object lie in the same quadratic surface. The proposed method is based on a characterization of each surface in terms of the Christoffel polynomial associated with the probability that a given point belongs to the surface. This allows for efficiently segmenting points “one surface at a time” in O(number of points)
- Abstract We present a method for solving two minimal problems for relative camera pose estimation from three views, which are based on three view correspondences of (i) three points and one line and the novel case of (ii) three points and two lines through two of the points. These problems are too difficult to be efficiently solved by the state of the art Gröbner basis methods. Our method is based on a new efficient homotopy continuation (HC) solver framework MINUS, which dramatically speeds up previous HC solving by specializing HC methods to generic cases of our problems. We characterize their number of solutions and show with simulated experiments that our solvers are numerically robust and stable under image noise, a key contribution given the borderline intractable degree of nonlinearity of trinocular constraints. We show in real experiments that (i) SIFT feature location and orientation provide good enough point-and-line correspondences for three-view reconstruction and (ii) that we can solve difficult cases with too few or too noisy tentative matches, where the state of the art structure from motion initialization fails.
- Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images. Without any additional fine-tuning or supervision on the task-specific data or annotations, DIFT is able to outperform both weakly-supervised methods and competitive off-the-shelf features in identifying semantic, geometric, and temporal correspondences. Particularly for semantic correspondence, DIFT from Stable Diffusion is able to outperform DINO and OpenCLIP by 19 and 14 accuracy points respectively on the challenging SPair-71k benchmark. It even outperforms the state-of-the-art supervised methods on 9 out of 18 categories while remaining on par for the overall performance. Project page: https://diffusionfeatures.github.io.
- Abstract Deletions are prevalent in the genomes of SARS-CoV-2 isolates from COVID-19 patients, but their roles in the severity, transmission, and persistence of disease are poorly understood. Millions of COVID-19 swab samples from patients have been sequenced and made available online, offering an unprecedented opportunity to study such deletions. Multiplex PCR-based amplicon sequencing (amplicon-seq) has been the most widely used method for sequencing clinical COVID-19 samples. However, existing bioinformatics methods applied to negative control samples sequenced by multiplex-PCR sequencing often yield large numbers of false-positive deletions. We found that these false positives commonly occur in short alignments, at low frequency and depth, and near primer-binding sites used for whole-genome amplification. To address this issue, we developed a filtering strategy, validated with positive control samples containing a known deletion. Our strategy accurately detected the known deletion and removed more than 99% of false positives. This method, applied to public COVID-19 swab data, revealed that deletions occurring independently of transcription regulatory sequences were about 20-fold less common than previously reported; however, they remain more frequent in symptomatic patients. Our optimized approach should enhance the reliability of SARS-CoV-2 deletion characterization from surveillance studies. Finally, our approach may guide the development of more reliable bioinformatics pipelines for genome sequence analyses of other viruses.
 An official website of the United States government
