Abstract Background: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published that aim to evaluate these methods comprehensively. Benchmarking studies rely on cell-type-resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. Results: In our work, we show that the standard application of this approach, which uses randomly selected single cells regardless of the intrinsic differences between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. Conclusions: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package (https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516), enabling further developments in deconvolution methods.
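The contrast between random-cell and heterogeneous pseudobulk simulation can be illustrated with a minimal Python sketch. It is not the deconvBenchmarking implementation: the function `simulate_pseudobulk`, the Gaussian toy data, and the choice to model heterogeneity by drawing each cell type from a single randomly chosen donor are all assumptions made here for illustration.

```python
import numpy as np

def simulate_pseudobulk(expr, cell_types, donors, fractions, rng,
                        n_cells=50, heterogeneous=False):
    """Mix per-cell-type mean profiles using the given fractions.

    expr: (cells, genes) array; cell_types, donors: (cells,) label arrays;
    fractions: dict mapping cell type -> proportion (should sum to 1).
    """
    bulk = np.zeros(expr.shape[1])
    for ct, frac in fractions.items():
        pool = np.flatnonzero(cell_types == ct)
        if heterogeneous:
            # Assumption: heterogeneity is modeled by restricting the pool
            # to the cells of one randomly chosen donor.
            donor = rng.choice(np.unique(donors[pool]))
            pool = pool[donors[pool] == donor]
        picked = rng.choice(pool, size=n_cells, replace=True)
        bulk += frac * expr[picked].mean(axis=0)
    return bulk

# Hypothetical toy data: 2 cell types, 5 donors, donor-specific shifts.
rng = np.random.default_rng(0)
n_types, n_donors, n_per, n_genes = 2, 5, 200, 10
base = rng.normal(10.0, 1.0, size=(n_types, n_genes))
donor_shift = rng.normal(0.0, 2.0, size=(n_types, n_donors, n_genes))

blocks, type_labels, donor_labels = [], [], []
for t in range(n_types):
    for d in range(n_donors):
        noise = rng.normal(0.0, 1.0, size=(n_per, n_genes))
        blocks.append(base[t] + donor_shift[t, d] + noise)
        type_labels += [t] * n_per
        donor_labels += [d] * n_per
expr = np.vstack(blocks)
cell_types = np.array(type_labels)
donors = np.array(donor_labels)
fractions = {0: 0.5, 1: 0.5}

def mean_gene_variance(heterogeneous, n_bulks=200):
    """Average per-gene variance across repeated simulated bulk samples."""
    bulks = np.array([
        simulate_pseudobulk(expr, cell_types, donors, fractions, rng,
                            heterogeneous=heterogeneous)
        for _ in range(n_bulks)
    ])
    return bulks.var(axis=0).mean()

var_random = mean_gene_variance(False)
var_hetero = mean_gene_variance(True)
print(f"random cells: {var_random:.3f}  single-donor pools: {var_hetero:.3f}")
```

In this toy setting, averaging cells pooled across all donors washes out the donor-specific shifts, so the random-cell bulks vary much less than the single-donor ones. That mirrors the variance deficit described above, although real data and the published package may differ in the details.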
PyCDFT: A Python package for constrained density functional theory
Abstract We present PyCDFT, a Python package to compute diabatic states using constrained density functional theory (CDFT). PyCDFT provides an object-oriented, customizable implementation of CDFT, and allows for both single-point self-consistent-field calculations and geometry optimizations. PyCDFT is designed to interface with existing density functional theory (DFT) codes to perform CDFT calculations where constraint potentials are added to the Kohn–Sham Hamiltonian. Here, we demonstrate the use of PyCDFT by performing calculations with a massively parallel first-principles molecular dynamics code, Qbox, and we benchmark its accuracy by computing the electronic coupling between diabatic states for a set of organic molecules. We show that PyCDFT yields results in agreement with existing implementations and is a robust and flexible package for performing CDFT calculations. The program is available at https://dx.doi.org/10.5281/zenodo.3821097.
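For orientation, the way constraint potentials enter the Kohn–Sham Hamiltonian can be written in the commonly used Lagrangian form of CDFT. This is background stated under assumptions and is not taken from the abstract: the symbols $w_k$ (weight function defining the constrained region), $N_k^0$ (target electron number), and $V_k$ (Lagrange multiplier) are notation introduced here, and PyCDFT's own conventions may differ.

```latex
% Constrained functional: the DFT energy plus one Lagrange term per constraint k
W[n, \{V_k\}] = E[n] + \sum_k V_k \left( \int w_k(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} - N_k^0 \right)

% Resulting Kohn-Sham Hamiltonian with the added constraint potential
\hat{H}_{\mathrm{KS}}^{\mathrm{CDFT}} = \hat{H}_{\mathrm{KS}} + \sum_k V_k\, w_k(\mathbf{r})
```

In this form, $W$ is minimized with respect to the density $n$ and maximized with respect to each $V_k$, so that the constraint $\int w_k n \, d\mathbf{r} = N_k^0$ holds at convergence.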
- Award ID(s): 1764399
- PAR ID: 10458146
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Journal of Computational Chemistry
- Volume: 41
- Issue: 20
- ISSN: 0192-8651
- Page Range / eLocation ID: p. 1859-1867
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Background: The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, the size of genomic and proteomic databases is constantly growing. As a result, database searches need to be continually updated to account for the new data being added. However, continually re-searching the entire existing dataset wastes resources. Incremental database search can address this problem. Methods: One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm that reuses previously processed data and thereby increases search efficiency. The iBlast wrapper, however, must be generalized to support the better-performing DNA/protein sequence search methods that have since been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqsSearch, which extends iBlast by incorporating support for MMseqs2 (iMMseqs2) and Diamond (iDiamond), thereby providing a more generalized and broadly effective incremental search framework. Moreover, the previously published iBlast wrapper has to be revised to be more robust and usable by the general community. Results: iMMseqs2 and iDiamond, which apply the incremental approach, perform nearly identically to MMseqs2 and Diamond. Notably, when comparing hit rankings with measures such as the Pearson correlation, we observe a high concordance of over 0.9, indicating similar results. Moreover, in some cases, our incremental approach, iSeqsSearch, which extends the iBlast merge function to iMMseqs2 and iDiamond, provides more hits than the conventional MMseqs2 and Diamond methods. Conclusion: The incremental approach using iMMseqs2 and iDiamond is efficient in reusing previously processed data while maintaining high accuracy and concordance in search results. This method can reduce resource waste in searches of continually growing genomic and proteomic databases. The sample code and data are available at GitHub and Zenodo (https://github.com/EESI/Incremental-Protein-Search; DOI:10.5281/zenodo.14675319).
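The idea of reusing earlier results and searching only the newly added sequences can be sketched in Python as follows. This is a hedged illustration, not the iSeqsSearch or iBlast code: the `Hit` class, `rescale`, and `merge_incremental` are hypothetical names, and the sketch assumes e-values scale linearly with database size, which ignores the edge corrections that real search tools apply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    subject: str   # identifier of the matched database sequence
    evalue: float  # e-value reported against the database that was searched

def rescale(hits, searched_size, total_size):
    """Re-express e-values as if the search had covered the whole database.

    Assumption: e-values are proportional to database size.
    """
    factor = total_size / searched_size
    return [Hit(h.subject, h.evalue * factor) for h in hits]

def merge_incremental(old_hits, old_size, new_hits, new_size, max_hits=5):
    """Merge cached hits with hits from the newly added part of the database."""
    total = old_size + new_size
    merged = (rescale(old_hits, old_size, total)
              + rescale(new_hits, new_size, total))
    return sorted(merged, key=lambda h: h.evalue)[:max_hits]

# Hypothetical example: cached results plus a search of the delta only.
old_hits = [Hit("P1", 1e-30), Hit("P2", 1e-5)]
new_hits = [Hit("P3", 1e-12)]
merged = merge_incremental(old_hits, old_size=1000,
                           new_hits=new_hits, new_size=1000)
print([h.subject for h in merged])
```

Only the new portion of the database is searched; the cached hits are rescaled rather than recomputed, which is where the saving in this sketch comes from.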
-
Abstract While space-borne optical and near-infrared facilities have succeeded in delivering a precise and spatially resolved picture of our Universe, their small survey area is known to underrepresent the true diversity of galaxy populations. Ground-based surveys have reached comparable depths but at lower spatial resolution, resulting in source confusion that hampers accurate photometry extractions. What once was limited to the infrared regime has now begun to challenge ground-based ultradeep surveys, affecting detection and photometry alike. Failing to address these challenges will mean forfeiting a representative view into the distant Universe. We introduce The Farmer: an automated, reproducible profile-fitting photometry package that pairs a library of smooth parametric models from The Tractor with a decision tree that determines the best-fit model in concert with neighboring sources. Photometry is measured by fitting the models to the other bands, leaving brightness free to vary. The resulting photometric measurements are naturally total, and no aperture corrections are required. Supporting diagnostics (e.g., χ²) enable measurement validation. As fitting models is relatively time intensive, The Farmer is built with high-performance computing routines. We benchmark The Farmer on a set of realistic COSMOS-like images and find accurate photometry, number counts, and galaxy shapes. The Farmer is already being utilized to produce catalogs for several large-area deep extragalactic surveys where it has been shown to tackle some of the most challenging optical and near-infrared data available, with the promise of extending to other ultradeep surveys expected in the near future. The Farmer is available to download from GitHub (https://github.com/astroweaver/the_farmer) and Zenodo (https://doi.org/10.5281/zenodo.8205817).
-
Abstract Type Ia supernova explosions (SN Ia) are fundamental sources of elements for the chemical evolution of galaxies. They efficiently produce intermediate-mass (with Z between 11 and 20) and iron group elements; for example, about 70% of the solar iron is expected to be made by SN Ia. In this work, we calculate complete abundance yields for 39 models of SN Ia explosions, based on three progenitors (a 1.4 M⊙ deflagration detonation model, a 1.0 M⊙ double detonation model, and a 0.8 M⊙ double detonation model) and 13 metallicities, with ²²Ne mass fractions of 0, 1 × 10⁻⁷, 1 × 10⁻⁶, 1 × 10⁻⁵, 1 × 10⁻⁴, 1 × 10⁻³, 2 × 10⁻³, 5 × 10⁻³, 1 × 10⁻², 1.4 × 10⁻², 5 × 10⁻², and 0.1, respectively. Nucleosynthesis calculations are done using the NuGrid suite of codes, using a consistent nuclear reaction network between the models. Complete tables with yields and production factors are provided online at Zenodo: Yields (https://doi.org/10.5281/zenodo.8060323). We discuss the main properties of our yields in light of the present understanding of SN Ia nucleosynthesis, depending on different progenitor mass and composition. Finally, we compare our results with a number of relevant models from the literature.
-
Abstract We present meteorology and snow observation data collected at sites in the southwestern Colorado Rocky Mountains (USA) over three consecutive water years with different amounts of snow water equivalent (SWE) accumulation: a year with above average SWE (2019), a year with average SWE (2020), and a year with below average SWE (2021). This data set is distinguished by its emphasis on paired open-forest sites in a continental snow climate. Approximately once a month during February–May, we collected data from 15 to 20 snow pits and took 8 to 19 snow depth transects. Our sampling sites were in open and adjacent forested areas at 3,100 m and in a lower elevation aspen stand (3,035 m) and a higher elevation conifer stand (3,395 m). In total, we recorded 270 individual snow pit density and temperature profiles and over 4,000 snow depth measurements. These data are complemented by continuous meteorological measurements from two weather stations: one in the open and one in the adjacent forest. Meteorology data (including incoming shortwave and longwave radiation, outgoing shortwave radiation, relative humidity, wind speed, snow depth, and air and infrared surface temperature) were quality controlled, and the forcing data were gap-filled. These data are available to download from Bonner, Smyth, et al. (2022) at https://doi.org/10.5281/zenodo.6618553, at three levels of processing, including a level with downscaled, adjusted precipitation based on data assimilation using observed snow depth and a process-based snow model. We demonstrate the utility of these data with a modeling experiment that explores open-forest differences and identifies opportunities for improvements in model representation.
