NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (administrative interval).

Ten simple rules for good model-sharing practices

https://doi.org/10.1371/journal.pcbi.1012702

Kherroubi Garcia, Ismael; Erdmann, Christopher; Gesing, Sandra; Barton, Michael; Cadwallader, Lauren; Hengeveld, Geerten; Kirkpatrick, Christine R; Knight, Kathryn; Lemmen, Carsten; Ringuette, Rebecca; et al. (January 2025, PLOS Computational Biology)
Schwartz, Russell (Ed.)
Computational models are complex scientific constructs that have become essential for us to better understand the world. Many models are valuable for peers within and beyond disciplinary boundaries. However, there are no widely agreed-upon standards for sharing models. This paper suggests 10 simple rules for you to both (i) ensure you share models in a way that is at least “good enough,” and (ii) enable others to lead the change towards better model-sharing practices.
Free, publicly-accessible full text available January 10, 2026
grenedalf: population genetic statistics for the next generation of pool sequencing

https://doi.org/10.1093/bioinformatics/btae508

Czech, Lucas; Spence, Jeffrey P; Expósito-Alonso, Moisés (August 2024, Bioinformatics)
Schwartz, Russell (Ed.)
Summary: Pool sequencing is an efficient method for capturing genome-wide allele frequencies from multiple individuals, with broad applications such as studying adaptation in Evolve-and-Resequence experiments, monitoring genetic diversity in wild populations, and genotype-to-phenotype mapping. Here, we present grenedalf, a command-line tool written in C++ that implements common population genetic statistics such as θ, Tajima’s D, and FST for pool sequencing. It is orders of magnitude faster than current tools and focuses on usability and scalability, while also supporting a wide range of input file formats and convenience options. Availability and implementation: grenedalf is published under the GPL-3 and is freely available at github.com/lczech/grenedalf.
Full Text Available
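For readers unfamiliar with the statistics named in the grenedalf abstract, the sketch below computes Watterson's θ, Tajima's π, and Tajima's D from per-site derived-allele counts, assuming a sample of n individually sequenced chromosomes and using Tajima's classic formulas. It is only a reference point: pool-sequencing estimators such as grenedalf's need additional corrections (for example, for pool size and sequencing depth) that are not shown here, and all function names are hypothetical, not grenedalf's interface.

```python
import math


def harmonic(n: int, power: int = 1) -> float:
    """Return sum(1 / i**power) for i = 1 .. n-1 (Tajima's a1 for power=1, a2 for power=2)."""
    return sum(1.0 / i**power for i in range(1, n))


def watterson_theta(num_segregating: int, n: int) -> float:
    """Watterson's theta: number of segregating sites divided by a1."""
    return num_segregating / harmonic(n)


def nucleotide_diversity(derived_counts: list[int], n: int) -> float:
    """Tajima's pi summed over sites (mean number of pairwise differences).

    A site with k derived alleles among n sequences contributes k*(n-k) / C(n, 2).
    """
    num_pairs = n * (n - 1) / 2
    return sum(k * (n - k) / num_pairs for k in derived_counts)


def tajimas_d(derived_counts: list[int], n: int) -> float:
    """Tajima's D from per-site derived-allele counts in a sample of n sequences."""
    segregating = [k for k in derived_counts if 0 < k < n]  # drop monomorphic sites
    s = len(segregating)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1, a2 = harmonic(n), harmonic(n, 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    pi = nucleotide_diversity(segregating, n)
    # Difference between the two theta estimators, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Three singleton sites among n = 4 sequences: an excess of rare variants, so D < 0.
print(tajimas_d([1, 1, 1], n=4))
```

Intermediate-frequency variants push π above Watterson's θ and make D positive, which is why D is commonly read as a signal of departures from neutral equilibrium.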
Parsnp 2.0: scalable core-genome alignment for massive microbial datasets

https://doi.org/10.1093/bioinformatics/btae311

Kille, Bryce; Nute, Michael G; Huang, Victor; Kim, Eddie; Phillippy, Adam M; Treangen, Todd J (May 2024, Bioinformatics)
Schwartz, Russell (Ed.)
Motivation: Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes that share a common ancestor, provides the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major update since its initial release in 2014. Results: To address this gap, we developed Parsnp v2, which significantly improves on the original release. Parsnp v2 gives users more control over program execution, allowing Parsnp to be better tailored to different use cases. We introduce a partitioning option to Parsnp, which breaks the input into multiple parallel alignment processes whose results are then combined into a final alignment. The partitioning option can reduce memory usage by over 4× and runtime by over 2×, while still producing a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, because alignment anchors only need to be conserved within their partition rather than across the entire input set. We highlight performance on datasets involving thousands of bacterial and viral genomes. Availability and implementation: Parsnp v2 is available at https://github.com/marbl/parsnp.
Full Text Available
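The Parsnp abstract's claim that partitioning makes anchors easier to keep can be illustrated with a toy model. The sketch below assumes that an "anchor" is simply a k-mer present in every genome of a group and that partitions are formed round-robin; both are assumptions made for illustration, since Parsnp's real anchoring and partitioning are more involved. All function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_anchors(genomes: list[str], k: int) -> set[str]:
    """Toy anchors: k-mers present in every genome of the group."""
    if not genomes:
        return set()
    return set.intersection(*(kmers(g, k) for g in genomes))


def partition(genomes: list[str], num_parts: int) -> list[list[str]]:
    """Round-robin split of the input into num_parts groups (an assumed scheme)."""
    return [genomes[i::num_parts] for i in range(num_parts)]


def partition_anchors(genomes: list[str], k: int, num_parts: int) -> list[set[str]]:
    """Anchors computed independently per partition; each call could run in parallel."""
    return [shared_anchors(part, k) for part in partition(genomes, num_parts) if part]


genomes = ["ACGTACGTTT", "ACGTACGAAA", "TTACGTACGG", "GGACGTCCCC"]
print(shared_anchors(genomes, 4))        # anchors conserved across all four genomes
print(partition_anchors(genomes, 4, 2))  # each partition keeps at least those anchors
```

Because an intersection over a subset of genomes can only be larger than the intersection over all of them, every global anchor is also a partition anchor, and a k-mer disrupted in one genome (for example by an assembly error) is lost only from that genome's partition.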
Ten simple rules for creating and sustaining antiracist graduate programs

https://doi.org/10.1371/journal.pcbi.1010516

Perez-Lopez, Edgar; Gavrilova, Larisa; Disla, Janice; Goodlad, Melissa; Ngo, Dalena; Seshappan, Arabi; Sharmin, Farhana; Cisneros, Jesus; Kello, Christopher T.; Berhe, Asmeret Asefaw (October 2022, PLOS Computational Biology)
Schwartz, Russell (Ed.)
In 2020, the combination of police killings of unarmed Black people, including George Floyd, Breonna Taylor, and Ahmaud Arbery, and the Coronavirus Disease 2019 (COVID-19) pandemic provoked public outrage over long-standing inequalities in society. The events of 2020 drew global attention to systemic racism and racial inequalities, including the lack of diversity, equity, and inclusion in the academy and especially in science, technology, engineering, mathematics, and medicine (STEMM) fields. Racial and ethnic diversity in graduate programs warrants particular attention, as graduate students of color report alarming rates of racism, discrimination, microaggressions, and other exclusionary behaviors. As part of the Graduate Dean’s Advisory Council on Diversity (GDACD) at the University of California Merced, the authors of this manuscript held a year-long discussion on these issues and on ways to take meaningful action against these persistent injustices. We outline 10 rules to help graduate programs develop antiracist practices that promote racial and ethnic justice, equity, diversity, and inclusion (JEDI) in the academy. We focus on efforts to address the systemic causes of the underrepresentation and attrition of students from minoritized communities. The 10 rules are designed to help graduate groups formulate and implement policies that address the root causes of the underrepresentation of minoritized students in graduate education.
Full Text Available
Ten simple rules for designing and running a computing minor for bio/chem students

https://doi.org/10.1371/journal.pcbi.1010202

Reyes, Rochelle-Jan; Hosmane, Nina; Ihorn, Shasta; Johnson, Milo; Kulkarni, Anagha; Nelson, Jennifer; Savvides, Michael; Ta, Duc; Yoon, Ilmi; Pennings, Pleuni S. (July 2022, PLOS Computational Biology)
Schwartz, Russell (Ed.)
Science students increasingly need programming and data science skills to be competitive in the modern workforce. However, at our university (San Francisco State University), until recently almost no biology, biochemistry, or chemistry students (hereafter, bio/chem students) completed a minor in computer science. To change this, a new minor in computing applications, informally known as the Promoting Inclusivity in Computing (PINC) minor, was established in 2016. Here, we present the lessons we learned from this experience as a set of 10 rules. The first 3 rules focus on setting up the program so that it appeals to students in biology, chemistry, and biochemistry. Rules 4 through 8 focus on how the program’s classes are taught, both to make them engaging for our students and to give students the support they need. The last 2 rules address what happens “behind the scenes” when running a program that involves many people from several departments.
Full Text Available
Ten simple rules for attending your first conference

https://doi.org/10.1371/journal.pcbi.1009133

Leininger, Elizabeth; Shaw, Kelly; Moshiri, Niema; Neiles, Kelly; Onsongo, Getiria; Ritz, Anna (July 2021, PLOS Computational Biology)
Schwartz, Russell (Ed.)
Full Text Available
Openness weighted association studies: leveraging personal genome information to prioritize non-coding variants

https://doi.org/10.1093/bioinformatics/btab514

Song, Shuang; Shan, Nayang; Wang, Geng; Yan, Xiting; Liu, Jun S; Hou, Lin (July 2021, Bioinformatics)
Schwartz, Russell (Ed.)
Motivation: Identifying and interpreting non-coding variants that affect disease risk remains a paramount challenge in genome-wide association studies (GWAS) of complex diseases. Experimental efforts have provided comprehensive annotations of functional elements in the human genome. Meanwhile, advances in computational biology, especially machine learning approaches, have enabled accurate predictions of cell-type-specific functional annotations. Integrating functional annotations with GWAS signals has advanced the understanding of disease mechanisms. In previous studies, however, functional annotations were treated as static properties of a genomic region, ignoring potential functional differences caused by different genotypes across individuals. Results: We develop a computational approach, Openness Weighted Association Studies (OWAS), that leverages and aggregates predictions of chromatin accessibility in personal genomes to prioritize GWAS signals. The approach relies on an analytical expression, which we derived, for identifying disease-associated genomic segments and evaluating their effects on the etiology of complex diseases. In extensive simulations and real data analyses, OWAS identifies genes/segments that explain more heritability than existing methods and has a better replication rate in independent cohorts than GWAS. Moreover, the identified genes/segments show tissue-specific patterns and are enriched in disease-relevant pathways. We use rheumatoid arthritis and asthma as examples to demonstrate how OWAS can provide novel insights into complex diseases. Availability and implementation: The R package OWAS that implements our method is available at https://github.com/shuangsong0110/OWAS. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
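The OWAS abstract does not give its analytical expression, so the sketch below only illustrates the general idea of aggregating personal openness predictions and testing them against a phenotype, under loudly stated assumptions: each individual's predicted openness for one segment is assumed to be precomputed per haplotype by some accessibility predictor, the per-individual score is assumed to be a plain average, and the test is ordinary least-squares regression. `segment_scores` and `association_t` are hypothetical names, not functions from the OWAS R package.

```python
import math
from statistics import fmean


def segment_scores(haplotype_openness: list[list[float]]) -> list[float]:
    """Hypothetical aggregation: average predicted openness over each individual's haplotypes."""
    return [fmean(values) for values in haplotype_openness]


def association_t(scores: list[float], phenotype: list[float]) -> tuple[float, float]:
    """Simple linear regression of phenotype on score; returns (slope, t statistic)."""
    n = len(scores)
    mean_x, mean_y = fmean(scores), fmean(phenotype)
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, phenotype))
    slope = sxy / sxx
    residuals = [y - mean_y - slope * (x - mean_x) for x, y in zip(scores, phenotype)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance with n - 2 df
    return slope, slope / math.sqrt(sigma2 / sxx)


# Four individuals, two haplotypes each (made-up illustrative numbers).
scores = segment_scores([[0.2, 0.4], [0.6, 0.8], [0.3, 0.5], [0.9, 0.7]])
print(association_t(scores, [1.0, 2.1, 1.4, 2.6]))
```

The point of the aggregation step is that the tested predictor varies with each person's genotype, rather than being a single annotation value shared by everyone at that region.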
D-MANOVA: fast distance-based multivariate analysis of variance for large-scale microbiome association studies

https://doi.org/10.1093/bioinformatics/btab498

Chen, Jun; Zhang, Xianyang (July 2021, Bioinformatics)
Schwartz, Russell (Ed.)
Summary: PERMANOVA (permutational multivariate analysis of variance based on distances) has been widely used to test the association between the microbiome and a covariate of interest. Statistical significance is established by permutation, which is computationally intensive for large sample sizes. As large-scale microbiome studies such as the American Gut Project (AGP) become increasingly common, a computationally efficient version of PERMANOVA is much needed. To this end, we derive the asymptotic distribution of the PERMANOVA pseudo-F statistic and provide an analytical P-value calculation based on a chi-square approximation. We show that the asymptotic P-value is close to the PERMANOVA P-value even at moderate sample sizes. Moreover, it is more accurate than, and an order of magnitude faster than, the permutation-free method MDMR. We demonstrate the use of our procedure, D-MANOVA, on the AGP dataset. Availability and implementation: D-MANOVA is implemented by the dmanova function in the CRAN package GUniFrac. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
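To make the permutation cost described in the D-MANOVA abstract concrete, the sketch below computes the classic one-way PERMANOVA pseudo-F from a distance matrix (total sum of squares (1/N) Σ d²ᵢⱼ over all pairs, minus within-group sums of squares each scaled by 1/n_g) and its permutation P-value. Every permutation recomputes the statistic, which is the cost the paper's chi-square approximation avoids; that approximation is not reproduced here. Function names are hypothetical; the published implementation is the dmanova function in GUniFrac.

```python
import random


def pseudo_f(dist: list[list[float]], labels: list[str]) -> float:
    """One-way PERMANOVA pseudo-F from a symmetric distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    num_groups = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        pair_sum = sum(dist[i][j] ** 2 for p, i in enumerate(idx) for j in idx[p + 1:])
        ss_within += pair_sum / len(idx)
    ss_among = ss_total - ss_within
    # Assumes at least two groups and nonzero within-group spread.
    return (ss_among / (num_groups - 1)) / (ss_within / (n - num_groups))


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """Observed pseudo-F and its permutation P-value (label shuffling)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:  # one full recomputation per permutation
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
print(pseudo_f(dist, ["A", "A", "B", "B"]))  # 200.0
```

With only four samples, two of the six distinct labelings reproduce the observed split, so the permutation P-value hovers around 1/3 no matter how large F is; small toy examples like this one also show why permutation tests need enough samples to resolve small P-values.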
Ten simple rules to cultivate transdisciplinary collaboration in data science

https://doi.org/10.1371/journal.pcbi.1008879

Sahneh, Faryad; Balk, Meghan A.; Kisley, Marina; Chan, Chi-kwan; Fox, Mercury; Nord, Brian; Lyons, Eric; Swetnam, Tyson; Huppenkothen, Daniela; Sutherland, Will; et al. (May 2021, PLOS Computational Biology)
Schwartz, Russell (Ed.)
Full Text Available
A two-step approach to testing overall effect of gene–environment interaction for multiple phenotypes

https://doi.org/10.1093/bioinformatics/btaa1083

Majumdar, Arunabha; Burch, Kathryn S; Haldar, Tanushree; Sankararaman, Sriram; Pasaniuc, Bogdan; Gauderman, W James; Witte, John S (December 2020, Bioinformatics)
Schwartz, Russell (Ed.)
Motivation: While gene–environment (GxE) interactions contribute importantly to many different phenotypes, detecting such interactions requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis. Previous work shows that the power to identify a main genetic effect can be improved by simultaneously analyzing multiple related phenotypes. For a univariate phenotype, two-step methods achieve higher power for detecting a GxE interaction than single-step analysis. We therefore propose a two-step approach to test for an overall GxE effect on multiple phenotypes. Results: Using simulations, we demonstrate that when more than one phenotype has a GxE effect (i.e. GxE pleiotropy), our approach offers a substantial gain in power (18–43%) for detecting an aggregate-level GxE effect on a multivariate phenotype, compared with an analogous two-step method that identifies a GxE effect for a univariate phenotype. We applied the proposed approach to simultaneously analyze three lipid traits (LDL, HDL and triglycerides), with the frequency of alcohol consumption as the environmental factor, in the UK Biobank. The method identified two loci with an overall GxE effect on the vector of lipids, one of which was missed by the competing approaches. Availability and implementation: We provide an R package, MPGE, implementing the proposed approach; it is available from CRAN at https://cran.r-project.org/web/packages/MPGE/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
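The abstract does not specify MPGE's screening statistic or how it combines phenotypes, so the sketch below shows only the generic two-step idea it builds on, under stated assumptions: step 1 screens SNPs with some screening P-value, and step 2 tests GxE only for the SNPs that pass, with a Bonferroni correction over the screened set rather than over all SNPs. The function name, thresholds, and input P-values are hypothetical; this is not MPGE's interface.

```python
def two_step_gxe(screen_p: dict[str, float], interaction_p: dict[str, float],
                 alpha_screen: float = 0.05, alpha: float = 0.05) -> tuple[list[str], list[str]]:
    """Generic two-step GxE test: screen first, then Bonferroni-test only the screened SNPs."""
    # Step 1: keep SNPs whose screening P-value passes the liberal screening threshold.
    selected = [snp for snp, p in screen_p.items() if p < alpha_screen]
    if not selected:
        return [], []
    # Step 2: the multiple-testing burden is the number of screened SNPs, not all SNPs.
    threshold = alpha / len(selected)
    hits = [snp for snp in selected if interaction_p[snp] < threshold]
    return selected, hits


screen = {"rs1": 0.001, "rs2": 0.2, "rs3": 0.01, "rs4": 0.5}
gxe = {"rs1": 0.01, "rs2": 1e-6, "rs3": 0.03, "rs4": 0.001}
print(two_step_gxe(screen, gxe))  # (['rs1', 'rs3'], ['rs1'])
```

The power gain comes from the smaller step-2 correction; the trade-off, visible with rs2 here, is that a true interaction that fails the screen is never tested. Classic two-step designs also generally rely on the screening statistic being independent of the step-2 interaction test under the null to keep the error rate valid.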
