
Award ID contains: 1949268



Predicting evolutionary targets and parameters of gene deletion from expression data

https://doi.org/10.1093/bioadv/vbae002

Campelo dos Santos, Andre Luiz; DeGiorgio, Michael; Assis, Raquel (January 2024, Bioinformatics Advances)
Forslund, Sofia (Ed.)

Abstract
Motivation: Gene deletion is traditionally thought of as a nonadaptive process that removes functional redundancy from genomes, such that it generally receives less attention than duplication in evolutionary turnover studies. Yet, mounting evidence suggests that deletion may promote adaptation via the “less-is-more” evolutionary hypothesis, as it often targets genes harboring unique sequences, expression profiles, and molecular functions. Hence, predicting the relative prevalence of redundant and unique functions among genes targeted by deletion, as well as the parameters underlying their evolution, can shed light on the role of gene deletion in adaptation.
Results: Here, we present CLOUDe, a suite of machine learning methods for predicting evolutionary targets of gene deletion events from expression data. Specifically, CLOUDe models expression evolution as an Ornstein–Uhlenbeck process, and uses multi-layer neural network, extreme gradient boosting, random forest, and support vector machine architectures to predict whether deleted genes are “redundant” or “unique”, as well as several parameters underlying their evolution. We show that CLOUDe boasts high power and accuracy in differentiating between classes, and high accuracy and precision in estimating evolutionary parameters, with optimal performance achieved by its neural network architecture. Application of CLOUDe to empirical data from Drosophila suggests that deletion primarily targets genes with unique functions, with further analysis showing these functions to be enriched for protein deubiquitination. Thus, CLOUDe represents a key advance in learning about the role of gene deletion in functional evolution and adaptation.
Availability and implementation: CLOUDe is freely available on GitHub (https://github.com/anddssan/CLOUDe).
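The abstract states that CLOUDe models expression evolution as an Ornstein–Uhlenbeck (OU) process. As a rough illustration of that model only (not CLOUDe's code, features, or parameterization), the Python sketch below uses the standard OU transition: expression starting at x0 drifts toward an optimum theta at rate alpha while fluctuating with rate sigma2. The function names and all parameter values are hypothetical.

```python
import math
import random


def ou_mean(x0: float, theta: float, alpha: float, t: float) -> float:
    """Expected expression level after time t for an OU process started at x0."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma2: float, alpha: float, t: float) -> float:
    """Variance after time t; tends to sigma2 / (2 * alpha) as t grows."""
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))


def simulate_ou(x0, theta, alpha, sigma2, t_total, n_steps, rng):
    """Simulate one trajectory by sampling the exact Gaussian OU transition."""
    dt = t_total / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        mean = ou_mean(x, theta, alpha, dt)
        sd = math.sqrt(ou_variance(sigma2, alpha, dt))
        x = rng.gauss(mean, sd)
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical values: ancestral level 5.0 pulled toward an optimum of 2.0.
    path = simulate_ou(5.0, 2.0, 1.0, 0.5, 10.0, 100, random.Random(1))
    print(f"simulated final level: {path[-1]:.3f}")
    print(f"expected final level:  {ou_mean(5.0, 2.0, 1.0, 10.0):.3f}")
```

Because the OU transition density is Gaussian, sampling it directly is exact at any step size, unlike an Euler-Maruyama discretization.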
Spatiotemporal fluctuations of population structure in the Americas revealed by a meta‐analysis of the first decade of archaeogenomes

https://doi.org/10.1002/ajpa.24673

Campelo dos Santos, Andre Luiz; Lavalle Sullasi, Henry Socrates; Gokcumen, Omer; Lindo, John; DeGiorgio, Michael (December 2022, American Journal of Biological Anthropology)

Abstract
Objectives: Since 2010, genome‐wide data from hundreds of ancient Native Americans have contributed to the understanding of Americas' prehistory. However, these samples have never been studied as a single dataset, and distinct relationships among themselves and with present‐day populations may have never come to light. Here, we reassess genomic diversity and population structure of 223 ancient Native Americans published between 2010 and 2019.
Materials and Methods: The genomic data from the ancient Americas were merged with a worldwide reference panel of 278 present‐day genomes from the Simons Genome Diversity Project and then analyzed through ADMIXTURE, D‐statistics, PCA, t‐SNE, and UMAP.
Results: We find largely similar population structures in ancient and present‐day Americas. However, the population structure of contemporary Native Americans, traced here to at least 10,000 years before present, is noticeably less diverse than that of their ancient counterparts, a possible outcome of European contact. Additionally, in the past there were greater levels of population structure in North than in South America, except for ancient Brazil, which harbors comparatively high degrees of structure. Moreover, we find a component of genetic ancestry in the ancient dataset that is closely related to that of present‐day Oceanic populations but does not correspond to the previously reported Australasian signal. Lastly, we report an expansion of the Ancient Beringian ancestry, previously reported for only one sample.
Discussion: Overall, our findings support a complex scenario for the settlement of the Americas, accommodating the occurrence of founder effects and the emergence of ancestral mixing events at the regional level.
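PCA is one of the methods the study lists. The Python sketch below shows only the basic computation on a toy genotype matrix of 0/1/2 allele counts: center each SNP, form the individual-by-individual covariance, and extract the leading eigenvector (PC1 scores, up to scale) by power iteration. It is a generic, unscaled PCA written for clarity, not the study's pipeline; the toy genotypes and function names are hypothetical.

```python
import math
import random


def center_genotypes(genotypes):
    """Center each SNP column (0/1/2 allele counts) to mean zero across individuals."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def top_pc(genotypes, n_iter=200, seed=0):
    """Return unit-length PC1 scores for each individual via power iteration."""
    x = center_genotypes(genotypes)
    n, m = len(x), len(x[0])
    # Individual-by-individual covariance, averaged over SNPs.
    cov = [[sum(x[i][k] * x[j][k] for k in range(m)) / m for j in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        if norm == 0.0:  # no variation in the data
            break
        v = [val / norm for val in w]
    return v


if __name__ == "__main__":
    # Hypothetical toy data: two individuals from each of two diverged groups.
    toy = [[2, 2, 0, 0], [2, 2, 0, 1], [0, 0, 2, 2], [0, 1, 2, 2]]
    print([round(s, 3) for s in top_pc(toy)])
```

The sign of a principal component is arbitrary, so only the relative placement of individuals (here, the two groups on opposite sides of zero) is meaningful.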
BalLeRMix+: mixture model approaches for robust joint identification of both positive selection and long-term balancing selection

https://doi.org/10.1093/bioinformatics/btab720

Cheng, Xiaoheng; DeGiorgio, Michael (October 2021, Bioinformatics)
Schwartz, Russell (Ed.)

Abstract
Summary: The growing availability of genomewide polymorphism data has fueled interest in detecting diverse selective processes affecting population diversity. However, no model-based approaches exist to jointly detect and distinguish the two complementary processes of balancing and positive selection. We extend the BalLeRMix B-statistic framework described in Cheng and DeGiorgio (2020) for detecting balancing selection and present BalLeRMix+, which implements five B-statistic extensions based on mixture models to robustly identify both types of selection. BalLeRMix+ is implemented in Python and computes the composite likelihood ratios and associated model parameters for each genomic test position.
Availability and implementation: BalLeRMix+ is freely available at https://github.com/bioXiaoheng/BallerMixPlus.
Supplementary information: Supplementary data are available at Bioinformatics online.
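The abstract says BalLeRMix+ computes mixture-model composite likelihood ratios at each test position, but it does not give the five statistics' exact forms. The Python sketch below is therefore only a generic mixture-model composite likelihood ratio, not BalLeRMix+ itself, built on hypothetical assumptions: a background spectrum g(k) estimated from polymorphic sites with a pseudocount, a Gaussian-shaped "balanced" spectrum f(k) peaked at an equilibrium frequency, and a mixing weight exp(-A·d) that decays with distance d from the test position, maximized over a small grid of A.

```python
import math


def background_sfs(counts, n, pseudocount=1):
    """Background spectrum g(k) over derived counts k = 1..n-1 (g[0] = g[n] = 0)."""
    tally = [0] + [pseudocount] * (n - 1) + [0]
    for k in counts:
        tally[k] += 1
    total = sum(tally)
    return [t / total for t in tally]


def peaked_sfs(n, eq_freq=0.5, width=0.1):
    """Hypothetical spectrum concentrated near an equilibrium allele frequency."""
    inner = [math.exp(-((k / n - eq_freq) ** 2) / (2 * width ** 2)) for k in range(1, n)]
    weights = [0.0] + inner + [0.0]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_clr(sites, g, f, a_grid):
    """Composite likelihood ratio maximized over the decay rate A.

    sites: (distance_to_test_position, derived_count) pairs, polymorphic only.
    Returns (2 * max log-ratio, best A), or (0.0, None) if no A beats the null.
    """
    best_ll, best_a = 0.0, None
    for a in a_grid:
        ll = 0.0
        for d, k in sites:
            w = math.exp(-a * d)  # weight of the peaked component at this site
            ll += math.log(w * f[k] + (1 - w) * g[k]) - math.log(g[k])
        if ll > best_ll:
            best_ll, best_a = ll, a
    return 2 * best_ll, best_a
```

In this sketch a large A drives every mixing weight toward zero, recovering the background-only null, which is why the returned ratio is never negative.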
Properties and unbiased estimation of F- and D-statistics in samples containing related and inbred individuals

https://doi.org/10.1093/genetics/iyab090

Mughal, Mehreen R.; DeGiorgio, Michael (July 2021, Genetics)
Browning, S. (Ed.)

Abstract The Patterson F- and D-statistics are commonly used measures for quantifying population relationships and for testing hypotheses about demographic history. These statistics make use of allele frequency information across populations to infer different aspects of population history, such as population structure and introgression events. Inclusion of related or inbred individuals can bias such statistics, which may often lead to the filtering of such individuals. Here, we derive statistical properties of the F- and D-statistics, including their biases due to the inclusion of related or inbred individuals, their variances, and their corresponding mean squared errors. Moreover, for those statistics that are biased, we develop unbiased estimators and evaluate the variances of these new quantities. Comparisons of the new unbiased statistics to the originals demonstrate that our newly derived statistics often have lower error across a wide population parameter space. Furthermore, we apply these unbiased estimators using several global human populations with the inclusion of related individuals to highlight their application to an empirical dataset. Finally, we implement these unbiased estimators in the open-source software package funbiased for easy application by the scientific community.
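For readers unfamiliar with these quantities, the Python sketch below computes the standard plug-in F-statistics (f2, f3, f4) and Patterson's D from per-SNP sample allele frequencies, with D taken as a ratio of sums over SNPs. These are the naive estimators that the abstract reports can be biased when related or inbred individuals are included; the sketch also omits the finite-sample corrections that standard estimators typically apply, and it is not the unbiased estimators implemented in funbiased. The function names and input frequencies are hypothetical.

```python
def f2(pa, pb):
    """Plug-in f2(A, B): mean squared allele-frequency difference across SNPs."""
    return sum((a - b) ** 2 for a, b in zip(pa, pb)) / len(pa)


def f3(pc, pa, pb):
    """Plug-in f3(C; A, B) with C as the focal (possibly admixed) population."""
    return sum((c - a) * (c - b) for c, a, b in zip(pc, pa, pb)) / len(pc)


def f4(pa, pb, pc, pd):
    """Plug-in f4(A, B; C, D): covariance of the A-B and C-D frequency differences."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def d_stat(pw, px, py, pz):
    """Patterson's D(W, X; Y, Z) as a ratio of sums over SNPs."""
    num = sum((w - x) * (y - z) for w, x, y, z in zip(pw, px, py, pz))
    den = sum((w + x - 2 * w * x) * (y + z - 2 * y * z)
              for w, x, y, z in zip(pw, px, py, pz))
    if den == 0:
        raise ValueError("no informative SNPs for D")
    return num / den


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs.
    pa, pb, pc = [0.1, 0.4, 0.9], [0.3, 0.2, 0.6], [0.5, 0.5, 0.7]
    print(f2(pa, pb), f3(pc, pa, pb))
```

A useful sanity check is the identity f3(C; A, B) = [f2(C, A) + f2(C, B) - f2(A, B)] / 2, which the plug-in versions above satisfy exactly.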
Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics

https://doi.org/10.1093/molbev/msad157

Arnab, Sandipan Paul; Amin, Md Ruhul; DeGiorgio, Michael (July 2023, Molecular Biology and Evolution)
Kim, Yuseob (Ed.)
Abstract Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.
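The abstract describes converting one-dimensional summary-statistic arrays into two-dimensional spectral images using the wavelet transform, multitaper analysis, and the S-transform. As a simpler stand-in for those methods (not any of them), the Python sketch below computes a short-time Fourier transform: it slides a Hann-tapered window along the array and records DFT magnitudes, producing a position-by-frequency image. The window length, step size, and function names are hypothetical choices.

```python
import cmath
import math


def dft_magnitudes(segment):
    """Magnitudes of the discrete Fourier transform at non-negative frequency bins."""
    n = len(segment)
    return [abs(sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(stat_array, window=8, step=4):
    """Map a 1D summary-statistic array to a 2D image.

    Rows are window positions along the genome; columns are frequency bins.
    Each window is mean-centered and Hann-tapered before the DFT.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / (window - 1)) for t in range(window)]
    image = []
    for start in range(0, len(stat_array) - window + 1, step):
        seg = stat_array[start:start + window]
        mean = sum(seg) / window
        tapered = [(x - mean) * h for x, h in zip(seg, hann)]
        image.append(dft_magnitudes(tapered))
    return image


if __name__ == "__main__":
    # Hypothetical statistic oscillating with period 4 along 32 genomic windows.
    signal = [math.cos(2 * math.pi * t / 4) for t in range(32)]
    for row in spectrogram(signal):
        print([round(v, 2) for v in row])
```

For a signal with period 4 and an 8-sample window, the energy concentrates in frequency bin 2, with some leakage into the neighboring bins from the taper.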
Full Text Available
Predicting Gene Expression Divergence between Single-Copy Orthologs in Two Species

https://doi.org/10.1093/gbe/evad078

Piya, Antara Anika; DeGiorgio, Michael; Assis, Raquel (May 2023, Genome Biology and Evolution)
Yi, Soojin (Ed.)
Abstract Predicting gene expression divergence is integral to understanding the emergence of new biological functions and associated traits. Whereas several sophisticated methods have been developed for this task, their applications are either limited to duplicate genes or require expression data from more than two species. Thus, here we present PredIcting eXpression dIvergence (PiXi), the first machine learning framework for predicting gene expression divergence between single-copy orthologs in two species. PiXi models gene expression evolution as an Ornstein-Uhlenbeck process, and overlays this model with multi-layer neural network (NN), random forest, and support vector machine architectures for making predictions. It outputs the predicted class “conserved” or “diverged” for each pair of orthologs, as well as their predicted expression optima in the two species. We show that PiXi has high power and accuracy in predicting gene expression divergence between single-copy orthologs, as well as high accuracy and precision in estimating their expression optima in the two species, across a wide range of evolutionary scenarios, with the globally best performance achieved by a multi-layer NN. Moreover, application of our best-performing PiXi predictor to empirical gene expression data from single-copy orthologs residing at different loci in two species of Drosophila reveals that approximately 23% underwent expression divergence after positional relocation. Further analysis shows that several of these “diverged” genes are involved in the electron transport chain of the mitochondrial membrane, suggesting that new chromatin environments may impact energy production in Drosophila. Thus, by providing a toolkit for predicting gene expression divergence between single-copy orthologs in two species, PiXi can shed light on the origins of novel phenotypes across diverse biological processes and study systems.
Full Text Available
Genomic evidence for adaptation to tuberculosis in the Andes before European contact

https://doi.org/10.1016/j.isci.2023.106034

Joseph, Sophie K.; Migliore, Nicola Rambaldi; Olivieri, Anna; Torroni, Antonio; Owings, Amanda C.; DeGiorgio, Michael; Ordóñez, Wladimir Galarza; Aguilú, J.J. Ortiz; González-Andrade, Fabricio; Achilli, Alessandro; et al (February 2023, iScience)

Full Text Available
Genomic evidence for ancient human migration routes along South America's Atlantic coast

https://doi.org/10.1098/rspb.2022.1078

Campelo dos Santos, Andre Luiz; Owings, Amanda; Sullasi, Henry Socrates; Gokcumen, Omer; DeGiorgio, Michael; Lindo, John (November 2022, Proceedings of the Royal Society B: Biological Sciences)

An increasing body of archaeological and genomic evidence has hinted at a complex settlement process of the Americas by humans. This is especially true for South America, where unexpected ancestral signals have raised perplexing scenarios for the early migrations into different regions of the continent. Here, we present ancient human genomes from the archaeologically rich Northeast Brazil and compare them to ancient and present-day genomic data. We find a distinct relationship between ancient genomes from Northeast Brazil, Lagoa Santa, Uruguay and Panama, representing evidence for ancient migration routes along South America's Atlantic coast. To further add to the existing complexity, we also detect greater Denisovan than Neanderthal ancestry in ancient Uruguay and Panama individuals. Moreover, we find a strong Australasian signal in an ancient genome from Panama. This work sheds light on the deep demographic history of eastern South America and presents a starting point for future fine-scale investigations on the regional level.
Full Text Available
The roles of balancing selection and recombination in the evolution of rattlesnake venom

https://doi.org/10.1038/s41559-022-01829-5

Schield, Drew R.; Perry, Blair W.; Adams, Richard H.; Holding, Matthew L.; Nikolakis, Zachary L.; Gopalan, Siddharth S.; Smith, Cara F.; Parker, Joshua M.; Meik, Jesse M.; DeGiorgio, Michael; et al (September 2022, Nature Ecology & Evolution)

Full Text Available
A spatially aware likelihood test to detect sweeps from haplotype distributions

https://doi.org/10.1371/journal.pgen.1010134

DeGiorgio, Michael; Szpiech, Zachary A. (April 2022, PLOS Genetics)
Buerkle, Alex (Ed.)
The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the “width” of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.
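The method described above operates on the spatial distribution of the haplotype frequency spectrum (HFS) along the genome. The Python sketch below computes only that input, not the likelihood test: it takes sliding windows of SNPs, counts distinct haplotypes, and sorts their relative frequencies in descending order. The window size, step, truncation to the top k_top frequencies, zero-padding, and function names are hypothetical choices.

```python
from collections import Counter


def haplotype_frequency_spectrum(haplotypes):
    """Relative frequencies of distinct haplotypes in one window, largest first."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


def windowed_hfs(haplotype_matrix, window_snps, step, k_top):
    """Compute the HFS in sliding SNP windows, truncated or zero-padded to k_top.

    haplotype_matrix: equal-length strings of 0/1 alleles, one per sampled chromosome.
    Returns (window_start, hfs) pairs.
    """
    n_snps = len(haplotype_matrix[0])
    spectra = []
    for start in range(0, n_snps - window_snps + 1, step):
        window = [h[start:start + window_snps] for h in haplotype_matrix]
        hfs = haplotype_frequency_spectrum(window)[:k_top]
        hfs += [0.0] * (k_top - len(hfs))
        spectra.append((start, hfs))
    return spectra


if __name__ == "__main__":
    # Hypothetical sample: one haplotype dominates the first window (sweep-like),
    # while every haplotype is distinct in the second.
    sample = ["000111", "000101", "000011", "000110"]
    for start, hfs in windowed_hfs(sample, window_snps=3, step=3, k_top=3):
        print(start, hfs)
```

A window in which a single haplotype reaches high frequency yields a spectrum dominated by its first entry, which is the kind of distortion a sweep test looks for relative to the genome-wide spectrum.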
