NSF PAR Search | NSF Public Access Repository

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

Xiong, Zidi; Chen, Shan; Qi, Zhenting; Lakkaraju, Himabindu (December 2025, Advances in Neural Information Processing Systems)

Free, publicly-accessible full text available December 1, 2026
Recurrent evolution and selection shape structural diversity at the amylase locus

https://doi.org/10.1038/s41586-024-07911-1

Bolognini, Davide; Halgren, Alma; Lou, Runyang Nicolas; Raveane, Alessandro; Rocha, Joana L; Guarracino, Andrea; Soranzo, Nicole; Chin, Chen-Shan; Garrison, Erik; Sudmant, Peter H (October 2024, Nature)

The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations¹. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake², although evidence of recent selection is lacking³,⁴. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in agricultural populations than in fishing, hunting and pastoral populations. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each underwent multiple duplication/deletion events with mutation rates up to more than 10,000-fold the single-nucleotide polymorphism mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome-based approach, we infer structural haplotypes across thousands of humans identifying extensively duplicated haplotypes at higher frequency in modern agricultural populations. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (with more gene copies than the ancestral haplotype) have rapidly increased in frequency over the past 12,000 years in West Eurasians, suggestive of positive selection. Together, our study highlights the potential effects of the agricultural revolution on human genomes and the importance of structural variation in human adaptation.
Full Text Available
The TRIPOD-LLM reporting guideline for studies using large language models

https://doi.org/10.1038/s41591-024-03425-5

Gallifant, Jack; Afshar, Majid; Ameen, Saleem; Aphinyanaphongs, Yindalon; Chen, Shan; Cacciamani, Giovanni; Demner-Fushman, Dina; Dligach, Dmitriy; Daneshjou, Roxana; Fernandes, Chrystinne; et al (January 2025, Nature Medicine)

Full Text Available
Precursor-mediated in situ growth of hierarchical N-doped graphene nanofibers confining nickel single atoms for CO₂ electroreduction

https://doi.org/10.1073/pnas.2219043120

Wang, Huan; Li, Youzeng; Wang, Maoyu; Chen, Shan; Yao, Meng; Chen, Jialei; Liao, Xuelong; Zhang, Yiwen; Lu, Xuan; Matios, Edward; et al (April 2023, Proceedings of the National Academy of Sciences)

Despite the various strategies for achieving metal–nitrogen–carbon (M–N–C) single-atom catalysts (SACs) with different microenvironments for electrochemical carbon dioxide reduction reaction (CO₂RR), the synthesis–structure–performance correlation remains elusive due to the lack of well-controlled synthetic approaches. Here, we employed Ni nanoparticles as starting materials for the direct synthesis of nickel (Ni) SACs in one spot through harvesting the interaction between metallic Ni and N atoms in the precursor during the chemical vapor deposition growth of hierarchical N-doped graphene fibers. By combining with first-principle calculations, we found that the Ni-N configuration is closely correlated to the N contents in the precursor, in which the acetonitrile with a high N/C ratio favors the formation of Ni-N₃, while the pyridine with a low N/C ratio is more likely to promote the evolution of Ni-N₂. Moreover, we revealed that the presence of N favors the formation of H-terminated edge of sp² carbon and consequently leads to the formation of graphene fibers consisting of vertically stacked graphene flakes, instead of the traditional growth of carbon nanotubes on Ni nanoparticles. With a high capability in balancing the *COOH formation and *CO desorption, the as-prepared hierarchical N-doped graphene nanofibers with Ni-N₃ sites exhibit a superior CO₂RR performance compared to that with Ni-N₂ and Ni-N₄ ones.
Full Text Available
Complex genetic variation in nearly complete human genomes

https://doi.org/10.1101/2024.09.24.614721

Logsdon, Glennis A; Ebert, Peter; Audano, Peter A; Loftus, Mark; Porubsky, David; Ebler, Jana; Yilmaz, Feyza; Hallast, Pille; Prodanov, Timofey; Yoo, DongAhn; et al (September 2024, Nature)

Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Full Text Available
Provable Security Analysis of FIDO2

https://doi.org/10.1007/978-3-030-84252

Barbosa, Manuel; Boldyreva, Alexandra; Chen, Shan; Warinschi, Bogdan (January 2021, CRYPTO 2021: Advances in Cryptology – CRYPTO 2021)
Malkin, T. (Ed.)
Full Text Available
Ribbon: intuitive visualization for complex genomic variation

https://doi.org/10.1093/bioinformatics/btaa680

Nattestad, Maria; Aboukhalil, Robert; Chin, Chen-Shan; Schatz, Michael C (August 2020, Bioinformatics)
Birol, Inanc (Ed.)
Summary: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. Availability and implementation: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
The complete sequence and comparative analysis of ape sex chromosomes

https://doi.org/10.1038/s41586-024-07473-2

Makova, Kateryna D; Pickett, Brandon D; Harris, Robert S; Hartley, Gabrielle A; Cechova, Monika; Pal, Karol; Nurk, Sergey; Yoo, DongAhn; Li, Qiuhui; Hebbar, Prajna; et al (June 2024, Nature)

Apes possess two sex chromosomes—the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility¹. The X chromosome is vital for reproduction and cognition². Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements—owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Full Text Available
A complete reference genome improves analysis of human genetic variation

https://doi.org/10.1126/science.abl3533

Aganezov, Sergey; Yan, Stephanie M.; Soto, Daniela C.; Kirsche, Melanie; Zarate, Samantha; Avdeyev, Pavel; Taylor, Dylan J.; Shafin, Kishwar; Shumate, Alaina; Xiao, Chunlin; et al (April 2022, Science)

INTRODUCTION: One of the central applications of the human reference genome has been to serve as a baseline for comparison in nearly all human genomic studies. Unfortunately, many difficult regions of the reference genome have remained unresolved for decades and are affected by collapsed duplications, missing sequences, and other issues. Relative to the current human reference genome, GRCh38, the Telomere-to-Telomere CHM13 (T2T-CHM13) genome closes all remaining gaps, adds nearly 200 million base pairs (Mbp) of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for scientific inquiry. RATIONALE: We demonstrate how the T2T-CHM13 reference genome universally improves read mapping and variant identification in a globally diverse cohort. This cohort includes all 3202 samples from the expanded 1000 Genomes Project (1KGP), sequenced with short reads, as well as 17 globally diverse samples sequenced with long reads. By applying state-of-the-art methods for calling single-nucleotide variants (SNVs) and structural variants (SVs), we document the strengths and limitations of T2T-CHM13 relative to its predecessors and highlight its promise for revealing new biological insights within technically challenging regions of the genome. RESULTS: Across the 1KGP samples, we found more than 1 million additional high-quality variants genome-wide using T2T-CHM13 than with GRCh38. Within previously unresolved regions of the genome, we identified hundreds of thousands of variants per sample—a promising opportunity for evolutionary and biomedical discovery. T2T-CHM13 improves the Mendelian concordance rate among trios and eliminates tens of thousands of spurious SNVs per sample, including a reduction of false positives in 269 challenging, medically relevant genes by up to a factor of 12. These corrections are in large part due to improvements to 70 protein-coding genes in >9 Mbp of inaccurate sequence caused by falsely collapsed or duplicated regions in GRCh38. Using the T2T-CHM13 genome also yields a more comprehensive view of SVs genome-wide, with a greatly improved balance of insertions and deletions. Finally, by providing numerous resources for T2T-CHM13 (including 1KGP genotypes, accessibility masks, and prominent annotation databases), our work will facilitate the transition to T2T-CHM13 from the current reference genome. CONCLUSION: The vast improvements in variant discovery across samples of diverse ancestries position T2T-CHM13 to succeed as the next prevailing reference for human genetics. T2T-CHM13 thus offers a model for the construction and study of high-quality reference genomes from globally diverse individuals, such as is now being pursued through collaboration with the Human Pangenome Reference Consortium. As a foundation, our work underscores the benefits of an accurate and complete reference genome for revealing diversity across human populations. Genomic features and resources available for T2T-CHM13. Comparisons to GRCh38 reveal broad improvements in SNVs, indels, and SVs discovered across diverse human populations by means of short-read (1KGP) and long-read sequencing (LRS). These improvements are due to resolution of complex genomic loci (nonsyntenic and previously unresolved), duplication errors, and discordant haplotypes, including those in medically relevant genes.
Full Text Available
Semi-automated assembly of high-quality diploid human reference genomes

https://doi.org/10.1038/s41586-022-05325-5

Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; et al (November 2022, Nature)

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society¹,². However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals³,⁴. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome⁵. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity⁶. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Full Text Available
