skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Simmons, Mark"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. While there has been substantial progress in text comprehension through simple factoid question answering, more holistic comprehension of a discourse still presents a major challenge (Dunietz et al., 2020). Someone critically reflecting on a text as they read it will pose curiosity-driven, often open-ended questions, which reflect deep understanding of the content and require complex reasoning to answer (Ko et al., 2020; Westera et al., 2020). A key challenge in building and evaluating models for this type of discourse comprehension is the lack of annotated data, especially since collecting answers to such questions requires high cognitive load for annotators. This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents, viewing these questions through the lens of discourse. The resulting corpus, DCQA (Discourse Comprehension by Question Answering), captures both discourse and semantic links between sentences in the form of free-form, open-ended questions. On an evaluation set that we annotated on questions from Ko et al. (2020), we show that DCQA provides valuable supervision for answering open-ended questions. We additionally design pre-training methods utilizing existing question-answering resources, and use synthetic data to accommodate unanswerable questions. 
    more » « less
  2. Murphy, William (Ed.)
    Abstract DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaeonopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets. 
    more » « less
  3. Abstract Angiosperms are the cornerstone of most terrestrial ecosystems and human livelihoods1,2. A robust understanding of angiosperm evolution is required to explain their rise to ecological dominance. So far, the angiosperm tree of life has been determined primarily by means of analyses of the plastid genome3,4. Many studies have drawn on this foundational work, such as classification and first insights into angiosperm diversification since their Mesozoic origins5–7. However, the limited and biased sampling of both taxa and genomes undermines confidence in the tree and its implications. Here, we build the tree of life for almost 8,000 (about 60%) angiosperm genera using a standardized set of 353 nuclear genes8. This 15-fold increase in genus-level sampling relative to comparable nuclear studies9provides a critical test of earlier results and brings notable change to key groups, especially in rosids, while substantiating many previously predicted relationships. Scaling this tree to time using 200 fossils, we discovered that early angiosperm evolution was characterized by high gene tree conflict and explosive diversification, giving rise to more than 80% of extant angiosperm orders. Steady diversification ensued through the remaining Mesozoic Era until rates resurged in the Cenozoic Era, concurrent with decreasing global temperatures and tightly linked with gene tree conflict. Taken together, our extensive sampling combined with advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade. 
    more » « less