- Award ID(s):
- 1942591
- NSF-PAR ID:
- 10237391
- Date Published:
- Journal Name:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Page Range / eLocation ID:
- 642 to 652
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Fanfiction presents an opportunity as a data source for research in NLP, education, and social science. However, answering specific research questions with this data is difficult, since fanfiction contains more diverse writing styles than formal fiction. We present a text processing pipeline for fanfiction, with a focus on identifying text associated with characters. The pipeline includes modules for character identification and coreference, as well as the attribution of quotes and narration to those characters. Additionally, the pipeline contains a novel approach to character coreference that uses knowledge from quote attribution to resolve pronouns within quotes. For each module, we evaluate the effectiveness of various approaches on 10 annotated fanfiction stories. This pipeline outperforms tools developed for formal fiction on the tasks of character coreference and quote attribution.
-
Reconstructing ancestral states for discrete characters is essential for understanding trait evolution in organisms. However, most existing methods are limited to individual characters and often overlook the hierarchical and interactive nature of traits. Recent advances in phylogenetics now offer the possibility of integrating knowledge from anatomy ontologies to reconstruct multiple discrete character histories. Nonetheless, practical applications that fully harness the potential of these new approaches are still lacking. This paper introduces ontophylo, an R package that extends the PARAMO pipeline to address these limitations. Ontophylo enables the reconstruction of phenotypic entities composed of amalgamated characters, such as anatomical regions or entire phenomes. It offers three new applications: (1) reconstruction of evolutionary rates of amalgamated characters using a phylogenetic non‐homogeneous Poisson process (pNHPP) that allows modelling rate variation across tree branches and time; (2) reconstruction of morphospace dynamics; and (3) visualization of evolutionary rates on vector images of organisms. Ontophylo incorporates ontological knowledge to facilitate these applications. Benchmarking confirms the accuracy of pNHPP in estimating character rates under different evolutionary scenarios, and example applications demonstrate the utility of ontophylo in studying morphological evolution in Hymenoptera using simulated data. Ontophylo can be easily integrated with other ontology‐oriented and general‐purpose R packages and offers new opportunities to examine morphological evolution on a phenomic scale using new and legacy data.
-
A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. 
Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.
-
The ability to engage in counterfactual thinking (reason about what else could have happened) is critical to learning, agency, and social evaluation. However, not much is known about how individual differences in counterfactual reasoning may play a role in children's social evaluations. In the current study, we investigate how prompting children to engage in counterfactual thinking about positive moral actions impacts children's social evaluations. Eighty‐seven 4‐ to 8‐year‐olds were introduced to a character who engaged in a positive moral action (shared a sticker with a friend) and asked about what else the character could have done with the sticker (counterfactual simulation). Children were asked to generate either a high number of counterfactuals (five alternative actions) or a low number of counterfactuals (one alternative action). Children were then asked a series of social evaluation questions contrasting that character with one who did not have a choice and had no alternatives (was told to give away the sticker to his friend). Results show that children who generated selfish counterfactuals were more likely to positively evaluate the character with choice than children who did not generate selfish counterfactuals, suggesting that generating counterfactuals most distant from the chosen action (prosociality) leads children to view prosocial actions more positively. We also found age‐related changes: as children got older, regardless of the type of counterfactuals generated, they were more likely to evaluate the character with choice more positively. These results highlight the importance of counterfactual reasoning in the development of moral evaluations.
Research Highlights
Older children were more likely to endorse agents who choose to share over those who do not have a choice.
Children who were prompted to generate more counterfactuals were more likely to allocate resources to characters with choice.
Children who generated selfish counterfactuals more positively evaluated agents with choice.
Comparable to theories suggesting children punish willful transgressors more than accidental transgressors, we propose children also consider free will when making positive moral evaluations.
-
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called “modifiers”. With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly. Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies. Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using “broader synonym” or “not recommended” annotation properties. These annotations explicitly make the user aware of the semantic ambiguity associated with the terms and of whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement, suggesting that differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies. Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concepts alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.