Title: The Role of Co‐Occurrence Statistics in Developing Semantic Knowledge
Abstract

The organization of our knowledge about the world into an interconnected network of concepts linked by relations profoundly impacts many facets of cognition, including attention, memory retrieval, reasoning, and learning. It is therefore crucial to understand how organized semantic representations are acquired. The present experiment investigated the contributions of readily observable environmental statistical regularities to semantic organization in childhood. Specifically, we investigated whether co‐occurrence regularities, in which some entities or their labels occur together more reliably than with others, (a) contribute to relations between concepts independently and (b) contribute to relations between concepts belonging to the same taxonomic category. Using child‐directed speech corpora to estimate reliable co‐occurrences between labels for familiar items, we constructed triads consisting of a target, a related distractor, and an unrelated distractor in which targets and related distractors consistently co‐occurred (e.g., sock‐foot), belonged to the same taxonomic category (e.g., sock‐coat), or both (e.g., sock‐shoe). We used an implicit, eye‐gaze measure of relations between concepts based on the degree to which children (N = 72, age 4–7 years) looked at related versus unrelated distractors when asked to look for a target. The results indicated that co‐occurrence both independently contributes to relations between concepts and contributes to relations between concepts belonging to the same taxonomic category. These findings suggest that sensitivity to the regularity with which different entities co‐occur in children's environments shapes the organization of semantic knowledge during development. Implications for theoretical accounts and empirical investigations of semantic organization are discussed.

 
Award ID(s):
1918259
NSF-PAR ID:
10245442
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Cognitive Science
Volume:
44
Issue:
9
ISSN:
0364-0213
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Naming a picture is more difficult in the context of a taxonomically-related picture. Disagreement exists on whether non-taxonomic relations, e.g., associations, have similar or different effects on picture naming. Past work has reported facilitation, interference and null results but with inconsistent methodologies. We paired the same target word (e.g., cow) with unrelated (pen), taxonomically-related (bear), and associatively-related (milk) items in different blocks, as participants repeatedly named one of the two pictures in randomized order. Significant interference was uncovered for the same target item in the taxonomic vs. unrelated and associative blocks. There was no robust evidence of interference in the associative blocks. If anything, evidence suggested that associatively-related items marginally facilitated production. This finding suggests that taxonomic and associative relations have different effects on picture naming and has implications for theoretical models of lexical selection and, more generally, for the computations involved in mapping semantic features to lexical items. 
  2. Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise description of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include practical targets and criteria for the publication and digitization of taxonomic concept representations and alignments in taxonomic treatments, checklists, and backbones. As a motivating case, consider the abundantly sampled North American deer mouse— Peromyscus maniculatus (Wagner 1845)—which was recently split from one continental species into five more narrowly defined forms, so that the name P. maniculatus is now only applied east of the Mississippi River (Bradley et al. 2019, Greenbaum et al. 2019). 
    That single change instantly rendered ambiguous ~7% of North American mammal records in the Global Biodiversity Information Facility (n=242,663, downloaded 2021-06-04; GBIF.org 2021) and ⅓ of all National Ecological Observatory Network (NEON) small mammal samples (n=10,256, downloaded 2021-06-27). While this type of ambiguity is common in name-based databases when species are split, the example of P. maniculatus is particularly striking for its impact upon biological questions ranging from hantavirus surveillance in North America to studies of climate change impacts upon rodent life-history traits. Of special relevance to NEON sampling is recent evidence suggesting deer mice potentially transmit SARS-CoV-2 (Griffin et al. 2021). Automating the updating of occurrence records in such cases and others will require operational representations of taxonomic concepts (e.g., range maps, reference sequences, and diagnostic traits) that are FAIR in addition to taxonomic concept alignment information (Franz and Peet 2009). Despite steady progress, it remains difficult to find, access, and reuse authoritative information about how to apply taxonomic names even when it is already digitized. It can also be difficult to tell without manual inspection whether similar types of concept representations derived from multiple sources, such as range maps or reference sequences selected from different research articles or checklists, are in fact interoperable for a particular application. The issue is therefore different from important ongoing efforts to digitize trait information in species circumscriptions, for example, and focuses on how already digitized knowledge can best be packaged to inform human experts and artificial intelligence applications (Sterner and Franz 2017).
    We therefore propose developing community guidelines and criteria for FAIR taxonomic concept representations as "semantic artefacts" of general relevance to linked open data and life sciences research (Le Franc et al. 2020).
  3. We provide an overview and update on initiatives and approaches to add taxonomic data intelligence to distributed biodiversity knowledge networks. "Taxonomic intelligence" for biodiversity data is defined here as the ability to identify and reconcile source-contextualized taxonomic name-to-meaning relationships (Remsen 2016). We review the scientific opportunities, as well as information-technological and socio-economic pathways - both existing and envisioned - to embed de-centralized taxonomic data intelligence into the biodiversity data publication and knowledge integration processes. We predict that the success of this project will ultimately rest on our ability to up-value the roles and recognition of systematic expertise and experts in large, aggregated data environments. We will argue that these environments will need to adhere to criteria for responsible data science and interests of coherent communities of practice (Wenger 2000, Stoyanovich et al. 2017). This means allowing for fair, accountable, and transparent representation and propagation of evolving systematic knowledge and enduring or newly apparent conflict in systematic perspective (Sterner and Franz 2017, Franz and Sterner 2018, Sterner et al. 2019). We will demonstrate, in principle and through concrete use cases, how to de-centralize systematic knowledge while maintaining alignments between congruent or conflicting taxonomic concept labels (Franz et al. 2016a, Franz et al. 2016b, Franz et al. 2019). The suggested approach uses custom-configured logic representation and reasoning methods, based on the Region Connection Calculus (RCC-5) alignment language. The approach offers syntactic consistency and semantic applicability or scalability across a wide range of biodiversity data products, ranging from occurrence records to phylogenomic trees.
    We will also illustrate how this kind of taxonomic data intelligence can be captured and propagated through existing or envisioned metadata conventions and standards (e.g., Senderov et al. 2018). Having established an intellectual opportunity, as well as a technical solution pathway, we turn to the issue of developing an implementation and adoption strategy. Which biodiversity data environments are currently the most taxonomically intelligent, and why? How is this level of taxonomic data intelligence created, maintained, and propagated outward? How are taxonomic data intelligence services motivated or incentivized, both at the level of individuals and organizations? Which "concerned entities" within the greater biodiversity data publication enterprise are best positioned to promote such services? Are the most valuable lessons for biodiversity data science "hidden" in successful social media applications? What are good, feasible, incremental steps towards improving taxonomic data intelligence for a diversity of data publishers?
  4. Abstract

    Ecological factors contributing to depth-related diversification of marine Thaumarchaeota populations remain largely unresolved. To investigate the role of potential microbial associations in shaping thaumarchaeal ecotype diversification, we examined co-occurrence relationships in a community composition dataset (16S rRNA V4-V5 region) collected as part of a 2-year time series in coastal Monterey Bay. Ecotype groups previously defined based on functional gene diversity—water column A (WCA), water column B (WCB) and Nitrosopumilus-like clusters—were recovered in the thaumarchaeal 16S rRNA gene phylogeny. Networks systematically reflected depth-related patterns in the abundances of ecotype populations, suggesting thaumarchaeal ecotypes as keystone members of the microbial community below the euphotic zone. Differential environmental controls on the ecotype populations were further evident in subnetwork modules showing preferential co-occurrence of OTUs belonging to the same ecotype cluster. Correlated abundances of Thaumarchaeota and heterotrophic bacteria (e.g., Bacteroidetes, Marinimicrobia and Gammaproteobacteria) indicated potential reciprocal interactions via dissolved organic matter transformations. Notably, the networks recovered ecotype-specific associations between thaumarchaeal and Nitrospina OTUs. Even at depths where WCB-like Thaumarchaeota dominated, Nitrospina OTUs were found to preferentially co-occur with WCA-like and Nitrosopumilus-like thaumarchaeal OTUs, highlighting the need to investigate the ecological implications of the composition of nitrifier assemblages in marine waters.

     
  5. Learning the human-mobility interaction (HMI) on interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Toward ubiquitous and understandable HMI learning, this paper considers both spoken language (e.g., human textual annotations) and unspoken language (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) as information modalities from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design.

    To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts from co-learning of textual annotations as well as the visual and behavioral sensor data. To fuse both unspoken and spoken languages, we have designed a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from the on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit downstream human-computer interaction and ubiquitous computing applications. We have implemented CG-HMI in a system prototype and performed extensive studies on three real-world HMI datasets (two on car driving and the third on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., road safety and driving behaviors) about the interactions between the drivers and other traffic participants.

     