Abstract We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We derive a backpropagation optimization scheme that allows us to frame hierarchical NMF as a neural network. We test Neural NMF on a synthetic hierarchical dataset, the 20 Newsgroups dataset, and the MyLymeData symptoms dataset. Numerical results demonstrate that Neural NMF outperforms other hierarchical NMF methods on these data sets and offers better learned hierarchical structure and interpretability of topics.
Extracting hierarchical features of cultural variation using network-based clustering
Abstract High-dimensional datasets on cultural characters contribute to uncovering insights about factors that influence cultural evolution. Because cultural variation in part reflects descent processes with a hierarchical structure – including the descent of populations and vertical transmission of cultural traits – methods designed for hierarchically structured data have potential to find applications in the analysis of cultural variation. We adapt a network-based hierarchical clustering method for use in analysing cultural variation. Given a set of entities, the method constructs a similarity network, hierarchically depicting community structure among them. We illustrate the approach using four datasets: pronunciation variation in the US mid-Atlantic region, folklore variation in worldwide cultures, phonemic variation across worldwide languages and temporal variation in first names in the US. In these examples, the method provides insights into processes that affect cultural variation, uncovering geographic and other influences on observed patterns and cultural characters that make important contributions to them.
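As a rough illustration of how a similarity network can be read hierarchically, the sketch below thresholds pairwise similarities and reports the connected components at each level. It is a generic single-linkage-style stand-in, not the specific method adapted in the paper; the trait vectors, the matching-fraction similarity, and the function names (`similarity`, `communities`, `hierarchy`) are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data: four entities described by four binary cultural traits.
traits = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 1],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, 0],
}

def similarity(a, b):
    """Fraction of positions at which two equal-length trait vectors agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def communities(names, sims, threshold):
    """Connected components of the network that keeps edges with similarity >= threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (u, v), s in sims.items():
        if s >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

def hierarchy(data):
    """List (threshold, communities) pairs from the strictest threshold to the loosest."""
    names = sorted(data)
    sims = {(u, v): similarity(data[u], data[v]) for u, v in combinations(names, 2)}
    levels = sorted(set(sims.values()), reverse=True)
    return [(t, communities(names, sims, t)) for t in levels]

for threshold, groups in hierarchy(traits):
    print(threshold, groups)
```

On this toy input, the strictest threshold separates the entities into two pairs, and lowering the threshold merges them into a single community, which is the nested structure a dendrogram would display.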
- Award ID(s): 2116322
- PAR ID: 10329253
- Date Published:
- Journal Name: Evolutionary Human Sciences
- Volume: 4
- ISSN: 2513-843X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Objective: SNOMED CT is the largest clinical terminology worldwide. Quality assurance of SNOMED CT is of utmost importance to ensure that it provides accurate domain knowledge to various SNOMED CT-based applications. In this work, we introduce a deep learning-based approach to uncover missing is-a relations in SNOMED CT. Materials and Methods: Our focus is to identify missing is-a relations between concept-pairs exhibiting a containment pattern (ie, the set of words of one concept being a proper subset of that of the other concept). We use hierarchically related containment concept-pairs as positive instances and hierarchically unrelated containment concept-pairs as negative instances to train a model predicting whether an is-a relation exists between 2 concepts with containment pattern. The model is a binary classifier leveraging concept name features, hierarchical features, enriched lexical attribute features, and logical definition features. We introduce a cross-validation inspired approach to identify missing is-a relations among all hierarchically unrelated containment concept-pairs. Results: We trained and applied our model on the Clinical finding subhierarchy of SNOMED CT (September 2019 US edition). Our model (based on the validation sets) achieved a precision of 0.8164, recall of 0.8397, and F1 score of 0.8279. Applying the model to predict actual missing is-a relations, we obtained a total of 1661 potential candidates. Domain experts evaluated 230 randomly selected samples and verified that 192 (83.48%) are valid. Conclusions: The results showed that our deep learning approach is effective in uncovering missing is-a relations between containment concept-pairs in SNOMED CT.
Natural target functions and tasks typically exhibit hierarchical modularity – they can be broken down into simpler sub-functions that are organized in a hierarchy. Such sub-functions have two important features: they have a distinct set of inputs (input-separability) and they are reused as inputs higher in the hierarchy (reusability). Previous studies have established that hierarchically modular neural networks, which are inherently sparse, offer benefits such as learning efficiency, generalization, multi-task learning, and transfer. However, identifying the underlying sub-functions and their hierarchical structure for a given task can be challenging. The high-level question in this work is: if we learn a task using a sufficiently deep neural network, how can we uncover the underlying hierarchy of sub-functions in that task? As a starting point, we examine the domain of Boolean functions, where it is easier to determine whether a task is hierarchically modular. We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference. Finally, we demonstrate that this method can uncover the hierarchical modularity of a wide range of Boolean functions and two vision tasks based on the MNIST digits dataset.
Abstract Morphology remains a primary source of phylogenetic information for many groups of organisms, and the only one for most fossil taxa. Organismal anatomy is not a collection of randomly assembled and independent “parts”, but instead a set of dependent and hierarchically nested entities resulting from ontogeny and phylogeny. How do we make sense of these dependent and at times redundant characters? One promising approach is using ontologies—structured controlled vocabularies that summarize knowledge about different properties of anatomical entities, including developmental and structural dependencies. Here, we assess whether evolutionary patterns can explain the proximity of ontology-annotated characters within an ontology. To do so, we measure phylogenetic information across characters and evaluate if it matches the hierarchical structure given by ontological knowledge—in much the same way as across-species diversity structure is given by phylogeny. We implement an approach to evaluate the Bayesian phylogenetic information (BPI) content and phylogenetic dissonance among ontology-annotated anatomical data subsets. We applied this to data sets representing two disparate animal groups: bees (Hexapoda: Hymenoptera: Apoidea, 209 chars) and characiform fishes (Actinopterygii: Ostariophysi: Characiformes, 463 chars). For bees, we find that BPI is not substantially explained by anatomy since dissonance is often high among morphologically related anatomical entities. For fishes, we find substantial information for two clusters of anatomical entities instantiating concepts from the jaws and branchial arch bones, but among-subset information decreases and dissonance increases substantially moving to higher-level subsets in the ontology. We further applied our approach to address particular evolutionary hypotheses with an example of morphological evolution in miniature fishes. While we show that phylogenetic information does match ontology structure for some anatomical entities, additional relationships and processes, such as convergence, likely play a substantial role in explaining BPI and dissonance, and merit future investigation. Our work demonstrates how complex morphological data sets can be interrogated with ontologies by allowing one to access how information is spread hierarchically across anatomical concepts, how congruent this information is, and what sorts of processes may play a role in explaining it: phylogeny, development, or convergence. [Apidae; Bayesian phylogenetic information; Ostariophysi; Phenoscape; phylogenetic dissonance; semantic similarity.]
Abstract The coronavirus disease 2019 (COVID‐19) pandemic has taken a massive toll on human life worldwide. The case of the United States—the world's largest economy—is particularly noteworthy, since the country suffered a disproportionately larger number of deaths than all other countries during the first year of the pandemic. A careful analysis may shed new light on the multifaceted processes contributing to this failure and help us prepare ourselves not to repeat the same mistakes in the future. Cultural psychology offers unique insights by highlighting mutually reinforcing interactions across collective, cultural, and psychological factors. Here, we review extant evidence and argue that various factors at these disparate levels converged to foster an independent mode of action, which, in turn, undermined effective coping with the infectious disease. The lack of effective political leadership exacerbated the resulting dire state of the country. Drawing on this analysis, we discuss several policy recommendations at collective, cultural, and psychological levels.

