Title: HIVE-4-MAT: Advancing the Ontology Infrastructure for Materials Science
This paper introduces Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT), an automatic linked data ontology application. The paper provides contextual background on materials science, shared ontology infrastructures, and knowledge extraction applications. It reviews HIVE-4-MAT's three key features, 1) vocabulary browsing, 2) term search and selection, and 3) knowledge extraction/indexing, as well as the basics of named entity recognition (NER). The discussion elaborates on the importance of ontology infrastructures and the steps taken to enhance knowledge extraction. The conclusion highlights next steps: surveying the ontology landscape, pursuing NER work as a step toward relation extraction (RE), and supporting better ontologies.
Award ID(s):
1940239
NSF-PAR ID:
10298006
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Garoufallou, E; Ovalle-Perandones, M.A.
Date Published:
Journal Name:
Metadata and Semantic Research. MTSR 2020. Communications in Computer and Information Science
Volume:
1335
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
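HIVE-4-MAT's knowledge extraction/indexing feature is described here only at a high level; a related study listed below reports that it uses the RAKE algorithm. As a minimal sketch of that general approach, and not HIVE-4-MAT's actual implementation, the Python below scores candidate keyphrases with RAKE (word score = degree / frequency, phrase score = sum of its word scores) and then maps them to preferred terms in a small controlled vocabulary. The abbreviated stoplist, the `VOCABULARY` entries (loosely modeled on SKOS-style preferred and alternate labels), and the function names `candidate_phrases`, `rake`, and `index_terms` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, abbreviated stoplist for illustration only; practical RAKE
# setups use a much longer list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "from", "in", "is",
    "of", "on", "or", "that", "the", "this", "to", "was", "were", "with",
}

# Hypothetical toy vocabulary standing in for an ontology:
# preferred label -> alternate labels (loosely SKOS prefLabel/altLabel style).
VOCABULARY = {
    "aerogel": ("aerogels",),
    "battery cathode": ("cathode", "cathode material"),
    "thermal conductivity": (),
}


def candidate_phrases(text):
    """Split text into candidate keyphrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, top_n=5):
    """Score candidate phrases with RAKE: word score = degree / frequency."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts the words co-occurring in the phrase, itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # sorted() is stable, so tied phrases keep their order of appearance.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def index_terms(text, top_n=20):
    """Map RAKE keyphrases to preferred vocabulary terms (whole-word match)."""
    label_to_pref = {}
    for pref, alts in VOCABULARY.items():
        for label in (pref, *alts):
            label_to_pref[label] = pref
    matched = []
    for phrase, _score in rake(text, top_n):
        for label, pref in label_to_pref.items():
            if f" {label} " in f" {phrase} " and pref not in matched:
                matched.append(pref)
    return matched


text = ("Silica aerogels exhibit low thermal conductivity. "
        "The cathode material was characterized by X-ray diffraction.")
print(rake(text, top_n=3))
print(index_terms(text))  # ['aerogel', 'thermal conductivity', 'battery cathode']
```

Matching vocabulary labels as whole-word substrings of each keyphrase, rather than requiring exact equality, lets a long RAKE phrase such as "silica aerogels exhibit low thermal conductivity" still yield index terms; a fuller system would likely add stemming or lemmatization and draw labels from a complete ontology.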
More Like this
  1. Purpose: The output of academic literature has increased significantly due to digital technology, presenting researchers in every discipline, including materials science, with a challenge: it is impossible to manually read and extract knowledge from millions of published works. The purpose of this study is to address this challenge by exploring knowledge extraction in materials science, as applied to digital scholarship. An overriding goal is to inform readers about the status of knowledge extraction in materials science. Design/methodology/approach: The authors conducted a two-part analysis: a comparison of knowledge extraction methods applied to materials science scholarship across a sample of 22 articles, followed by a comparison of HIVE-4-MAT, an ontology-based knowledge extraction application, and MatScholar, a named entity recognition (NER) application. This paper covers contextual background and a review of three tiers of knowledge extraction (ontology-based, NER and relation extraction), followed by the research goals and approach. Findings: The results indicate three key needs for researchers to consider in advancing knowledge extraction: the need for materials-science-focused corpora, the need for researchers to define the scope of the research being pursued, and the need to understand the tradeoffs among different knowledge extraction methods. This paper also points to future materials science research potential in relation extraction and the increased availability of ontologies. Originality/value: To the best of the authors' knowledge, very few studies examine knowledge extraction in materials science. This work makes an important contribution to this underexplored research area.
  2. This paper explores computational semantic labeling for scholarly big data in materials science. We report on a baseline comparative analysis of ontology-based automatic indexing with the Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT) application, which uses the RAKE algorithm, and the MatScholar system, which uses named entity recognition (NER) supported by an RNN (recurrent neural network). Results demonstrate that ontology-based automatic indexing requires less preparation time and provides useful output supporting recall, while NER/RNN requires greater preparation but produces more precise labels that are likely better for deep learning.
  3. Researchers across nearly every discipline seek to leverage ontologies for knowledge discovery and computational tasks; yet the number of machine-readable materials science ontologies is limited. The work presented in this paper explores the Processing, Structure, Properties and Performance (PSPP) framework for accelerating the development of materials science ontologies. We pursue a case study framed by the creation of an Aerogel ontology and a Battery Cathode ontology, and demonstrate Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT) as a proof of concept showing PSPP relationships. The paper includes background context covering materials science, the PSPP framework, and faceted analysis for ontologies. We report our research objectives, methods, research procedures, and results. The findings indicate that the PSPP framework offers a rubric that may help guide and potentially accelerate ontology development.
  4. Scientific literature is one of the most significant resources for sharing knowledge. Researchers turn to scientific literature as a first step in designing an experiment. Given the extensive and growing volume of literature, the common approach of reading and manually extracting knowledge is too time-consuming, creating a bottleneck in the research cycle. This challenge spans nearly every scientific domain. In materials science, experimental data distributed across millions of publications are extremely helpful for predicting materials properties and designing novel materials. However, only recently have researchers explored computational approaches to knowledge extraction, primarily for inorganic materials. This study aims to explore knowledge extraction for organic materials. We built a research dataset composed of 855 annotated and 708,376 unannotated sentences drawn from 92,667 abstracts. We used named entity recognition (NER) with a BiLSTM-CNN-CRF deep learning model to automatically extract key knowledge from the literature. Early-phase results show a high potential for automated knowledge extraction. The paper presents our findings and a framework for supervised knowledge extraction that can be adapted to other scientific domains.
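Several of the studies above rely on NER. A sequence labeler such as the BiLSTM-CNN-CRF model used in the organic-materials study emits one tag per token, and turning those tags into entity mentions is a separate decoding step. The sketch below shows only that decoding step, not the model. It assumes the common BIO tagging scheme (the abstracts do not state which scheme was used), and the entity labels `MAT` (material) and `PRO` (property), the example sentence, and the function name `bio_to_spans` are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (label, text) entity spans.

    Lenient decoding: an I- tag that does not continue the current entity
    starts a new entity instead of being dropped.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    label, start = None, None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # Same entity continues; nothing to emit yet.
        if label is not None:
            spans.append((label, " ".join(tokens[start:i])))
        label, start = None, None
        if tag.startswith(("B-", "I-")):
            label, start = tag[2:], i
    return spans


tokens = ["P3HT", "films", "show", "high", "hole", "mobility", "."]
tags = ["B-MAT", "O", "O", "O", "B-PRO", "I-PRO", "O"]
print(bio_to_spans(tokens, tags))  # [('MAT', 'P3HT'), ('PRO', 'hole mobility')]
```

Treating a stray `I-` tag as the start of a new entity is one design choice among several; a strict decoder might instead discard it, which trades recall for precision.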