Title: Scholarly Big Data: Computational Approaches to Semantic Labeling in Materials Science
This paper explores computational semantic labeling for scholarly big data in materials science. We report on a baseline comparative analysis of two approaches: ontology-based automatic indexing with the Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT) application, which uses the Rapid Automatic Keyword Extraction (RAKE) algorithm, and the MATScholar system, which uses named entity recognition (NER) supported by a recurrent neural network (RNN). Results demonstrate that ontology-based automatic indexing requires less preparation time and provides useful output supporting recall, while NER/RNN requires greater preparation but produces more precise labels that are likely better for deep learning.
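As a rough illustration of the keyword-extraction step named in the abstract, the following is a minimal RAKE-style sketch in Python. It is not the HIVE-4-MAT implementation: the tiny stopword list, the function names `candidate_phrases` and `rake_scores`, and the sample sentence are assumptions made for illustration. Candidate phrases are split at stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE uses a larger one).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is",
             "of", "on", "that", "the", "to", "using", "with"}

def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9\-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(text):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # number of occurrences of each word
    degree = defaultdict(int)  # summed length of the phrases containing it
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    sample = ("Thin film deposition of zinc oxide. "
              "Zinc oxide thin film sensors are stable.")
    for phrase, score in rake_scores(sample):
        print(f"{score:5.1f}  {phrase}")
```

Because longer phrases accumulate more word scores, this scoring favors multi-word terms, which fits the abstract's observation that the approach supports recall more than precision. In an ontology-based pipeline, the extracted phrases would then be matched against vocabulary terms, a step this sketch leaves out.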
Award ID(s):
1940239
PAR ID:
10185098
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE/ACM Joint Conference on Digital Libraries (JCDL)
ISSN:
2575-7865
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Garoufallou, E; Ovalle-Perandones, M.A. (Ed.)
    This paper introduces Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT), an automatic linked data ontology application. The paper provides contextual background for materials science, shared ontology infrastructures, and knowledge extraction applications. HIVE-4-MAT's three key features are reviewed: 1) Vocabulary browsing, 2) Term search and selection, and 3) Knowledge Extraction/Indexing, as well as the basics of named entity recognition (NER). The discussion elaborates on the importance of ontology infrastructures and steps taken to enhance knowledge extraction. The conclusion highlights next steps surveying the ontology landscape, including NER work as a step toward relation extraction (RE), and support for better ontologies. 
  2. Scientific literature analysis needs fine-grained named entity recognition (NER) to provide a wide range of information for scientific discovery. For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts. On the other hand, domain-specific ontologies and knowledge bases (KBs) can be easily accessed, constructed, or integrated, which makes distant supervision realistic for fine-grained chemistry NER. In distant supervision, training labels are generated by matching mentions in a document with the concepts in the KBs. However, this kind of KB-matching suffers from two major challenges: incomplete annotation and noisy annotation. We propose ChemNER, an ontology-guided, distantly-supervised method for fine-grained chemistry NER to tackle these challenges. It leverages the chemistry type ontology structure to generate distant labels with novel methods of flexible KB-matching and ontology-guided multi-type disambiguation. It significantly improves the distant label generation for the subsequent sequence labeling model training. We also provide an expert-labeled chemistry NER dataset with 62 fine-grained chemistry types (e.g., chemical compounds and chemical reactions). Experimental results show that ChemNER is highly effective, substantially outperforming the state-of-the-art NER methods (with a 0.25 absolute F1 score improvement).
  3. Chemical patents are an essential source of information about novel chemicals and chemical reactions. However, with the increasing volume of such patents, mining information about these chemicals and chemical reactions has become a time-intensive and laborious endeavor. In this study, we present a system to extract chemical reaction events from patents automatically. Our approach consists of two steps: 1) named entity recognition (NER), the automatic identification of chemical reaction parameters from the corresponding text, and 2) event extraction (EE), the automatic classifying and linking of entities based on their relationships to each other. For our NER system, we evaluate bidirectional long short-term memory (BiLSTM)-based and bidirectional encoder representations from transformers (BERT)-based methods. For our EE system, we evaluate BERT-based, convolutional neural network (CNN)-based, and rule-based methods. We evaluate our NER and EE components independently and as an end-to-end system, reporting the precision, recall, and F1 score. Our results show that the BiLSTM-based method performed best at identifying the entities, and the CNN-based method performed best at extracting events.
  4. d public health. For such high-impact areas, accurately capturing relevant entities at a more granular level is critical, as this information influences real-world processes. On the other hand, training NER models for a specific domain without handcrafted features requires an extensive amount of labeled data, which is expensive in human effort and time. In this study, we employ distant supervision utilizing a domain-specific ontology to reduce the need for human labor and train models incorporating domain-specific (e.g., drug use) external knowledge to recognize domain-specific entities. We capture entities related to drug use and their trends in government epidemiology reports, with an improvement of 8% in F1-score.
  5. Automatic speech recognition based on recurrent neural networks (RNNs) has become promising and important on mobile devices such as smartphones. However, previous RNN compression techniques suffer either from hardware performance overhead due to irregularity or from significant accuracy loss due to the regularity preserved for hardware friendliness. In this work, we propose RTMobile, which leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. Our proposed RTMobile is the first work that can achieve real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile can significantly outperform existing RNN hardware acceleration methods in terms of both inference accuracy and time. Compared with prior work on FPGA, RTMobile using an Adreno 640 embedded GPU on a GRU model can improve the energy efficiency by 40x while maintaining the same inference time.
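Items 2 and 4 in the list above rely on distant supervision, in which training labels are generated by matching document mentions against knowledge-base concepts. The following is a minimal Python sketch of that matching step, assuming exact, greedy longest-match lookup. The `TOY_KB` entries, the type names, and the `distant_labels` function are hypothetical, and the sketch does not reproduce ChemNER's flexible KB-matching or ontology-guided disambiguation.

```python
# Toy knowledge base mapping surface forms (lowercased token tuples) to
# entity types. Entries and type names are hypothetical, for illustration.
TOY_KB = {
    ("sodium", "chloride"): "COMPOUND",
    ("ethanol",): "COMPOUND",
    ("suzuki", "coupling"): "REACTION",
}

def distant_labels(tokens, kb=TOY_KB):
    """Assign BIO tags by greedy longest exact match against the KB."""
    max_len = max(len(key) for key in kb)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible span first, then shrink.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + span])
            if key in kb:
                tags[i] = "B-" + kb[key]
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + kb[key]
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags

if __name__ == "__main__":
    sentence = "The Suzuki coupling used sodium chloride in methanol".split()
    for token, tag in zip(sentence, distant_labels(sentence)):
        print(f"{token}\t{tag}")
```

In the sample sentence, "methanol" is absent from the toy KB and therefore stays tagged `O`, a small instance of the incomplete-annotation problem that item 2 describes.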