Exploring Pre-Trained Language Models to Build Knowledge Graph for Metal-Organic Frameworks (MOFs)

An, Yuan; Greenberg, Jane; Hu, Xiaohua; Kalinowski, Alex; Fang, Xiao; Zhao, Xintong; McClellan, Scott; Uribe-Romo, Fernando J.; Langlois, Kyle; Furst, Jacob; Gomez-Gualdron, Diego A.; Fajardo-Rojas, Fernando; Ardila, Katherine; Saikin, Semion K.; Harper, Corey A.; Daniel, Ron

doi:10.1109/BigData55660.2022.10020568

Building a knowledge graph is a time-consuming and costly process that often relies on complex natural language processing (NLP) methods to extract knowledge graph triples from text corpora. Pre-trained large language models (PLMs) have emerged as an important class of approaches that provide readily available knowledge for a range of AI applications. However, it is unclear whether domain-specific knowledge graphs can feasibly be constructed from PLMs. Motivated by the capacity of knowledge graphs to accelerate data-driven materials discovery, we explored a set of state-of-the-art general-purpose and domain-specific pre-trained language models to extract knowledge triples for metal-organic frameworks (MOFs). We created a knowledge graph benchmark with 7 relations for 1248 published MOF synonyms. Our experimental results showed that domain-specific PLMs consistently outperformed general-purpose PLMs at predicting MOF-related triples. The overall benchmarking results, however, show that using current PLMs to create domain-specific knowledge graphs is still far from practical, motivating the need to develop more capable and knowledgeable pre-trained language models for specific applications in materials science.
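
The abstract describes benchmarking PLMs on predicting MOF-related triples but does not spell out how a prediction is scored. The following is a minimal sketch, assuming a cloze-style (fill-in-the-blank) probe in which a model ranks candidate objects for each (subject, relation) query and accuracy is reported as hits@k. The relation names, prompt templates, `toy_ranker`, and the three example triples are hypothetical illustrations and are not taken from the paper's benchmark.

```python
from typing import Callable, Dict, List, Tuple

# A triple is (subject, relation, object), e.g. a MOF name, a relation, a value.
Triple = Tuple[str, str, str]

# Hypothetical cloze templates, one per relation: [X] is the subject,
# [Y] is the object slot the model must fill.
TEMPLATES: Dict[str, str] = {
    "has_metal": "[X] contains the metal [Y].",
    "has_linker": "[X] is built from the linker [Y].",
}


def make_prompt(subject: str, relation: str) -> str:
    """Turn a (subject, relation) pair into a query with a masked object slot."""
    return TEMPLATES[relation].replace("[X]", subject).replace("[Y]", "[MASK]")


def hits_at_k(
    triples: List[Triple],
    rank_candidates: Callable[[str], List[str]],
    k: int = 1,
) -> float:
    """Fraction of triples whose gold object is among the top-k ranked fillers."""
    if not triples:
        return 0.0
    hits = 0
    for subject, relation, gold in triples:
        ranked = rank_candidates(make_prompt(subject, relation))
        if gold in ranked[:k]:
            hits += 1
    return hits / len(triples)


def toy_ranker(prompt: str) -> List[str]:
    """Hypothetical stand-in for a masked language model's ranked predictions."""
    if "metal" in prompt:
        return ["Zn", "Cu", "Zr"]
    return ["BDC", "BTC"]


# Hypothetical toy benchmark (not the paper's data).
benchmark: List[Triple] = [
    ("MOF-5", "has_metal", "Zn"),
    ("HKUST-1", "has_metal", "Cu"),
    ("HKUST-1", "has_linker", "BTC"),
]

if __name__ == "__main__":
    print("hits@1:", hits_at_k(benchmark, toy_ranker, k=1))
    print("hits@2:", hits_at_k(benchmark, toy_ranker, k=2))
```

In a real run, `toy_ranker` would be replaced by a function returning a PLM's top predictions for the `[MASK]` slot. Comparing general-purpose and domain-specific models then amounts to calling `hits_at_k` with each model's ranker on the same triples.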

Award ID(s): 2118201

PAR ID: 10406863

Date Published: 2022-12-17

Published in: 2022 IEEE International Conference on Big Data (Big Data)

Page Range: 3651 to 3658

Sponsoring Org: National Science Foundation

Free publicly accessible full text (accepted manuscript, conference paper): https://doi.org/10.1109/BigData55660.2022.10020568
