Scientific literature is a wellspring of cutting-edge knowledge for materials science, including valuable data (e.g., numerical experimental results, material properties, and structures). These data are critical for accelerating materials discovery with data-driven machine learning (ML) methods. The challenge is that humans cannot manually extract and retain this knowledge, given the extensive and growing volume of publications. To this end, we explore a fine-tuned BERT model for knowledge extraction. Our preliminary results show that our fine-tuned BERT model reaches an F-score of 85% on the materials named entity recognition (NER) task. The paper covers background, related work, and methodology, including tuning parameters, along with our overall performance evaluation. Our discussion offers insights into our results and points to directions for next steps.
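As a rough illustration of the approach the abstract describes, below is a minimal sketch of fine-tuning BERT for token-level NER with the Hugging Face `transformers` library. The label set, example sentence, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: fine-tuning BERT for materials NER as token classification.
# The BIO label set and the toy example below are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-MAT", "I-MAT"]  # assumed tags for material mentions
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One toy example; real training iterates over a labeled corpus.
words = ["LiFePO4", "is", "a", "cathode", "material", "."]
word_tags = [1, 0, 0, 0, 0, 0]  # indices into `labels`

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Align word-level tags to subword tokens: special tokens get -100 (ignored
# by the loss); continuation subwords inherit the word's tag for simplicity.
aligned = [word_tags[w] if w is not None else -100
           for w in enc.word_ids(batch_index=0)]
labels_t = torch.tensor([aligned])

model.train()
loss = model(**enc, labels=labels_t).loss  # cross-entropy over token labels
loss.backward()
optimizer.step()
```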
BERT-ER: Query-specific BERT Entity Representations for Entity Ranking
- Award ID(s):
- 1846017
- PAR ID:
- 10373334
- Date Published:
- Journal Name:
- Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
- Page Range / eLocation ID:
- 1466 to 1477
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision, though it does not require large amounts of manual annotation, yields highly incomplete and noisy distant labels from external knowledge bases. To address this challenge, we propose a new computational framework, BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: in the first stage, we adapt the pre-trained language model to the NER task using the distant labels, which can significantly improve recall and precision; in the second stage, we drop the distant labels and propose a self-training approach to further improve model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND.
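For readers who want the shape of that two-stage recipe, a schematic sketch follows. It is not the released BOND implementation: `distant_loader`, `unlabeled_loader`, and the fixed-teacher simplification are assumptions for illustration (BOND itself periodically re-initializes the teacher from the student).

```python
# Schematic sketch of the two-stage recipe described above (not the released
# BOND code). `distant_loader` yields batches that include noisy distant
# `labels`; `unlabeled_loader` yields the same batches without labels.
import copy
import torch
from transformers import AutoModelForTokenClassification

def bond_style_training(distant_loader, unlabeled_loader,
                        num_labels, stage1_steps=1000, stage2_steps=1000):
    model = AutoModelForTokenClassification.from_pretrained(
        "roberta-base", num_labels=num_labels)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # Stage 1: adapt the pre-trained LM to NER using the distant labels,
    # capping the number of steps so the model does not overfit label noise.
    for _, batch in zip(range(stage1_steps), distant_loader):
        loss = model(**batch).loss  # batch includes noisy `labels`
        loss.backward()
        opt.step()
        opt.zero_grad()

    # Stage 2: drop the distant labels; a teacher copy produces pseudo-labels
    # that supervise the student (self-training). Here the teacher stays
    # fixed for simplicity.
    teacher = copy.deepcopy(model).eval()
    for _, batch in zip(range(stage2_steps), unlabeled_loader):
        with torch.no_grad():
            pseudo = teacher(**batch).logits.argmax(dim=-1)
        loss = model(**batch, labels=pseudo).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return model
```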