NSF PAR Search | NSF Public Access Repository

Chain-of-Factors Paper-Reviewer Matching

https://doi.org/10.1145/3696410.3714708

Zhang, Yu; Shen, Yanzhen; Kang, SeongKu; Chen, Xiusi; Jin, Bowen; Han, Jiawei (April 2025, ACM)

Free, publicly-accessible full text available April 22, 2026
Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains

https://doi.org/10.1609/AAAI.V38I17.29933

Zhang, Yu; Zhang, Yunyi; Shen, Yanzhen; Deng, Yu; Popa, Lucian; Shwartz, Larisa; Zhai, ChengXiang; Han, Jiawei (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)
Wooldridge, Michael J; Dy, Jennifer G; Natarajan, Sriraam (Ed.)
Accurately typing entity mentions from text segments is a fundamental task for various natural language processing applications. Many previous approaches rely on massive human-annotated data to perform entity typing. Nevertheless, collecting such data in highly specialized science and engineering domains (e.g., software engineering and security) can be time-consuming and costly, not to mention the domain gaps between training and inference data if the model needs to be applied to confidential datasets. In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities). To solve this problem, we propose SEType, which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus using the contextualized representations of pre-trained language models. It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types. Extensive experiments on two datasets covering four domains demonstrate the effectiveness of SEType in comparison with various baselines. Code and data are available at: https://github.com/yuzhimanhua/SEType.
Full Text Available
Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

https://doi.org/10.1145/3580305.3599544

Zhang, Yu; Jin, Bowen; Chen, Xiusi; Shen, Yanzhen; Zhang, Yunyi; Meng, Yu; Han, Jiawei (August 2023, ACM)
Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (Ed.)
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords). Existing studies on weakly supervised paper classification are less concerned with two challenges: (1) Papers should be classified into not only coarse-grained research topics but also fine-grained themes, and potentially into multiple themes, given a large and fine-grained label space; and (2) full text should be utilized to complement the paper title and abstract for classification. Moreover, instead of viewing the entire paper as a long linear sequence, one should exploit the structural information such as citation links across papers and the hierarchy of sections and paragraphs in each paper. To tackle these challenges, in this study, we propose FuTex, a framework that uses the cross-paper network structure and the in-paper hierarchy structure to classify full-text scientific papers under weak supervision. A network-aware contrastive fine-tuning module and a hierarchy-aware aggregation module are designed to leverage the two types of structural signals, respectively. Experiments on two benchmark datasets demonstrate that FuTex significantly outperforms competitive baselines and is on par with fully supervised classifiers that use 1,000 to 60,000 ground-truth training samples.
Full Text Available
Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data

https://doi.org/10.18653/v1/2023.emnlp-demo.36

Zhong, Ming; Ouyang, Siru; Jiao, Yizhu; Kargupta, Priyanka; Luo, Leo; Shen, Yanzhen; Zhou, Bobby; Zhong, Xianrui; Liu, Xuan; Li, Hongxiang; et al (January 2023, Association for Computational Linguistics)

Full Text Available
