NSF PAR Search | NSF Public Access Repository

LLMs and Copyright Risks: Benchmarks and Mitigation Approaches

https://doi.org/10.18653/v1/2025.naacl-tutorial.7

Zhang, Denghui; Xu, Zhaozhuo; Zhao, Weijie (April 2025, Association for Computational Linguistics)

Free, publicly-accessible full text available April 29, 2026
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation

Pan, Yanzhou; Lin, Huawei; Ran, Yide; Chen, Jiamin; Yu, Xiaodong; Zhao, Weijie; Zhang, Denghui; Xu, Zhaozhuo (April 2025, Association for Computational Linguistics)
Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.)
Large Language Models (LLMs) rely heavily on high-quality training data, making data valuation crucial for optimizing model performance, especially when working within a limited budget. In this work, we aim to offer a third-party data valuation approach that benefits both data providers and model developers. We introduce a linearized future influence kernel (LinFiK), which assesses the value of individual data samples in improving LLM performance during training. We further propose ALinFiK, a learning strategy to approximate LinFiK, enabling scalable data valuation. Our comprehensive evaluations demonstrate that this approach surpasses existing baselines in effectiveness and efficiency, with significant scalability advantages as LLM parameters increase.
Free, publicly-accessible full text available April 29, 2026
Multi-Faceted Knowledge-Driven Pre-Training for Product Representation Learning

https://doi.org/10.1109/TKDE.2022.3200921

Zhang, Denghui; Liu, Yanchi; Yuan, Zixuan; Fu, Yanjie; Chen, Haifeng; Xiong, Hui (January 2022, IEEE Transactions on Knowledge and Data Engineering)

Full Text Available
Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment

https://doi.org/10.1145/3447548.3467215

Zhang, Denghui; Yuan, Zixuan; Liu, Yanchi; Liu, Hao; Zhuang, Fuzhen; Xiong, Hui; Chen, Haifeng (August 2021, Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining)
Full Text Available
Spatio-Temporal Dual Graph Attention Network for Query-POI Matching

https://doi.org/10.1145/3397271.3401159

Yuan, Zixuan; Liu, Hao; Zhang, Denghui; Yi, Fei; Zhu, Nengjun; Xiong, Hui (June 2020, International ACM SIGIR Conference on Research and Development in Information Retrieval)

Full Text Available