CODEIPPROMPT: intellectual property infringement assessment of code language models

Yu, Zhiyuan; Wu, Yuhao; Zhang, Ning; Wang, Chenguang; Vorobeychik, Yevgeniy; Xiao, Chaowei


Recent advances in large language models (LMs) have facilitated their ability to synthesize programming code. However, they have also raised concerns about intellectual property (IP) rights violations. Despite the significance of this issue, it has remained relatively unexplored. In this paper, we aim to bridge the gap by presenting CODEIPPROMPT, a platform for automatically evaluating the extent to which code language models may reproduce licensed programs. It comprises two key components: prompts constructed from a licensed code database to elicit IP-violating code from LMs, and a measurement tool to evaluate the extent of IP violation by code LMs. We conducted an extensive evaluation of existing open-source code LMs and commercial products, and revealed the prevalence of IP violations in all these models. We further identified that the root cause is the substantial proportion of the training corpus that is subject to restrictive licenses, resulting from both intentional inclusion and inconsistent licensing practices in the real world. To address this issue, we also explored potential mitigation strategies, including fine-tuning and dynamic token filtering. Our study provides a testbed for evaluating the IP violation issues of existing code generation platforms and stresses the need for better mitigation strategies.

Award ID(s): 2238635; 1916926

PAR ID: 10504238

Author(s) / Creator(s): Yu, Zhiyuan; Wu, Yuhao; Zhang, Ning; Wang, Chenguang; Vorobeychik, Yevgeniy; Xiao, Chaowei

Publisher / Repository: JMLR.org

Date Published: 2023-07-23

Journal Name: Proceedings of the 40th International Conference on Machine Learning

Format(s): Medium: X

Location: Honolulu, Hawaii, USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
