Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] token and the regular tokens. In this paper, we propose a simple yet novel normalization method that separately normalizes the embedding vectors of the regular tokens and the [CLS] token, in order to better capture their distinct characteristics and enhance downstream task performance. Our empirical study shows that the [CLS] embeddings learned with our separate normalization layer better encode the global contextual information and are distributed more uniformly in their anisotropic space. When the conventional normalization layer is replaced with separate normalization layers, we observe an average 2.7% performance improvement on learning tasks from the image, natural language, and graph domains.
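The core idea above admits a very compact implementation. Below is a minimal PyTorch sketch, assuming a standard ViT-style layout with the [CLS] token at position 0; the module and attribute names (SeparateNorm, cls_norm, token_norm) are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class SeparateNorm(nn.Module):
    """Normalize the [CLS] embedding and the regular token embeddings
    with two independent LayerNorm layers (illustrative sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.cls_norm = nn.LayerNorm(dim)    # parameters/statistics for [CLS] only
        self.token_norm = nn.LayerNorm(dim)  # parameters/statistics for regular tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1 + num_tokens, dim), with the [CLS] token at position 0
        cls_tok, tokens = x[:, :1, :], x[:, 1:, :]
        return torch.cat([self.cls_norm(cls_tok), self.token_norm(tokens)], dim=1)

# Usage: drop-in replacement for a block's shared nn.LayerNorm
x = torch.randn(8, 197, 768)   # e.g., 196 patch tokens + [CLS], dim 768
out = SeparateNorm(768)(x)
print(out.shape)               # torch.Size([8, 197, 768])
```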
Measurement of Embedding Choices on Cryptographic API Completion Tasks
In this article, we conduct a measurement study to comprehensively compare the accuracy impacts of multiple embedding options in cryptographic API completion tasks. Embedding is the process of automatically learning vector representations of program elements. Our measurement focuses on design choices in three important aspects: program analysis preprocessing, token-level embedding, and sequence-level embedding. Our findings show that program analysis is necessary even under advanced embedding. The results show a 36.20% accuracy improvement, on average, when program analysis preprocessing is applied to transform bytecode sequences into API dependence paths. With program analysis and token-level embedding training, the embedding dep2vec improves the task accuracy from 55.80% to 92.04%. Moreover, training the expensive sequence-level embedding offers only a slight accuracy advantage (0.55%, on average) over the token-level embedding. Our experiments also highlight the differences made by the data setup. In the cross-app learning setup and a data-scarcity scenario, sequence-level embedding is more necessary and yields a more pronounced accuracy improvement (5.10%).
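To make the token-level embedding step concrete, the sketch below trains skip-gram embeddings over API dependence paths, treating each path as a sentence of API tokens. The example paths and hyperparameters are assumptions, and dep2vec's actual training procedure may differ from this plain Word2Vec analogue.

```python
from gensim.models import Word2Vec

# Toy API dependence paths (one list of API tokens per path); real paths would
# come from program-analysis preprocessing of bytecode. These examples are assumed.
api_paths = [
    ["SecureRandom.<init>", "SecureRandom.nextBytes", "IvParameterSpec.<init>"],
    ["KeyGenerator.getInstance", "KeyGenerator.generateKey", "Cipher.init"],
    ["Cipher.getInstance", "Cipher.init", "Cipher.doFinal"],
]

# Skip-gram token-level embedding over the paths (hyperparameters are illustrative).
model = Word2Vec(sentences=api_paths, vector_size=64, window=3, min_count=1, sg=1, epochs=50)

# Each API token now has a dense vector that a downstream completion model can consume.
vec = model.wv["Cipher.init"]
print(vec.shape)  # (64,)
```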
- Award ID(s): 1929701
- PAR ID: 10547778
- Publisher / Repository: ACM
- Date Published:
- Journal Name: ACM Transactions on Software Engineering and Methodology
- Volume: 33
- Issue: 3
- ISSN: 1049-331X
- Page Range / eLocation ID: 1 to 30
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effectiveness of the proposed salience-aware learning techniques, leading to new state-of-the-art performance on the benchmark.
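As a rough illustration of the counterfactual, probing-based salience idea, the sketch below scores each statement token by how much the verifier's probability drops when that token is masked. The helper names and the toy verifier are hypothetical; the paper's probing procedure is more elaborate.

```python
from typing import Callable, List

MASK = "[MASK]"

def token_salience(
    statement_tokens: List[str],
    verify_prob: Callable[[List[str]], float],
) -> List[float]:
    """Counterfactual salience: how much does the verifier's probability of
    'entailed' drop when a single token is masked out? (Illustrative sketch.)"""
    base = verify_prob(statement_tokens)
    saliences = []
    for i in range(len(statement_tokens)):
        masked = statement_tokens[:i] + [MASK] + statement_tokens[i + 1:]
        saliences.append(base - verify_prob(masked))  # larger drop => more salient token
    return saliences

# Stand-in verifier for demonstration only: pretends numeric tokens matter most.
def toy_verifier(tokens: List[str]) -> float:
    return min(1.0, 0.5 + 0.2 * sum(t.isdigit() for t in tokens))

statement = "the team scored 102 points in 2019".split()
print(token_salience(statement, toy_verifier))
```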
Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level, while LM training and generation both occur at the token level. There is, therefore, a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance and improving the LM with the learned guidance. For guidance learning, we design a framework that extends pairwise-preference learning in imitation learning to both variable-length LM generation and the utilization of the preference among multiple generations. For LM training, based on the amount of supervised data, we present two minimalist learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization. Source code is released at https://github.com/Shentao-YANG/Preference_Grounded_Guidance.
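The granularity mismatch can be made concrete with a small sketch: a token-level guidance head is trained from sequence-level pairwise preferences by pooling token scores into a sequence score and applying a Bradley-Terry style loss. The pooling choice, loss form, and all names here are illustrative assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 1000, 32
embed = nn.Embedding(vocab_size, dim)
guidance_head = nn.Linear(dim, 1)          # per-token guidance score
opt = torch.optim.Adam(list(embed.parameters()) + list(guidance_head.parameters()), lr=1e-3)

def sequence_score(token_ids: torch.Tensor) -> torch.Tensor:
    # token_ids: (seq_len,) -> scalar sequence score via mean of token-level scores
    return guidance_head(embed(token_ids)).mean()

preferred = torch.randint(0, vocab_size, (12,))     # stand-in token ids
dispreferred = torch.randint(0, vocab_size, (9,))   # variable-length generation

for _ in range(100):
    # Pairwise loss pushes the preferred sequence's pooled score above the other's.
    loss = -F.logsigmoid(sequence_score(preferred) - sequence_score(dispreferred))
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, guidance_head(embed(ids)) yields token-level guidance usable for LM training.
```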
In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design in low-level hardware description languages that eventually synthesizes DSAs on circuits. However, creating a high-quality HLS design still demands significant domain knowledge, particularly in microarchitecture decisions expressed as pragmas. Thus, it is desirable to automate such decisions with the help of machine learning for predicting the quality of HLS designs, which requires a deeper understanding of the program consisting of the original code and pragmas. Naturally, these programs can be considered as sequence data. In addition, these programs can be compiled and converted into a control data flow graph (CDFG). But existing works either fail to leverage both modalities or combine the two in shallow or coarse ways. We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way. To alleviate the scarcity of labeled designs, a pre-training method is proposed based on a suite of compiler data-flow analysis tasks. Experimental results show that ProgSG reduces the RMSE of design performance predictions by up to 22%, and identifies designs with an average of 1.10× and 1.26× (up to 8.17× and 13.31×) performance improvement in the design space exploration (DSE) task compared to HARP and AutoDSE, respectively.
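A minimal sketch of the two-modality idea follows, assuming a transformer over code tokens and one round of dense message passing over CDFG node features, fused for a design-quality prediction. ProgSG's actual fine-grained interaction mechanism is more involved, and every module name and size below is illustrative.

```python
import torch
import torch.nn as nn

class SeqGraphModel(nn.Module):
    """Fuses a code-token sequence encoder with a simple CDFG graph encoder (sketch)."""

    def __init__(self, vocab=500, dim=64):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab, dim)
        self.seq_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
        )
        self.node_proj = nn.Linear(dim, dim)    # one round of graph message passing
        self.head = nn.Linear(2 * dim, 1)       # predicts a design-quality score

    def forward(self, token_ids, node_feats, adj):
        seq = self.seq_enc(self.tok_embed(token_ids)).mean(dim=1)   # (B, dim) sequence summary
        msgs = torch.relu(self.node_proj(adj @ node_feats))         # (B, N, dim) neighbor aggregation
        graph = msgs.mean(dim=1)                                    # (B, dim) graph summary
        return self.head(torch.cat([seq, graph], dim=-1))           # (B, 1)

model = SeqGraphModel()
tokens = torch.randint(0, 500, (2, 50))   # code + pragma tokens
nodes = torch.randn(2, 10, 64)            # CDFG node features
adj = torch.rand(2, 10, 10)               # dense adjacency (toy)
print(model(tokens, nodes, adj).shape)    # torch.Size([2, 1])
```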
Transformer-based language models have shown promise in genomics but face challenges unique to DNA, such as sequence lengths spanning hundreds of millions of base pairs and subtle long-range dependencies. Although next-token prediction remains the predominant pre-training objective (inherited from NLP), recent research suggests that multi-objective frameworks can better capture complex structure. In this work, we explore whether the Birdie framework, a reinforcement learning-based, mixture-of-objectives pre-training strategy, can similarly benefit genomic foundation models. We compare a slightly modified Birdie approach with a purely autoregressive, next-token prediction baseline on standard Nucleotide Transformer benchmarks. Our results show performance gains in the DNA domain, indicating that mixture-of-objectives training could be a promising alternative to next-token-prediction-only pre-training for genomic sequence modeling.
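To illustrate the mixture-of-objectives scheduling idea in general terms, the toy sketch below uses a bandit-style sampler that shifts probability toward objectives whose loss has recently improved. Birdie's actual reward design and objective set differ; everything here is an assumption for illustration only.

```python
import random

OBJECTIVES = ["next_token", "span_infilling", "prefix_lm"]

class ObjectiveSampler:
    """Bandit-style scheduler over pre-training objectives (illustrative sketch)."""

    def __init__(self, objectives, step=0.1):
        self.weights = {o: 1.0 for o in objectives}
        self.last_loss = {o: None for o in objectives}
        self.step = step

    def sample(self):
        total = sum(self.weights.values())
        return random.choices(list(self.weights), [w / total for w in self.weights.values()])[0]

    def update(self, objective, loss):
        prev = self.last_loss[objective]
        if prev is not None:
            # Reward = recent improvement on this objective's loss.
            self.weights[objective] = max(0.1, self.weights[objective] + self.step * (prev - loss))
        self.last_loss[objective] = loss

sampler = ObjectiveSampler(OBJECTIVES)
for batch_idx in range(5):
    obj = sampler.sample()
    # In real pre-training, the loss would come from the model on this batch.
    loss = {"next_token": 3.0, "span_infilling": 2.5, "prefix_lm": 2.8}[obj] - 0.01 * batch_idx
    sampler.update(obj, loss)
    print(batch_idx, obj, round(loss, 3))
```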