

Title: SPT-code: sequence-to-sequence pre-training for learning source code representations
Award ID(s):
2034508
NSF-PAR ID:
10343376
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)
Page Range / eLocation ID:
2006 to 2018
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Android mobile applications collect information in various ways to provide users with functionality and services. An Android app's permission manifest and privacy policy are the documents that tell users which types of information are being collected. However, the information types mentioned in these documents are often abstract and do not include the fine-grained information types collected through user input fields in applications. Existing approaches focus on API calls in the application code and can reveal which information types are being collected. However, they cannot identify information types that come from direct user input, a major source of private information. In this paper, we propose directly applying a natural language processing approach to Android layout code to identify the information types associated with input fields in applications.
  2. Nested linear coding is a widely used technique in wireless communication systems for improving both security and reliability. Some parameters, such as the relative generalized Hamming weight and the relative dimension/length profile, can be used to characterize the performance of nested linear codes. In addition, the rank properties of generator and parity-check matrices can also precisely characterize their security performance. Despite this, finding optimal nested linear secrecy codes remains a challenge in the finite-blocklength regime, often requiring brute-force search methods. This paper investigates the properties of nested linear codes, introduces a new representation of the relative generalized Hamming weight, and proposes a novel method for finding the best nested linear secrecy code for the binary erasure wiretap channel by working from the worst nested linear secrecy code in the dual space. We demonstrate that our algorithm significantly outperforms the brute-force technique in terms of speed and efficiency.
  3. Code clones are fragments of source code that are identical or similar to each other. Code clones make software maintenance and bug fixing more difficult, reflect poor design, and increase system size. Many researchers have proposed clone detection techniques and tools; however, techniques suited to large-scale repositories in particular are lacking. In this paper, we present a token-based clone detector called Intelligent Clone Detection Tool (ICDT) that can detect both exact and near-miss clones in large repositories using a standard workstation environment. To evaluate the scalability and efficiency of ICDT, we use BigCloneBench, the most recent benchmark and a large benchmark of real clones. In addition, we compare ICDT with four publicly available, state-of-the-art tools.
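The first abstract above proposes applying natural language processing to Android layout code to find the information types behind input fields. The sketch below is a deliberately simplified stand-in, not the paper's method: it uses plain keyword matching instead of NLP, and the lexicon `INFO_TYPE_KEYWORDS` and the functions `tokenize` and `classify_input_fields` are hypothetical names chosen for illustration. It reads the `android:id`, `android:hint`, and `android:inputType` attributes of `EditText` elements with Python's standard `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# Android attributes such as android:id are stored under this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical keyword lexicon (word -> information type); a real system would use NLP.
INFO_TYPE_KEYWORDS = {
    "email": "email address",
    "phone": "phone number",
    "password": "password",
    "pwd": "password",
    "name": "person name",
    "birthday": "date of birth",
}


def tokenize(text):
    """Split camelCase/snake_case identifiers and hint text into lowercase words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # "userPhone" -> "user Phone"
    return [w for w in re.split(r"[^a-z]+", text.lower()) if w]


def classify_input_fields(layout_xml):
    """Map each EditText's android:id to the information types its attributes suggest."""
    root = ET.fromstring(layout_xml)
    results = {}
    for elem in root.iter():
        # Matches EditText and subclasses such as TextInputEditText.
        if not elem.tag.endswith("EditText"):
            continue
        field_id = elem.get(ANDROID_NS + "id", "")
        hint = elem.get(ANDROID_NS + "hint", "")
        input_type = elem.get(ANDROID_NS + "inputType", "")
        words = tokenize(" ".join([field_id, hint, input_type]))
        types = {INFO_TYPE_KEYWORDS[w] for w in words if w in INFO_TYPE_KEYWORDS}
        results[field_id] = sorted(types)
    return results


LAYOUT = """<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">
    <EditText android:id="@+id/et_email" android:hint="Enter your email"
              android:inputType="textEmailAddress"/>
    <EditText android:id="@+id/userPhoneNumber" android:inputType="phone"/>
    <TextView android:id="@+id/label" android:text="Password"/>
    <EditText android:id="@+id/et_pwd" android:hint="Password"
              android:inputType="textPassword"/>
</LinearLayout>"""

print(classify_input_fields(LAYOUT))
```

The `TextView` labelled "Password" is skipped because only input fields are examined; a keyword lexicon like this one misses synonyms and context that an NLP model could capture.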
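The third abstract describes a token-based detector for exact and near-miss clones. The sketch below shows the general token-based idea only; it is not ICDT's algorithm, whose details the abstract does not give. The identifier/literal normalization, the multiset-overlap similarity, the 0.8 threshold, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Identifiers, numbers, double-quoted string literals, or any single non-space character.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+(?:\.\d+)?|"[^"]*"|\S')

# Illustrative (incomplete) keyword set; other alphabetic tokens count as identifiers.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "class", "new"}


def tokenize(code):
    """Lex code into tokens; whitespace and layout differences disappear here."""
    return TOKEN_RE.findall(code)


def normalize(tokens):
    """Replace identifiers with ID and literals with LIT so renamed copies match."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit() or tok[0] == '"':
            out.append("LIT")
        else:
            out.append(tok)
    return out


def overlap_similarity(a, b):
    """Shared tokens (multiset intersection) divided by the longer fragment's length."""
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b), 1)


def detect_clones(fragments, threshold=0.8):
    """Compare every pair of fragments and report exact and near-miss clone pairs."""
    raw = {name: tokenize(code) for name, code in fragments.items()}
    norm = {name: normalize(toks) for name, toks in raw.items()}
    pairs = []
    for n1, n2 in combinations(fragments, 2):
        if raw[n1] == raw[n2]:
            pairs.append((n1, n2, "exact"))
        elif overlap_similarity(norm[n1], norm[n2]) >= threshold:
            pairs.append((n1, n2, "near-miss"))
    return pairs


FRAGMENTS = {
    "f1": "int sum(int a, int b) { return a + b; }",
    "f2": "int sum(int a,  int b) {\n  return a + b;\n}",  # same tokens, new layout
    "f3": "int add(int x, int y) { return x + y; }",  # renamed identifiers
    "f4": "void log(String msg) { System.out.println(msg); }",  # unrelated
}

for pair in detect_clones(FRAGMENTS):
    print(pair)
```

Here `f1`/`f2` differ only in whitespace and are reported as exact, while `f3` matches them after normalization and is reported as near-miss. The all-pairs loop is quadratic, so a detector aimed at large repositories would need indexing or candidate filtering; how ICDT achieves its scalability is not stated in the abstract.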