

This content will become publicly available on February 22, 2025

Title: Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization
Award ID(s):
2211428
NSF-PAR ID:
10495855
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
32nd IEEE/ACM International Conference on Program Comprehension, RENE
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nested linear coding is a widely used technique in wireless communication systems for improving both security and reliability. Some parameters, such as the relative generalized Hamming weight and the relative dimension/length profile, can be used to characterize the performance of nested linear codes. In addition, the rank properties of generator and parity-check matrices can also precisely characterize their security performance. Despite this, finding optimal nested linear secrecy codes remains a challenge in the finite-blocklength regime, often requiring brute-force search methods. This paper investigates the properties of nested linear codes, introduces a new representation of the relative generalized Hamming weight, and proposes a novel method for finding the best nested linear secrecy code for the binary erasure wiretap channel by working from the worst nested linear secrecy code in the dual space. We demonstrate that our algorithm significantly outperforms the brute-force technique in terms of speed and efficiency.

     
  2. A code clone is a code fragment in the source code that is identical or similar to another fragment. Code clones lead to difficulties in software maintenance and bug fixing, reflect poor design, and increase system size. Many researchers have proposed code clone detection techniques and tools; however, there is a lack of clone detection techniques, especially for large-scale repositories. In this paper, we present a token-based clone detector called Intelligent Clone Detection Tool (ICDT) that can detect both exact and near-miss clones in large repositories using a standard workstation environment. To evaluate the scalability and efficiency of ICDT, we use BigCloneBench, the most recent large benchmark of real clones. In addition, we compare ICDT to four publicly available, state-of-the-art tools.
  3. Studies of eye movements during source code reading have supported the idea that reading source code differs fundamentally from reading natural text. The paper analyzed an existing data set of natural language and source code eye movement data using the E-Z reader model of eye movement control. The results show that the E-Z reader model can be used with both natural text and source code, where it provides good predictions of eye movement duration. This result is confirmed by comparing model predictions to eye-movement data from this experiment and calculating the correlation score for each metric. Finally, it was found that gaze duration is influenced by token frequency in code and in natural text. The frequency effect is less pronounced on first fixation duration and single fixation duration. An eye movement control model for source code reading may open the door for tools in education and industry to enhance program comprehension.
  4. This release covers the state of the data and associated analysis code for determining code sharing between cryptocurrency codebases funded through the end of the original NSF CRII award. This material is based on work supported by the National Science Foundation under Grant CNS-1849729.

     