skip to main content


Title: Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model
Award ID(s):
1846142
NSF-PAR ID:
10476839
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Page Range / eLocation ID:
1492 to 1504
Format(s):
Medium: X
Location:
Luxembourg, Luxembourg
Sponsoring Org:
National Science Foundation
More Like this
  1. Nested linear coding is a widely used technique in wireless communication systems for improving both security and reliability. Some parameters, such as the relative generalized Hamming weight and the relative dimension/length profile, can be used to characterize the performance of nested linear codes. In addition, the rank properties of generator and parity-check matrices can also precisely characterize their security performance. Despite this, finding optimal nested linear secrecy codes remains a challenge in the finite-blocklength regime, often requiring brute-force search methods. This paper investigates the properties of nested linear codes, introduces a new representation of the relative generalized Hamming weight, and proposes a novel method for finding the best nested linear secrecy code for the binary erasure wiretap channel by working from the worst nested linear secrecy code in the dual space. We demonstrate that our algorithm significantly outperforms the brute-force technique in terms of speed and efficiency.

     
    more » « less
  2. A code clone refers to code fragments in the source code that are identical or similar to each other. Code clones lead difficulties in software maintenance, bug fixing, present poor design and increase the system size. Code clone detection techniques and tools have been proposed by many researchers, however, there is a lack of clone detection techniques especially for large scale repositories. In this paper, we present a token-based clone detector called Intelligent Clone Detection Tool (ICDT) that can detect both exact and near-miss clones from large repositories using a standard workstation environment. In order to evaluate the scalability and the efficiency of ICDT, we use the most recent benchmark which is a big benchmark of real clones, BigCloneBench. In addition, we compare ICDT to four publicly available and state-of-the-art tools. 
    more » « less
  3. null (Ed.)
    Studies of eye movements during source code reading have supported the idea that reading source code differs fundamentally from reading natural text. The paper analyzed an existing data set of natural language and source code eye movement data using the E-Z reader model of eye movement control. The results show that the E-Z reader model can be used with natural text and with source code where it provides good predictions of eye movement duration. This result is confirmed by comparing model predictions to eye-movement data from this experiment and calculating the correlation score for each metric. Finally, it was found that gaze duration is influenced by token frequency in code and in natural text. The frequency effect is less pronounced on first fixation duration and single fixation duration. An eye movement control model for source code reading may open the door for tools in education and the industry to enhance program comprehension. 
    more » « less
  4. This release covers the state of the data and associated analysis code for determining code sharing between cryptocurrency codebases funded through the end of the original NSF CRII award. This material is based on work supported by the National Science Foundation under Grant CNS-1849729.

     
    more » « less
  5. Over the past few years, there has been an increased interest in using FPGAs alongside CPUs and GPUs in high-performance computing systems and data centers. This trend has led to a push toward the use of high-level programming models and libraries, such as OpenCL, both to lower the barriers to the adoption of FPGAs by programmers unfamiliar with hardware description languages, and to allow to deploy a single code on different devices seamlessly. Today, both Intel and Xilinx offer toolchains to compile OpenCL code onto FPGA. However, using OpenCL on FPGAs is complicated by performance portability issues, since different devices have fundamental differences in architecture and nature of hardware parallelism they offer. Hence, platform-specific optimizations are crucial to achieving good performance across devices. In this paper, we propose a code transformation to improve the performance of OpenCL codes running on FPGA. The proposed method uses pipes to separate the memory accesses and core computation within OpenCL kernels. We analyze the benefits of the approach as well as the restrictions to its applicability. Using OpenCL applications from popular benchmark suites, we show that this code transformation can result in higher utilization of the global memory bandwidth available and increased instruction concurrency, thus improving the overall throughput of OpenCL kernels at the cost of a modest resource utilization overhead. Further concurrency can be achieved by using multiple memory and compute kernels. 
    more » « less