skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 7:00 AM ET to 7:30 AM ET on Friday, April 24 due to maintenance. We apologize for the inconvenience.


Title: Understanding Evolutionary Coupling by Fine-Grained Co-Change Relationship Analysis
Frequent co-changes to multiple files, i.e., evolutionary coupling, can demonstrate active relations among files, explicit or implicit. Although evolutionary coupling has been used to analyze software quality, there is no systematic study on the categorization of frequent co-changes between files which may used for characterizing various quality problems. In this paper, we report an empirical study on 27,087 co-change commits of 6 open-source systems with the purpose of understanding the observed evolutionary coupling. We extracted fine-grained change information from version control system to investigate whether two files exhibit particular kinds of co-change relationships. We consider code changes on 5 types of program entities (i.e., field, method, control statement, non-control statement, and class) and identified 6 types of dominating co-change relationships. Our manual analysis showed that each of the 6 types can be explained by structural coupling, semantic coupling, or implicit dependencies. Temporal analysis further shows that files may exhibit different co-change relationships at different phases in the evolution history. Finally, we investigated co-changes among multiple files by combining co-change relationships between related file pairs and showed with live examples that rich information embedded in the fine-grained co-change relationships may help developers to change code at multiple locations. Moreover, we analyzed how these co-change relationship types can be used to facilitate change impact analysis and to pinpoint design problems.  more » « less
Award ID(s):
1835292
PAR ID:
10199009
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Impact analysis (IA) is a critical software maintenance task that identifies the effects of a given set of code changes on a larger software project with the intention of avoiding potential adverse effects. IA is a cognitively challenging task that involves reasoning about the abstract relationships between various code constructs. Given its difficulty, researchers have worked to automate IA with approaches that primarily use coupling metrics as a measure of the connectedness of different parts of a software project. Many of these coupling metrics rely on static, dynamic, or evolutionary information and are based on heuristics that tend to be brittle, require expensive execution analysis, or large histories of co-changes to accurately estimate impact sets. In this paper, we introduce a novel IA approach, called ATHENA, that combines a software system's dependence graph information with a conceptual coupling approach that uses advances in deep representation learning for code without the need for change histories and execution information. Previous IA benchmarks are small, containing less than ten software projects, and suffer from tangled commits, making it difficult to measure accurate results. Therefore, we constructed a large-scale IA benchmark, from 25 open-source software projects, that utilizes fine-grained commit information from bug fixes. On this new benchmark, our best performing approach configuration achieves an mRR, mAP, and HIT@10 score of 60.32%, 35.19%, and 81.48%, respectively. Through various ablations and qualitative analyses, we show that ATHENA's novel combination of program dependence graphs and conceptual coupling information leads it to outperform a simpler baseline by 10.34%, 9.55%, and 11.68% with statistical significance. 
    more » « less
  2. Crowdsourcing is popular for large-scale data collection and labeling, but a major challenge is on detecting low-quality submissions. Recent studies have demonstrated that behavioral features of workers are highly correlated with data quality and can be useful in quality control. However, these studies primarily leveraged coarsely extracted behavioral features, and did not further explore quality control at the fine-grained level, i.e., the annotation unit level. In this paper, we investigate the feasibility and benefits of using fine-grained behavioral features, which are the behavioral features finely extracted from a worker's individual interactions with each single unit in a subtask, for quality control in crowdsourcing. We design and implement a framework named Fine-grained Behavior-based Quality Control (FBQC) that specifically extracts fine-grained behavioral features to provide three quality control mechanisms: (1) quality prediction for objective tasks, (2) suspicious behavior detection for subjective tasks, and (3) unsupervised worker categorization. Using the FBQC framework, we conduct two real-world crowdsourcing experiments and demonstrate that using fine-grained behavioral features is feasible and beneficial in all three quality control mechanisms. Our work provides clues and implications for helping job requesters or crowdsourcing platforms to further achieve better quality control. 
    more » « less
  3. Context: Supervised learning-based projects (SLPs), i.e., software projects that use supervised learning algorithms, such as decision trees are useful for performing classification-related tasks. Yet, security weaknesses, such as the use of hard-coded passwords in SLPs, can make SLPs susceptible to security attacks. A characterization of security weaknesses in SLPs can help practitioners understand the security weaknesses that are frequent in SLPs and adopt adequate mitigation strategies. Objective: The goal of this paper is to help practitioners securely develop supervised learning-based projects by conducting an empirical study of security weaknesses in supervised learning-based projects. Methodology: We conduct an empirical study by quantifying the frequency of security weaknesses in 278 open source SLPs. Results: We identify 22 types of security weaknesses that occur in SLPs. We observe ‘use of potentially dangerous function’ to be the most frequently occurring security weakness in SLPs. Of the identified 3,964 security weaknesses, 23.79% and 40.49% respectively, appear for source code files used to train and test models. We also observe evidence of co-location, e.g., instances of command injection co-locates with instances of potentially dangerous function. Conclusion: Based on our findings, we advocate for a shift left approach for SLP development with security-focused code reviews, and application of security static analysis. 
    more » « less
  4. With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the earliest ones created, and they have many known problems, including extreme bias, spurious correlations, and an inability to permit fine-grained analysis. In this paper, we pioneer evaluating recent MLLMs (LLaVA 1.5, LLaVA-NeXT, BLIP2, InstructBLIP, GPT-4V, and GPT-4o) on datasets designed to address weaknesses in earlier ones. We assess three VQA datasets: 1) TDIUC, which permits fine-grained analysis on 12 question types; 2) TallyQA, which has simple and complex counting questions; and 3) DVQA, which requires optical character recognition for chart understanding. We also study VQDv1, a dataset that requires identifying all image regions that satisfy a given query. Our experiments reveal the weaknesses of many MLLMs that have not previously been reported. Our code is integrated into the widely used LAVIS framework for MLLM evaluation, enabling the rapid assessment of future MLLMs. 
    more » « less
  5. Software developed in different platforms has different characteristics and needs. More specifically, code changes are differently performed in the mobile platform compared to non-mobile platforms (e.g., desktop and Web platforms). Prior works have investigated the differences in specific platforms. However, we still lack a deeper understanding of how code changes evolve across different software platforms. In this paper, we present a study aiming at investigating the frequency of changes and how source code changes, build changes and test changes co-evolve in mobile and non-mobile platforms. We developed linear regression models to explain which factors influence the frequency of changes in different platforms and applied the Apriori algorithm to find types of changes that frequently occur together. Our findings show that non-mobile repositories have a higher number of commits per month compared to mobile and our regression models suggest that being mobile significantly impacts on the number of commits in a negative direction when controlling for confound factors, such as code size. We also found that developers do not usually change source code files together with build files or test files. We argue that our results can provide valuable information for developers on how changes are performed in different platforms so that practices adopted in successful software systems can be followed. 
    more » « less