In software merge, the edits from different branches can textually overlap (i.e., textual conflicts) or cause build and test errors (i.e., build and test conflicts), jeopardizing programmer productivity and software quality. Existing tools primarily focus on textual conflicts; few tools detect higher-order conflicts (i.e., build and test conflicts). However, existing detectors of build conflicts are limited. Due to their heavy usage of automatic build, current detectors (e.g., Crystal) only report build errors instead of identifying the root causes; developers have to manually locate conflicting edits. These detectors only help when the branches-to-merge have no textual conflict. We present a new static analysis-based approach Bucond (“build conflict detector”). Given three code versions in a merging scenario: base b, left l, and right r, Bucond models each version as a graph, and compares graphs to extract entity-related edits (e.g., class renaming) in l and r. We believe that build conflicts occur when certain edits are co-applied to related entities between branches. Bucond realizes this insight via pattern matching to identify any cross-branch edit combination that can trigger build conflicts (e.g., one branch adds a reference to field F while the other branch removes F). We systematically explored and devised 57 patterns, covering 97% of the build conflicts in our experiments. Our evaluation shows Bucond to complement build-based detectors, as it (1) detects conflicts with 100% precision and 88%–100% recall, (2) locates conflicting edits, and (3) works well when those detectors do not. 
                        more » 
                        « less   
                    
                            
                            A Characterization Study of Merge Conflicts in Java Projects
                        
                    
    
            In collaborative software development, programmers create software branches to add features and fix bugs tentatively, and then merge branches to integrate edits. When edits from different branches textually overlap (i.e.,textual conflicts) or lead to compilation and runtime errors (i.e.,build and test conflicts), it is challenging for developers to remove such conflicts. Prior work proposed tools to detect and solve conflicts. They investigate how conflicts relate to code smells and the software development process. However, many questions are still not fully investigated, such as what types of conflicts exist in real-world applications and how developers or tools handle them. For this article, we used automated textual merge, compilation, and testing to reveal three types of conflicts in 208 open-source repositories: textual conflicts, build conflicts (i.e., conflicts causing build errors), and test conflicts (i.e., conflicts triggering test failures). We manually inspected 538 conflicts and their resolutions to characterize merge conflicts from different angles. Our analysis revealed three interesting phenomena. First, higher-order conflicts (i.e., build and test conflicts) are harder to detect and resolve, while existing tools mainly focus on textual conflicts. Second, developers manually resolved most higher-order conflicts by applying similar edits to multiple program locations; their conflict resolutions share common editing patterns implying great opportunities for future tool design. Third, developers resolved 64% of true textual conflicts by keeping complete edits from either a left or right branch. Unlike prior studies, our research for the first time thoroughly characterizes three types of conflicts, with a special focus on higher-order conflicts and limitations of existing tool design. Our work will shed light on future research of software merge. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1845446
- PAR ID:
- 10515782
- Publisher / Repository:
- Association for Computing Machinery
- Date Published:
- Journal Name:
- ACM Transactions on Software Engineering and Methodology
- Volume:
- 32
- Issue:
- 2
- ISSN:
- 1049-331X
- Page Range / eLocation ID:
- 1 to 28
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Version control systems typically rely on apatch language, heuristicpatch synthesis algorithmslike diff, andthree-way merge algorithms. Standard patch languages and merge algorithms often fail to identify conflicts correctly when there are multiple edits to one line of code or code is relocated. This paper introduces Grove, a collaborative structure editor calculus that eliminates patch synthesis and three-way merge algorithms entirely. Instead, patches are derived directly from the log of the developer’s edit actions and all edits commute, i.e. the repository state forms a commutative replicated data type (CmRDT). To handle conflicts that can arise due to code relocation, the core datatype in Grove is a labeled directed multi-graph with uniquely identified vertices and edges. All edits amount to edge insertion and deletion, with deletion being permanent. To support tree-based editing, we define a decomposition from graphs intogroves, which are a set of syntax trees with conflicts—including local, relocation, and unicyclic relocation conflicts—represented explicitly using holes and references between trees. Finally, we define a type error localization system for groves that enjoys atotalityproperty, i.e. all editor states in Grove are statically meaningful, so developers can use standard editor services while working to resolve these explicitly represented conflicts. The static semantics is defined as a bidirectional marking system in line with recent work, with gradual typing employed to handle situations where errors and conflicts prevent type determination. We then layer on a unification-based type inference system to opportunistically fill type holes and fail gracefully when no solution exists. We mechanize the metatheory of Grove using the Agda theorem prover. We implement these ideas as theGrove Workbench, which generates the necessary data structures and algorithms in OCaml given a syntax tree specification.more » « less
- 
            Dealing with merge conflicts in version control systems is a challenging task for software developers. Resolving merge conflicts is a time-consuming and error-prone process, which distracts developers from important tasks. Recent work shows that refactorings are often involved in merge conflicts and that refactoring-related conflicts tend to be larger, making them harder to resolve. In the literature, there are two refactoring-aware merging techniques that claim to automatically resolve refactoring-related conflicts; however, these two techniques have never been empirically compared. In this paper, we present RefMerge, a rejuvenated Java-based design and implementation of the first technique, which is an operation-based refactoring-aware merging algorithm. We compare RefMerge to Git and the state-of-the-art graph-based refactoring-aware merging tool, IntelliMerge, on 2,001 merge scenarios with refactoring-related conflicts from 20 open-source projects. We find that RefMerge resolves or reduces conflicts in 497 (25%) merge scenarios while increasing conflicting LOC in only 214 (11%) scenarios. On the other hand, we find that IntelliMerge resolves or reduces conflicts in 478 (24%) merge scenarios but increases conflicting LOC in 597 (30%) merge scenarios. We additionally conduct a qualitative analysis of the differences between the three merging algorithms and provide insights of the strengths and weaknesses of each tool. We find that while IntelliMerge does well with ordering and formatting conflicts, it struggles with class-level refactorings and scenarios with several refactorings. On the other hand, RefMerge is resilient to the number of refactorings in a merge scenario, but we find that RefMerge introduces conflicts when inverting move-related refactorings.more » « less
- 
            Build systems are responsible for building software correctly and quickly. Unfortunately, traditional build tools like make are correct and fast only when developers precisely enumerate dependencies for every incremental build step. Forward build systems improve correctness over traditional build tools by discovering dependencies automatically, but existing forward build tools have two fundamental flaws. First, they are incorrect; existing forward build tools miss dependencies because their models of system state are incomplete. Second, they rely on users to manually specify incremental build steps, increasing the programmer burden for fast builds. This paper introduces Riker, a forward build system that guarantees fast, correct builds. Riker builds are easy to specify; in many cases a single command such as gcc *.c suffices. From these simple specifications, Riker automatically discovers fast incremental rebuild opportunities. Riker models the entire POSIX filesystem—not just files, but directories, pipes, and so on. This model guarantees that every dependency is checked on every build so every output is correct. We use Riker to build 14 open source packages including LLVM and memcached. Riker incurs a median overhead of 8.8% on the initial full build. On average, Riker’s incremental builds realize 94% of make’s incremental speedup with no manual effort and no risk of errors.more » « less
- 
            Build systems are responsible for building software correctly and quickly. Unfortunately, traditional build tools like make are correct and fast only when developers precisely enumerate dependencies for every incremental build step. Forward build systems improve correctness over traditional build tools by discovering dependencies automatically, but existing forward build tools have two fundamental flaws. First, they are incorrect; existing forward build tools miss dependencies because their models of system state are incomplete. Second, they rely on users to manually specify incremental build steps, increasing the programmer burden for fast builds. This paper introduces Riker, a forward build system that guarantees fast, correct builds. Riker builds are easy to specify; in many cases a single command such as gcc *.c suffices. From these simple specifications, Riker automatically discovers fast incremental rebuild opportunities. Riker models the entire POSIX filesystem—not just files, but directories, pipes, and so on. This model guarantees that every dependency is checked on every build so every output is correct. We use Riker to build 14 open source packages including LLVM and memcached. Riker incurs a median overhead of 8.8% on the initial full build. On average, Riker's incremental builds realize 94% of make's incremental speedup with no manual effort and no risk of errors.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    