Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
In this paper, we introduce CIDER, a Concept-based Interactive DEsign Recovery tool that recovers a software design in the form of hierarchically organized concepts. In addition to facilitating design comprehension, it also enables designers to assess design quality and identify design problems. It integrates multiple clustering algorithms to reduce the complexity of the recovered design structure, leverages information retrieval techniques to name each cluster using the most relevant topic terms to ease design comprehension, and identifies and labels highly-coupled file clusters to reveal possible design problems. It enables interactive selection of concepts of interest and recovers partial design structures accordingly. The user can also interactively change the levels of recovered hierarchical structure to visualize the design at different granularities.more » « less
-
Recently we have worked with a dozen industrial collaborators to pinpoint and quantify architecture debts, from multi-national corporations to startup companies. Our technology leverages a wide range of project data, from source file dependencies to issue records, and we interacted with projects of various sizes and characteristics. Crossing the border between research and practice, we have observed significant gaps in terms of data availability and quality among projects of different kinds. Compared with successful open source projects, data from proprietary projects are rarely complete or well-organized. Consequently, not all projects can benefit from all the features and analyses we provide. This, in turn, made them realize they needed to improve their development processes. In this talk, we categorize the commonly observed differences between open source and proprietary project data, analyze the reasons for such differences, and propose suggestions to minimize the gaps, to facilitate advances to both software research and practice.more » « less
-
Architecture degradation has a strong negative impact on software quality and can result in significant losses. Severe software degradation does not happen overnight. Software evolves continuously, through numerous issues, fixing bugs and adding new features, and architecture flaws emerge quietly and largely unnoticed until they grow in scope and significance when the system becomes difficult to maintain. Developers are largely unaware of these flaws or the accumulating debt as they are focused on their immediate tasks of address individual issues. As a consequence, the cumulative impacts of their activities, as they affect the architecture, go unnoticed. To detect these problems early and prevent them from accumulating into severe ones we propose to monitor software evolution by tracking the interactions among files revised to address issues. In particular, we propose and show how we can automatically detect active hotspots, to reveal architecture problems. We have studied hundreds of hotspots along the evolution timelines of 21 open source projects and showed that there exist just a few dominating active hotspots per project at any given time. Moreover, these dominating active hotspots persist over long time periods, and thus deserve special attention. Compared with state-of-the-art design and code smell detection tools we report that, using active hotspots, it is possible to detect signs of software degradation both earlier and more precisely.more » « less
-
Clone detection techniques have been explored for decades. Recently, deep learning techniques has been adopted to improve the code representation capability, and improve the state-of-the-art in code clone detection. These approaches usually require a transformation from AST to binary tree to incorporate syntactical information, which introduces overheads. Moreover, these approaches conduct term-embedding, which requires large training datasets. In this paper, we introduce a tree embedding technique to conduct clone detection. Our approach first conducts tree embedding to obtain a node vector for each intermediate node in the AST, which captures the structure information of ASTs. Then we compose a tree vector from its involving node vectors using a lightweight method. Lastly Euclidean distances between tree vectors are measured to determine code clones. We implement our approach in a tool called TECCD and conduct an evaluation using the BigCloneBench (BCB) and 7 other large scale Java projects. The results show that our approach achieves good accuracy and recall and outperforms existing approaches.more » « less
-
This paper present our tool suite called DV8. The objective of DV8 is to measure software modularity, detect architecture anti-patterns as technical debts, quantify the maintenance cost of each instance of an anti-pattern, and enable return on investment analyses of architectural debts. Different from other tools, DV8 integrates data from both source code and revision history. We now elaborate on each of DV8's capabilities.more » « less
An official website of the United States government

Full Text Available