


Creators/Authors contains: "Fang, Hongzhou"


  1. Complex software systems consist of multiple overlapping design structures, such as abstractions, features, crosscutting concerns, or patterns, much as a human body has multiple interacting subsystems, such as the respiratory, digestive, or circulatory systems. Unlike in the medical domain, however, software designers have no effective way to distinguish, visualize, comprehend, and analyze these interleaving design structures. As a result, developers often struggle through a maze of source code. In this paper, we present an Automated Concept Explanation (ACE) framework that automatically extracts and categorizes major concepts from source code based on the roles that files play in design structures and on their topic frequencies. Based on these categorized concepts, ACE recovers four categories of high-level design models using different algorithms and generates a natural language explanation for each. To assess whether and how ACE can help developers better understand design structures, we conducted an empirical study in which two groups of graduate students were assigned three design comprehension tasks on an open-source project: identifying feature-related files, identifying dependencies among features, and identifying the design patterns used. The results reveal that the students who used ACE accomplished these tasks much faster and more accurately, and they acknowledged the usefulness of the categorized concepts and structures, the multi-type high-level model visualization, and the natural language explanations.
    Free, publicly-accessible full text available February 1, 2026
  2. Background: Software practitioners need reliable metrics to monitor software evolution, compare projects, and understand variations in modularity. This is crucial for assessing architectural improvement or decay. Existing popular metrics offer little help, especially in systems with implicitly connected but seemingly isolated files. Aim: Our objective is to explore why and how state-of-the-art modularity measures fail to serve as effective metrics, and to devise a new metric that more accurately captures complexity changes and is less distorted by system size or isolated files. Methods: We analyzed metric scores for 1,220 releases across 37 projects to identify the root causes of these shortcomings. This led to the creation of M-score, a new software modularity metric that combines the strengths of existing metrics while addressing their flaws. M-score rewards small, independent modules, penalizes increased coupling, and treats isolated modules and files consistently. Results: Our evaluation revealed that M-score outperformed other modularity metrics in terms of stability, particularly with respect to isolated files, because it captures coupling density and module independence. It also correlated well with maintenance effort, as indicated by historical maintainability measures: the higher the M-score, the more likely it is that maintenance tasks can be accomplished independently and in parallel. Conclusions: Our research identifies the shortcomings of current metrics in accurately depicting software complexity and proposes M-score, a new metric with superior stability that better reflects complexity and maintenance effort, making it a promising metric for software architecture assessment, comparison, and monitoring.
  3. In this paper, we introduce CIDER, a Concept-based Interactive DEsign Recovery tool that recovers a software design in the form of hierarchically organized concepts. Beyond facilitating design comprehension, it enables designers to assess design quality and identify design problems. It integrates multiple clustering algorithms to reduce the complexity of the recovered design structure, leverages information retrieval techniques to name each cluster with its most relevant topic terms to ease comprehension, and identifies and labels highly coupled file clusters to reveal possible design problems. It supports interactive selection of concepts of interest and recovers the corresponding partial design structures. Users can also interactively change the level of the recovered hierarchical structure to view the design at different granularities.
  5. Architecture degradation has a strong negative impact on software quality and can result in significant losses. Severe degradation does not happen overnight. Software evolves continuously as developers resolve numerous issues, fixing bugs and adding new features, and architecture flaws emerge quietly and largely unnoticed until they grow in scope and significance and the system becomes difficult to maintain. Developers are largely unaware of these flaws or the accumulating debt because they are focused on their immediate task of addressing individual issues. As a consequence, the cumulative impact of their activities on the architecture goes unnoticed. To detect these problems early and prevent them from accumulating into severe ones, we propose monitoring software evolution by tracking the interactions among files revised to address issues. In particular, we propose active hotspots, and show how they can be detected automatically, to reveal architecture problems. We studied hundreds of hotspots along the evolution timelines of 21 open source projects and showed that there are just a few dominating active hotspots per project at any given time. Moreover, these dominating active hotspots persist over long time periods and thus deserve special attention. Compared with state-of-the-art design and code smell detection tools, we report that active hotspots make it possible to detect signs of software degradation both earlier and more precisely.
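The active-hotspot abstract above rests on one concrete idea: monitoring evolution by tracking which files are revised together to address issues. The sketch below illustrates only that underlying idea and is not the authors' detection algorithm. It assumes a hypothetical input mapping each issue ID to the set of files changed for it. It counts how often each pair of files is co-revised, then groups files linked by at least `min_issues` co-revisions into clusters. The names `co_revision_counts`, `candidate_hotspots`, and the `min_issues` threshold are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def co_revision_counts(issue_files):
    """Count how many issues revised each unordered pair of files together.

    issue_files: hypothetical mapping from issue ID to the set of files
    changed to resolve that issue.
    """
    counts = Counter()
    for files in issue_files.values():
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) merge
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


def candidate_hotspots(issue_files, min_issues=2):
    """Group files linked by frequent co-revision into connected clusters.

    min_issues is an illustrative threshold, not a value from the paper.
    Files never part of a qualifying pair are not reported.
    Returns clusters as sorted lists, largest cluster first.
    """
    parent = {}  # union-find forest over files

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for (a, b), n in co_revision_counts(issue_files).items():
        if n >= min_issues:
            union(a, b)

    clusters = {}
    for f in list(parent):
        clusters.setdefault(find(f), []).append(f)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    # Hypothetical issue history, for illustration only
    issues = {
        "ISSUE-1": {"Parser.java", "Lexer.java"},
        "ISSUE-2": {"Parser.java", "Lexer.java", "Ast.java"},
        "ISSUE-3": {"Parser.java", "Ast.java"},
        "ISSUE-4": {"Config.java"},
    }
    print(candidate_hotspots(issues))
```

With the sample history, `Parser.java`, `Lexer.java`, and `Ast.java` form one cluster because two of their pairs are co-revised in two issues, while `Config.java` is never co-revised and is not reported. Capturing the "active" and "persistent" aspects the abstract describes would additionally require re-running such an analysis over successive windows of the issue timeline; that extension is an assumption, not something the abstract specifies.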