Title: Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics
Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time, that is, longitudinal metrics that give insight into process, not just product. In this work, we present PRIME (PRocess MEtrics), a tool to compute and visualize process metrics. The currently supported metrics include productivity, issue density, issue spoilage, and bus factor. We illustrate the value of longitudinal data and conclude with a research agenda. The tool's demo video can be watched at https://bit.ly/ase2022-prime. Source code can be found at https://github.com/SoftwareSystemsLaboratory/prime.
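The contrast between a snapshot metric and a longitudinal one can be illustrated with a minimal Python sketch. It does not use PRIME and is not its implementation: the function names, the sample commit data, and the definition of bus factor used here (the smallest number of authors who together account for at least half of the commits) are all assumptions made for illustration, and PRIME's own definition may differ.

```python
from collections import Counter, defaultdict
from datetime import date


def bus_factor(authors, threshold=0.5):
    """Smallest number of authors whose commits reach `threshold` of the total.

    Assumed definition for illustration only.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= threshold:
            return rank
    return len(counts)


def longitudinal_bus_factor(commits, threshold=0.5):
    """Bus factor per calendar year for a list of (date, author) pairs."""
    by_year = defaultdict(list)
    for day, author in commits:
        by_year[day.year].append(author)
    return {year: bus_factor(by_year[year], threshold) for year in sorted(by_year)}


# Hypothetical commit history: (commit date, author).
SAMPLE_COMMITS = [
    (date(2020, 1, 5), "alice"),
    (date(2020, 3, 1), "alice"),
    (date(2020, 6, 1), "alice"),
    (date(2020, 9, 9), "bob"),
    (date(2021, 2, 1), "alice"),
    (date(2021, 4, 1), "bob"),
    (date(2021, 5, 1), "carol"),
    (date(2021, 8, 1), "dave"),
]

if __name__ == "__main__":
    snapshot = bus_factor([author for _, author in SAMPLE_COMMITS])
    print("snapshot:", snapshot)
    print("per year:", longitudinal_bus_factor(SAMPLE_COMMITS))
```

On this made-up history, the snapshot value over all commits is 1, while the per-year series goes from 1 in 2020 to 2 in 2021. The single number hides the fact that work is becoming more evenly spread across contributors, which is the kind of trend the abstract argues longitudinal metrics expose.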
Award ID(s):
2107230 2107020
PAR ID:
10427470
Journal Name:
Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
Page Range / eLocation ID:
1 to 4
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Ed.)
    There has been a lack of standardization in the evaluation of sequential decoding systems in the bioengineering community. Assessment of the accuracy of a candidate system's segmentations and measurement of a false alarm rate are two examples of performance metrics that are critical to the operational acceptance of a technology. However, measuring such quantities in a consistent manner requires many scoring software implementation details to be resolved. Results can be highly sensitive to these implementation details. In this paper, we revisit and evaluate a set of metrics introduced in our open source scoring software for sequential decoding of multichannel signals. This software was used to rank sixteen automatic seizure detection systems recently developed for the 2020 Neureka® Epilepsy Challenge. The systems produced by the participants provided us with a broad range of design variations that allowed assessment of the consistency of the proposed metrics. We present a comprehensive assessment of four of these new metrics and validate our findings with our previous studies. We also validate a proposed new metric, time-aligned event scoring, that focuses on the segmentation behavior of an algorithm. We demonstrate how we can gain insight into the performance of a system using these metrics.
  2. Drawing from a longitudinal case study, we inspect the activities of an expanding team of scientists and their collaborators as they sought to develop a novel software pipeline that worked both for themselves and for their wider community. We argue that these two tasks - making the software work for themselves and also for their wider scientific community - could not be differentiated from each other at the beginning of the software development process. Rather, this division of labor and software capacities emerged, articulated by the actors themselves as they went about their tasks. The activities of making the novel software work at all, and the extra work of making that software repurposable or reusable could not be distinguished until near the end of the development process - rather than defined or structured in advance. We discuss implications for the trajectory of software development, and the practical work of making software repurposable. 
  3. The class-imbalance issue is intrinsic to many real-world machine learning tasks, particularly to rare-event classification problems. Although the impact and treatment of imbalanced data are widely known, the magnitude of a metric's sensitivity to class imbalance has attracted little attention. As a result, sensitive metrics are often dismissed even though their sensitivity may be only marginal. In this paper, we introduce an intuitive evaluation framework that quantifies metrics' sensitivity to class imbalance. Moreover, we reveal that metrics' sensitivity behaves logarithmically, meaning that higher imbalance ratios are associated with lower sensitivity of metrics. Our framework builds an intuitive understanding of the impact of class imbalance on metrics. We believe this can help avoid many common mistakes, especially the less-emphasized and incorrect assumption that all metrics' quantities are comparable under different class-imbalance ratios.
  4. These files are supplementary data for this publication: Uhl JH & Leyk S (2022). "Assessing the relationship between morphology and mapping accuracy of built-up areas derived from global human settlement data" (https://doi.org/10.1080/15481603.2022.2131192). Each geopackage (GPKG) file contains a set of point locations (in EPSG:3857) attributed with focal accuracy metrics of the GHS-BUILT-R2018A epochs 1975 and 2014, calculated within different levels of spatial support (i.e., focal window size) and for different analytical units (i.e., 30m grid cells, and 3x3 grid cell blocks). Moreover, each location is attributed with focal landscape metrics of built-up areas calculated in the same focal windows using the software Fragstats. These landscape metrics are calculated based on both GHS built-up areas and reference built-up areas. Reference built-up areas were derived from the Multi-temporal building footprint database for 33 U.S. counties (MTBF-33). These datasets can be used for spatially explicit predictive modeling of the GHS-BUILT R2018A data accuracy using landscape metrics as predictor variables. File nomenclature: lsm_ref_accuracy_sample_2014_1000.gpkg : landscape metrics calculated from the reference built-up areas, for the epoch 2014, using a quadratic focal window of 1,000m x 1,000m. lsm_ghs_accuracy_sample_1975_10000.gpkg : landscape metrics calculated from the GHS built-up areas, for the epoch 1975, using a quadratic focal window of 10,000m x 10,000m. Data processing: Johannes H. Uhl, University of Colorado Boulder (USA), 2020-2022.
  5. Interaction is the cornerstone of how people perform tasks and gain insight in visual analytics. However, people’s inherent cognitive biases impact their behavior and decision making during their interactive visual analytic process. Understanding how bias impacts the visual analytic process, how it can be measured, and how its negative effects can be mitigated is a complex problem space. Nonetheless, recent work has begun to approach this problem by proposing theoretical computational metrics that are applied to user interaction sequences to measure bias in real-time. In this paper, we implement and apply these computational metrics in the context of anchoring bias. We present the results of a formative study examining how the metrics can capture anchoring bias in real-time during a visual analytic task. We present lessons learned in the form of considerations for applying the metrics in a visual analytic tool. Our findings suggest that these computational metrics are a promising approach for characterizing bias in users’ interactive behaviors. 