skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Title: Themis: Automatically testing software for discrimination
Bias in decisions made by modern software is becoming a common and serious problem. We present Themis, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior. We explain how Themis can measure discrimination and aid its debugging, describe a set of optimizations Themis uses to reduce test suite size, and demonstrate Themis' effectiveness on open-source software. Themis is open-source and all our evaluation data are available at http://fairness.cs.umass.edu/. See a video of Themis in action: https://youtu.be/brB8wkaUesY  more » « less
Award ID(s):
1744471 1763423 1453474 1453543
PAR ID:
10079791
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the Demonstrations Track at the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
Page Range / eLocation ID:
871 to 875
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper defines the notions of software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates. Specifically, the paper focuses on measuring causality in discriminatory behavior. Modern software contributes to important societal decisions and evidence of software discrimination has been found in systems that recommend criminal sentences, grant access to financial loans and products, and determine who is allowed to participate in promotions and receive services. Our approach, Themis, measures discrimination in software by generating efficient, discrimination-testing test suites. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and, notably, does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination. 
    more » « less
  2. Developers of open-source software projects tend to collaborate in bursts of activity over a few days at a time, rather than at an even pace. A project might find its productivity suffering if bursts of activity occur when a key person with the right role or right expertise is not available to participate. Open-source projects could benefit from monitoring the way they orchestrate attention among key developers, finding ways to make themselves available to one another when needed. In commercial software development, Sociotechnical Congruence (STC) has been used as a measure to assess whether coordination among developers is sufficient for a given task. However, STC has not previously been successfully applied to open-source projects, in which some industrial assumptions do not apply: managementchosen targets, mandated steady work hours, and top-down task allocation of inputs and targets. In this work we propose an operationalization of STC for open-source software development. We use temporal bursts of activity as a unit of analysis more suited to the natural rhythms of open-source work, as well as open source analogues of other component measures needed for calculating STC. As an illustration, we demonstrate that opensource development on PyPI projects in GitHub is indeed bursty, that activities in the bursts have topical coherence, and we apply our operationalization of STC. We argue that a measure of socio-technical congruence adapted to open source could provide projects with a better way of tracking how effectively they are collaborating when they come together to collaborate. 
    more » « less
  3. Mutation testing is widely used in research as a metric for evaluating the quality of test suites. Mutation testing runs the test suite on generated mutants (variants of the code under test), where a test suite kills a mutant if any of the tests fail when run on the mutant. Mutation testing implicitly assumes that tests exhibit deterministic behavior, in terms of their coverage and the outcome of a test (not) killing a certain mutant. Such an assumption does not hold in the presence of flaky tests, whose outcomes can non-deterministically differ even when run on the same code under test. Without reliable test outcomes, mutation testing can result in unreliable results, e.g., in our experiments, mutation scores vary by four percentage points on average between repeated executions, and 9% of mutant-test pairs have an unknown status. Many modern software projects suffer from flaky tests. We propose techniques that manage flakiness throughout the mutation testing process, largely based on strategically re-running tests. We implement our techniques by modifying the open-source mutation testing tool, PIT. Our evaluation on 30 projects shows that our techniques reduce the number of "unknown" (flaky) mutants by 79.4%. 
    more » « less
  4. Open-source software projects have become an integral part of our daily life, supporting virtually every software we use today. Since open-source software forms the digital infrastructure, maintaining them is of utmost importance. We present Climate Coach, a dashboard that helps open-source project maintainers monitor the health of their community in terms of team climate and inclusion. Through a literature review and an exploratory survey (N=18), we identified important signals that can reflect a project’s health, and display them on a dashboard. We evaluated and refined our dashboard through two rounds of think-aloud studies (N=19). We then conducted a two-week longitudinal diary study (N=10) to test the usefulness of our dashboard. We found that displaying signals that are related to a project’s inclusion help improve maintainers’ management strategies. 
    more » « less
  5. Abstract

    Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Many researchers focus on estimating the average marginal effects of each factor while averaging over the other factors. Although this allows for straightforward design-based estimation, the results critically depend on the ways in which factors interact with one another. An alternative model-based approach can compute various quantities of interest, but requires correct model specifications, a challenging task for conjoint analysis with many factors. We propose a new hypothesis testing approach based on the conditional randomization test (CRT) to answer the most fundamental question of conjoint analysis: Does a factor of interest matterin any waygiven the other factors? Although it only provides a formal test of these binary questions, the CRT is solely based on the randomization of factors, and hence requires no modeling assumption. This means that the CRT can provide a powerful and assumption-free statistical test by enabling the use of any test statistic, including those based on complex machine learning algorithms. We also show how to test commonly used regularity assumptions. Finally, we apply the proposed methodology to conjoint analysis of immigration preferences. An open-source software package is available for implementing the proposed methodology. The proposed methodology is implemented via an open-source software R packageCRTConjoint, available through the Comprehensive R Archive Networkhttps://cran.r-project.org/web/packages/CRTConjoint/index.html.

     
    more » « less