Title: Privacy, Ethics, and Data Access: A Case Study of the Fragile Families Challenge
Stewards of social data face a fundamental tension. On the one hand, they want to make their data accessible to as many researchers as possible to facilitate new discoveries. On the other hand, they want to restrict access to their data as much as possible to protect the people represented in the data. In this article, we provide a case study addressing this common tension in an uncommon setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party guidance. We also describe the ethical principles that formed the basis of our process. We are open about our process and the trade-offs we made in the hope that others can improve on what we have done.
Award ID(s):
1704444 1760052
NSF-PAR ID:
10208547
Journal Name:
Socius: Sociological Research for a Dynamic World
Volume:
5
ISSN:
2378-0231
Page Range / eLocation ID:
237802311881302
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Aleti A., Panichella A (Ed.)
    Users of highly configurable software systems often want to optimize a particular objective, such as improving a functional outcome or increasing system performance. One approach is to use an evolutionary algorithm. However, many applications today are data-driven, meaning they depend on inputs or data that can be complex and varied. Hence, a search needs to be run (and re-run) for all inputs, making optimization a heavyweight and potentially impractical process. In this paper, we explore this issue on a data-driven, highly configurable scientific application. We build an exhaustive database containing 3,000 configurations and 10,000 inputs, leading to almost 100 million records as our oracle, and then run a genetic algorithm individually on each of the 10,000 inputs. We ask (1) whether a genetic algorithm can find configurations to improve functional objectives; (2) whether patterns of best configurations over all input data emerge; and (3) whether we can use sampling to approximate the results. We find that the original (default) configuration is best only 34% of the time, while clear patterns emerge of other best configurations. Out of 3,000 possible configurations, only 112 distinct configurations achieve the optimal result at least once across all 10,000 inputs, suggesting the potential for lighter-weight optimization approaches. We show that sampling of the input data finds similar patterns at a lower cost.
  2. We examine a novel setting in which two parties have partial knowledge of the elements that make up a Markov Decision Process (MDP) and must cooperate to compute and execute an optimal policy for the problem constructed from those elements. This situation arises when one party wants to give a robot some task but does not wish to divulge those details to a second party, while the second party possesses sensitive data about the robot's dynamics (information needed for planning). Both parties want the robot to perform the task successfully, but neither is willing to disclose any more information than is absolutely necessary. We utilize techniques from secure multi-party computation, combining primitives and algorithms to construct protocols that can compute an optimal policy while ensuring that the policy remains opaque by being split across both parties. To execute a split policy, we also give a protocol that enables the robot to determine what actions to trigger, while the second party guards against attempts to probe for information inconsistent with the policy's prescribed execution. To improve scalability, we find that basis functions and constraint sampling methods are useful in forming effective approximate MDPs. We report simulation results examining performance and precision, and assess the scaling properties of our Python implementation. We also describe a hardware proof-of-feasibility implementation using inexpensive physical robots, which, being a small-scale instance, can be solved directly.
  3. Abstract

    Understanding how risk factors affect populations across their annual cycle is a major challenge for conserving migratory birds. For example, disease outbreaks may happen on the breeding grounds, the wintering grounds, or during migration and are expected to accelerate under climate change. The ability to identify the geographic origins of impacted individuals, especially outside of breeding areas, might make it possible to predict demographic trends and inform conservation decision‐making. However, such an effort is made more challenging by the degraded state of carcasses and the resulting low quality of DNA available. Here, we describe a rapid and low‐cost approach for identifying the origins of birds sampled across their annual cycle that is robust even when DNA quality is poor. We illustrate the approach in the common loon (Gavia immer), an iconic migratory aquatic bird that is under increasing threat on both its breeding and wintering areas. Using 300 samples collected from across the breeding range, we develop a panel of 158 single‐nucleotide polymorphism (SNP) loci with divergent allele frequencies across six genetic subpopulations. We use this SNP panel to identify the breeding grounds for 142 live nonbreeding individuals and carcasses. For example, genetic assignment of loons sampled during botulism outbreaks in parts of the Great Lakes provides evidence for the significant role the lakes play as migratory stopover areas for loons that breed across wide swaths of Canada, and highlights the vulnerability of a large segment of the breeding population to botulism outbreaks that are occurring in the Great Lakes with increasing frequency. Our results illustrate that the use of SNP panels to identify breeding origins of carcasses collected during the nonbreeding season can improve our understanding of the population‐specific impacts of mortality from disease and anthropogenic stressors, ultimately allowing more effective management.

     
  4. Abstract Background

    Calls to improve learning in science, technology, engineering, and mathematics (STEM), and particularly engineering, present significant challenges for school systems. Partnerships among engineering industry, universities, and school systems to support learning appear promising, but current work is limited in its conclusions because it lacks a strong connection to theoretical work in interorganizational collaboration.

    Purpose/Hypothesis

    This study aims to reflect more critically on the process of how organizations build relationships to address the following research question: In a public–private partnership to integrate engineering into middle school science curriculum, how do stakeholder characterizations of the collaborative process align with existing frameworks of interorganizational collaboration?

    Design/Method

    This qualitative, embedded multiple case study considered in‐depth pre‐ and post‐year interviews with teachers, administrators, industry, and university personnel during the first year of the Partnering with Educators and Engineers in Rural Schools (PEERS) program. Transcripts were analyzed using a framework of interorganizational collaboration operationalized for our context.

    Results

    Results provide insights into stakeholder perceptions of collaborative processes in the first year of the PEERS program across dimensions of collaboration. These dimensions mapped to three central discussion points with relevance for school–university–industry partnerships: school collaboration as an emergent and negotiated process, tension in collaborating across organizations, and fair share in collaborating toward a social goal.

    Conclusions

    Taking a macro‐level look at the collaborative processes involved enabled us to develop implications for collaborative stakeholders to be intentional about designing for future success. By systematically applying a framework of collaboration and capitalizing on the rich situational findings possible through a qualitative approach, we shift our understanding of collaborative processes in school–university–industry partnerships for engineering education and contribute to the development of collaboration theory.

     
  5. Security engineers and researchers use their disparate knowledge and discretion to identify malware present in a system. Sometimes, they may also use previously extracted knowledge and available Cyber Threat Intelligence (CTI) about known attacks to establish a pattern. To aid in this process, they need knowledge about malware behavior mapped to the available CTI. Such mappings enrich our representations and also help verify the information. In this paper, we describe how we retrieve malware samples and execute them in a local system. The tracked malware behavior is represented in our Cybersecurity Knowledge Graph (CKG), so that a security professional can reason with behavioral information present in the graph and draw parallels with that information. We also merge the behavioral information with knowledge extracted from the text in CTI sources like technical reports and blogs about the same malware to improve the reasoning capabilities of our CKG significantly.