Title: Introduction to the Special Collection on the Fragile Families Challenge
The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories. Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes 12 articles describing participants’ approaches to predicting these six outcomes as well as 3 articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.
Award ID(s):
1760052
PAR ID:
10321872
Journal Name:
Socius: Sociological Research for a Dynamic World
Volume:
5
ISSN:
2378-0231
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences. 
  2. This paper provides a detailed examination of pre-college computing activities as reported in three Association for Computing Machinery (ACM) venues (2012-2016). Ninety-two articles describing informal learning activities were reviewed for 24 program elements (i.e., activity components and student/instructor demographics). These 24 program elements were defined and shaped by a virtual focus group study and the articles themselves. Results indicate that the majority of authors adequately report age/grade levels of participants, number of participants, the type of activity, when the activity was offered, the tools/languages used in the activity, and whether the activity was required or elective. However, there is a deficiency in reporting many other important and foundational program elements, including contact hours of activity participants, clear learning objectives, the prior experience of participants (students and instructors), and many more. In conjunction with previous work, this paper provides recommendations to reduce these deficiencies. The Recommendations for Reporting Pre-College Computing Activities (Version 1.0) are presented to help researchers improve the quality of papers, set a standard of necessary data needed to replicate studies, and provide a basis for comparing activities and activity outcomes across multiple studies and experiences.
  3. Researchers rely on metadata systems to prepare data for analysis. As the complexity of data sets increases and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study on the basis of the experiences of participants in the Fragile Families Challenge. The authors demonstrate how treating metadata as data (i.e., releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make the task of data preparation less arduous and less error prone for all types of data analysis. The authors hope that their work will facilitate new applications of machine-learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. The authors have open-sourced the tools they created so that others can use and improve them.
  4. Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author’s raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals.