Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome‐scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced-representation and whole-genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as the age of the data set increased. There was a 13.5% yearly decrease in the availability of metadata associated with published papers or online repositories and up to a 22% yearly decrease for metadata that were only available from authors.
This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data‐sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever. 
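The abstract does not state how a data set was scored as having spatiotemporal metadata. As an illustration only, the sketch below counts a sample as usable when it carries both a location and a collection date. The field names `lat_lon` and `collection_date` are modeled on INSDC BioSample attribute names, the set of placeholder strings treated as "missing" is an assumption, and the sample records are made up.

```python
from typing import Iterable, Mapping

# Placeholder strings treated as "no value". This list is an assumption;
# real submissions use many variants.
MISSING_TERMS = {"", "missing", "not collected", "not applicable", "unknown"}


def has_value(record: Mapping[str, str], field: str) -> bool:
    """True if the field is present and is not a placeholder string."""
    value = record.get(field, "")
    return value.strip().lower() not in MISSING_TERMS


def has_spatiotemporal(record: Mapping[str, str]) -> bool:
    """A sample counts only if it has both a location and a collection date."""
    return has_value(record, "lat_lon") and has_value(record, "collection_date")


def dataset_fraction(records: Iterable[Mapping[str, str]]) -> float:
    """Share of samples in one data set with usable spatiotemporal metadata."""
    records = list(records)
    if not records:
        return 0.0
    return sum(has_spatiotemporal(r) for r in records) / len(records)


# Illustrative (made-up) sample records for a single data set.
samples = [
    {"lat_lon": "38.98 N 77.10 W", "collection_date": "2015-06-01"},
    {"lat_lon": "missing", "collection_date": "2015-06-01"},
    {"collection_date": "2016"},
    {"lat_lon": "40.0 N 105.3 W", "collection_date": "not collected"},
]
print(dataset_fraction(samples))  # 0.25
```

Requiring both fields mirrors the abstract's focus on spatial and temporal context together; a stricter version might also check that coordinates parse and dates are plausible.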
Improving Metadata Infrastructure for Complex Surveys: Insights from the Fragile Families Challenge

Researchers rely on metadata systems to prepare data for analysis. As data sets become more complex and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study on the basis of the experiences of participants in the Fragile Families Challenge. The authors demonstrate how treating metadata as data (i.e., releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make data preparation less arduous and less error-prone for all types of data analysis. The authors hope that their work will facilitate new applications of machine-learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. The authors have open-sourced the tools they created so that others can use and improve them.
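A minimal sketch of the "metadata as data" idea, assuming variable metadata are released as a table with one row per variable. The columns (`name`, `label`, `wave`, `respondent`, `type`), the variable names, and the helpers `load_metadata` and `select_variables` are hypothetical; they are not the actual Fragile Families metadata schema or the authors' open-sourced tools.

```python
import csv
import io

# Hypothetical machine-readable metadata: one row per survey variable.
# Column names and variable names are illustrative assumptions.
METADATA_CSV = """name,label,wave,respondent,type
m1_age,Mother's age at baseline,1,mother,continuous
f1_age,Father's age at baseline,1,father,continuous
m3_employed,Mother currently employed,3,mother,binary
k5_reading,Child reading score,5,child,continuous
"""


def load_metadata(text):
    """Parse the metadata table into a list of dicts, one per variable."""
    return list(csv.DictReader(io.StringIO(text)))


def select_variables(rows, **criteria):
    """Return names of variables whose metadata match every criterion."""
    return [
        row["name"]
        for row in rows
        if all(row[key] == str(value) for key, value in criteria.items())
    ]


rows = load_metadata(METADATA_CSV)
print(select_variables(rows, respondent="mother"))  # ['m1_age', 'm3_employed']
print(select_variables(rows, wave=1, type="continuous"))  # ['m1_age', 'f1_age']
```

Because selection is driven by metadata columns rather than by parsing variable names or reading a codebook PDF, the same table serves an automated pipeline and a person browsing it in a spreadsheet.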
- Award ID(s): 1760052
- PAR ID: 10321873
- Journal Name: Socius: Sociological Research for a Dynamic World
- Volume: 5
- ISSN: 2378-0231
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
- The ability to identify scholarly authors is central to bibliometric analysis. Efforts to disambiguate author names using algorithms or national or societal registries become less effective as the number of publications from China and other nations where shared and similar names are prevalent increases. This work analyzes the adoption and integration of an open source, cross-national identification system, the Open Researcher and Contributor ID system (ORCID), in Web of Science metadata. Results at the article level show greater adoption, to date, of the ORCID iD in Europe as compared with Asia and the US. Focusing analysis on individual highly cited researchers with the shared Chinese surname "Wang," results indicate wide scope for greater adoption of ORCID. The mechanisms for integrating ORCID iDs into articles also come into question in an analysis of co-authors of one particular highly cited researcher who have varying percentages of articles with ORCID iDs attached. These results suggest that systematic variations in adoption and integration of ORCID into publication metadata should be considered in any bibliometric analysis based on it.
- The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories. Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes 12 articles describing participants' approaches to predicting these six outcomes as well as 3 articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.
- Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author's raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals.
- Objective: This article calls on family scholars to take seriously how families are invested and divested in maintaining and reproducing cisnormativity. Background: Families can be a prime institution for the reproduction of cisnormativity. For transgender and nonbinary family members, families' investment in cisnormativity can generate ambiguous and toxic familial relations. Yet, family studies have not developed an adequate framework to examine how and why cisnormativity operates within families. Method: The authors engage with empirical and theoretical work on gender, intersectionality, and families to examine how cisnormativity operates within family dynamics and processes. This article also focuses on work about trans people and families to capture how cisnormative processes within families affect trans people's familial relations. Results: The authors advance a trans family systems framework to show how families' cisgender investments and divestments shape familial processes. The concept of cisnormative compliance is introduced to capture the beliefs and practices of obedience established by family members for the purpose of reproducing cisnormativity. Family studies can move forward in studying these cisnormative processes by documenting how gender accountability shapes family dynamics, implementing new methods, furthering an intersectional analysis, and exploring complexities of space and place. Conclusion: To reimagine gender and families, family scholars need to study and foreground how cisnormativity shapes family dynamics and processes.
 An official website of the United States government