Public opinion surveys are a widespread, powerful tool for studying people's attitudes and behaviors from comparative perspectives. However, even global surveys can have limited geographic and temporal coverage, which can hinder the production of comprehensive knowledge. To expand the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in different populations and/or at different times. These harmonized datasets can be analyzed as a single source and accessed through various data portals. However, the Survey Data Recycling (SDR) research project has identified three challenges that social scientists face when using data portals: the inability to explore data in depth or query data based on customized needs, the difficulty of efficiently identifying related data for studies, and the inability to evaluate theoretical models using sliced data. To address these issues, the SDR research project has developed the SDR Querier, which is applied to the harmonized SDR database. The SDR Querier includes a BERT-based model that allows for customized data queries through research questions or keywords (Query-by-Question), a visual design that helps users determine the availability of harmonized data for a given research question (Query-by-Condition), and the ability to reveal the underlying relational patterns among substantive and methodological variables in the database (Query-by-Relation), aiding in the rigorous evaluation or improvement of regression models. Case studies with multiple social scientists have demonstrated the usefulness and effectiveness of the SDR Querier in addressing daily challenges.
SOils DAta Harmonization database (SoDaH): an open-source synthesis of soil data from research networks
This SOils DAta Harmonization (SoDaH) database is designed to bring together soil carbon data from diverse research networks into a harmonized dataset that can be used for synthesis activities and model development. The research network sources for SoDaH span different biomes and climates, encompass multiple ecosystem types, and have collected data across a range of spatial, temporal, and depth gradients. The rich data sets assembled in SoDaH consist of observations from monitoring efforts and long-term ecological experiments. The SoDaH database also incorporates related environmental covariate data pertaining to climate, vegetation, soil chemistry, and soil physical properties. The data are harmonized and aggregated using open-source code that enables a scripted, repeatable approach for soil data synthesis.
- Award ID(s): 1929393
- PAR ID: 10328143
- Publisher / Repository: Environmental Data Initiative
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract. Data collected from research networks present opportunities to test theories and develop models about factors responsible for the long-term persistence and vulnerability of soil organic matter (SOM). Synthesizing datasets collected by different research networks presents opportunities to expand the ecological gradients and scientific breadth of information available for inquiry. Synthesizing these data is challenging, especially considering the legacy of soil data that have already been collected and an expansion of new network science initiatives. To facilitate this effort, here we present the SOils DAta Harmonization database (SoDaH; https://lter.github.io/som-website, last access: 22 December 2020), a flexible database designed to harmonize diverse SOM datasets from multiple research networks. SoDaH is built on several network science efforts in the United States, but the tools built for SoDaH aim to provide an open-access resource to facilitate synthesis of soil carbon data. Moreover, SoDaH allows for individual locations to contribute results from experimental manipulations, repeated measurements from long-term studies, and local- to regional-scale gradients across ecosystems or landscapes. Finally, we also provide data visualization and analysis tools that can be used to query and analyze the aggregated database. The SoDaH v1.0 dataset is archived and available at https://doi.org/10.6073/pasta/9733f6b6d2ffd12bf126dc36a763e0b4 (Wieder et al., 2020).
Abstract. In the age of big data, soil data are more available and richer than ever, but – outside of a few large soil survey resources – they remain largely unusable for informing soil management and understanding Earth system processes beyond the original study. Data science has promised a fully reusable research pipeline where data from past studies are used to contextualize new findings and reanalyzed for new insight. Yet synthesis projects encounter challenges at all steps of the data reuse pipeline, including unavailable data, labor-intensive transcription of datasets, incomplete metadata, and a lack of communication between collaborators. Here, using insights from a diversity of soil, data, and climate scientists, we summarize current practices in soil data synthesis across all stages of database creation: availability, input, harmonization, curation, and publication. We then suggest new soil-focused semantic tools to improve existing data pipelines, such as ontologies, vocabulary lists, and community practices. Our goal is to provide the soil data community with an overview of current practices in soil data and where we need to go to fully leverage big data to solve soil problems in the next century.
Wood, V (Ed.) Abstract. The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein–protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
The SDR Database v.2.0 (SDR2) is a multi-country, multi-year database for research on political participation, social capital, and well-being. It comprises harmonized information from 23 international survey projects, covering over 4.4 million respondents from 156 countries in the period 1966–2017. SDR2 provides both target variables and methodological indicators that store source survey and ex-post harmonization metadata. SDR2 consists of three datasets: the MASTER file, which stores harmonized information for a total of 4,402,489 respondents; the auxiliary PLUG-SURVEY file, which contains controls for source data quality and a set of technical variables needed for merging this file with the MASTER file; and the PLUG-COUNTRY file, which is a dictionary of countries and territories used in the MASTER file. An overall description of the SDR2 Database and detailed information about its datasets are available in the SDR2 documentation. SDR2 is a product of the project Survey Data Recycling: New Analytic Framework, Integrated Database, and Tools for Cross-national Social, Behavioral and Economic Research, financed by the US National Science Foundation (PTE Federal award 1738502). We thank the Ohio State University and the Institute of Philosophy and Sociology, Polish Academy of Sciences, for organizational support.
