Title: SDR Querier: A Visual Querying Framework for Cross-National Survey Data Recycling
Public opinion surveys constitute a widespread, powerful tool to study people's attitudes and behaviors from comparative perspectives. However, even global surveys can have limited geographic and temporal coverage, which can hinder the production of comprehensive knowledge. To expand the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in different populations and/or at different times. These harmonized datasets can be analyzed as a single source and accessed through various data portals. However, the Survey Data Recycling (SDR) research project has identified three challenges that social scientists face when using data portals: the inability to explore data in depth or query data based on customized needs, the difficulty of efficiently identifying related data for studies, and the inability to evaluate theoretical models using sliced data. To address these issues, the SDR research project has developed the SDR Querier, which is applied to the harmonized SDR database. The SDR Querier includes a BERT-based model that allows for customized data queries through research questions or keywords (Query-by-Question), a visual design that helps users determine the availability of harmonized data for a given research question (Query-by-Condition), and the ability to reveal the underlying relational patterns among substantive and methodological variables in the database (Query-by-Relation), aiding in the rigorous evaluation or improvement of regression models. Case studies with multiple social scientists have demonstrated the usefulness and effectiveness of the SDR Querier in addressing daily challenges.
Award ID(s):
1738502
NSF-PAR ID:
10478644
Publisher / Repository:
IEEE Computer Society
Journal Name:
IEEE Transactions on Visualization and Computer Graphics
Volume:
29
Issue:
6
ISSN:
1077-2626
Page Range / eLocation ID:
2862 to 2874
Subject(s) / Keyword(s):
Data visualization, Data models, Biological system modeling, Rivers, Portals, Bit error rate, Sociology
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The SDR Database v.2.0 (SDR2) is a multi-country, multi-year database for research on political participation, social capital, and well-being. It comprises harmonized information from 23 international survey projects, covering over 4.4 million respondents from 156 countries in the period 1966–2017. SDR2 provides both target variables and methodological indicators that store source survey and ex-post harmonization metadata. SDR2 consists of three datasets: the MASTER file, which stores harmonized information for a total of 4,402,489 respondents; the auxiliary PLUG-SURVEY file, which contains controls for source data quality and a set of technical variables needed for merging this file with the MASTER file; and the PLUG-COUNTRY file, which is a dictionary of countries and territories used in the MASTER file. An overall description of the SDR2 Database and detailed information about its datasets are available in the SDR2 documentation. SDR2 is a product of the project Survey Data Recycling: New Analytic Framework, Integrated Database, and Tools for Cross-national Social, Behavioral and Economic Research, financed by the US National Science Foundation (PTE Federal award 1738502). We thank the Ohio State University and the Institute of Philosophy and Sociology, Polish Academy of Sciences, for organizational support.
  2. SDR 2.0 Cotton File: Cumulative List of Variables in the Surveys of the SDR Database is a comprehensive data dictionary in Microsoft Excel format. Its main purpose is to provide an overview of the 88,118 variables (i.e., variable names, values, and labels) available in the original (source) data files that we retrieved automatically for harmonization purposes in the SDR Project. Information in the Cotton File comes from 215 source data files that comprise ca. 3,500 national surveys administered between 1966 and 2017 in 169 countries or territories, as part of 23 international survey projects.
  3. Large-scale organic data generated from newspapers, social media, television, and radio require expertise in infrastructure management, data collection, and data processing before they yield research value. We have developed text analytic research portals to help social science researchers who do not have the resources necessary to collect, store, and process these large-scale data sets. Our portals allow researchers to use an intuitive point-and-click interface to generate variables from large, dynamic data sets using state-of-the-art text mining and learning methods. These timely variables constructed from noisy text can then be used to advance social science research in areas such as political science, economics, public health, and psychology.
  4. This demonstration showcases Chestnut, a data layout generator for in-memory object-oriented database applications. Given an application and a memory budget, Chestnut generates a customized in-memory data layout and the corresponding query plans that are specialized for the application queries. Our demo will let users design and improve simple web applications using Chestnut. Users can view the Chestnut-generated data layouts using a custom visualization system, which will allow users to see how the application parameters affect Chestnut's design. Finally, users will be able to run queries generated by the application via the customized query plans generated by Chestnut or traditional relational query engines, and can compare the results and observe the speedup achieved by the Chestnut-generated query plans.
  5. Every project in digital and computational history of science starts with the collection of data. Depending on the research project, subject of study, and other factors, data can comprise a variety of types, such as full texts, images, audio, video, or bibliographic metadata. Publications and project reports often describe a project’s results and the methods and algorithms employed, but few discuss the challenges of the initial data collection process or how it fits into the overall research data life cycle. In this paper, we discuss a concrete research data life cycle and examine the difficulties it involves. Furthermore, we explore the strategies and challenges of data collection, and the question of comparability of datasets.