Title: Subdivisions and crossroads: Identifying hidden community structures in a data archive’s citation network
Abstract

Data archives are an important source of high-quality data in many fields, making them ideal sites to study data reuse. By studying data reuse through citation networks, we are able to learn how hidden research communities—those that use the same scientific data sets—are organized. This paper analyzes the community structure of an authoritative network of data sets cited in academic publications, which have been collected by a large, social science data archive: the Interuniversity Consortium for Political and Social Research (ICPSR). Through network analysis, we identified communities of social science data sets and fields of research connected through shared data use. We argue that communities of exclusive data reuse form “subdivisions” that contain valuable disciplinary resources, while data sets at a “crossroads” broadly connect research communities. Our research reveals the hidden structure of data reuse and demonstrates how interdisciplinary research communities organize around data sets as shared scientific inputs. These findings contribute new ways of describing scientific communities to understand the impacts of research data reuse.

 
PAR ID:
10373284
Publisher / Repository:
DOI PREFIX: 10.1162
Journal Name:
Quantitative Science Studies
Volume:
3
Issue:
3
ISSN:
2641-3337
Page Range / eLocation ID:
p. 694-714
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    A 5‐year project to study scientific data uses in geography, starting in 1999, evolved into 20 years of research on data practices in sensor networks, environmental sciences, biology, seismology, undersea science, biomedicine, astronomy, and other fields. By emulating the “team science” approaches of the scientists studied, the UCLA Center for Knowledge Infrastructures accumulated a comprehensive collection of qualitative data about how scientists generate, manage, use, and reuse data across domains. Building upon Paul N. Edwards's model of “making global data”—collecting signals via consistent methods, technologies, and policies—to “make data global”—comparing and integrating those data, the research team has managed and exploited these data as a collaborative resource. This article reflects on the social, technical, organizational, economic, and policy challenges the team has encountered in creating new knowledge from data old and new. We reflect on continuity over generations of students and staff, transitions between grants, transfer of legacy data between software tools, research methods, and the role of professional data managers in the social sciences.

     
  2. Abstract

    The impact of preserved museum specimens is transforming and increasing by three-dimensional (3D) imaging that creates high-fidelity online digital specimens. Through examples from the openVertebrate (oVert) Thematic Collections Network, we describe how we created a digitization community dedicated to the shared vision of making 3D data of specimens available and the impact of these data on a broad audience of scientists, students, teachers, artists, and more. High-fidelity digital 3D models allow people from multiple communities to simultaneously access and use scientific specimens. Based on our multiyear, multi-institution project, we identify significant technological and social hurdles that remain for fully realizing the potential impact of digital 3D specimens.

     
  3. Adoption of data and compute-intensive research in geosciences is hindered by the same social and technological reasons as other science disciplines - we're humans after all. As a result, many of the new opportunities to advance science in today's rapidly evolving technology landscape are not approachable by domain geoscientists. Organizations must acknowledge and actively mitigate these intrinsic biases and knowledge gaps in their users and staff. Over the past ten years, CyVerse (www.cyverse.org) has carried out the mission "to design, deploy, and expand a national cyberinfrastructure for life sciences research, and to train scientists in its use." During this time, CyVerse has supported and enabled transdisciplinary collaborations across institutions and communities, overseen many successes, and encountered failures. Our lessons learned in user engagement, both social and technical, are germane to the problems facing the geoscience community today. A key element of overcoming social barriers is to set up an effective education, outreach, and training (EOT) team to drive initial adoption as well as continued use. A strong EOT group can reach new users, particularly those in under-represented communities, reduce power distance relationships, and mitigate users' uncertainty avoidance toward adopting new technology. Timely user support across the life of a project, based on mutual respect between the developers' and researchers' different skill sets, is critical to successful collaboration. Without support, users become frustrated and abandon research questions whose technical issues require solutions that are 'simple' from a developer's perspective, but are unknown by the scientist. At CyVerse, we have found there is no one solution that fits all research challenges. Our strategy has been to maintain a system of systems (SoS) where users can choose 'lego-blocks' to build a solution that matches their problem. This SoS ideology has allowed CyVerse users to extend and scale workflows without becoming entangled in problems which reduce productivity and slow scientific discovery. Likewise, CyVerse addresses the handling of data through its entire lifecycle, from creation to publication to future reuse, supporting community driven big data projects and individual researchers.
  4. Social network data are complex and dependent data. At the macro-level, social networks often exhibit clustering in the sense that social networks consist of communities; and at the micro-level, social networks often exhibit complex network features such as transitivity within communities. Modeling real-world social networks requires modeling both the macro- and micro-level, but many existing models focus on one of them while neglecting the other. In recent work, [28] introduced a class of Exponential Random Graph Models (ERGMs) capturing community structure as well as microlevel features within communities. While attractive, existing approaches to estimating ERGMs with community structure are not scalable. We propose here a scalable two-stage strategy to estimate an important class of ERGMs with community structure, which induces transitivity within communities. At the first stage, we use an approximate model, called working model, to estimate the community structure. At the second stage, we use ERGMs with geometrically weighted dyadwise and edgewise shared partner terms to capture refined forms of transitivity within communities. We use simulations to demonstrate the performance of the two-stage strategy in terms of the estimated community structure. In addition, we show that the estimated ERGMs with geometrically weighted dyadwise and edgewise shared partner terms within communities outperform the working model in terms of goodness-of-fit. Last, but not least, we present an application to high-resolution human contact network data. 
  5. Our ability to visualize and quantify the internal structures of objects via computed tomography (CT) has fundamentally transformed science. As tomographic tools have become more broadly accessible, researchers across diverse disciplines have embraced the ability to investigate the 3D structure-function relationships of an enormous array of items. Whether studying organismal biology, animal models for human health, iterative manufacturing techniques, experimental medical devices, engineering structures, geological and planetary samples, prehistoric artifacts, or fossilized organisms, computed tomography has led to extensive methodological and basic sciences advances and is now a core element in science, technology, engineering, and mathematics (STEM) research and outreach toolkits. Tomorrow's scientific progress is built upon today's innovations. In our data-rich world, this requires access not only to publications but also to supporting data. Reliance on proprietary technologies, combined with the varied objectives of diverse research groups, has resulted in a fragmented tomography-imaging landscape, one that is functional at the individual lab level yet lacks the standardization needed to support efficient and equitable exchange and reuse of data. Developing standards and pipelines for the creation of new and future data, which can also be applied to existing datasets is a challenge that becomes increasingly difficult as the amount and diversity of legacy data grows. Global networks of CT users have proved an effective approach to addressing this kind of multifaceted challenge across a range of fields. Here we describe ongoing efforts to address barriers to recently proposed FAIR (Findability, Accessibility, Interoperability, Reuse) and open science principles by assembling interested parties from research and education communities, industry, publishers, and data repositories to approach these issues jointly in a focused, efficient, and practical way. By outlining the benefits of networks, generally, and drawing on examples from efforts by the Non-Clinical Tomography Users Research Network (NoCTURN), specifically, we illustrate how standardization of data and metadata for reuse can foster interdisciplinary collaborations and create new opportunities for future-looking, large-scale data initiatives.