Title: A Linked Data Mosaic for Policy-Relevant Research on Science and Innovation: Value, Transparency, Rigor, and Community
This article presents a new framework for realizing the value of linked data understood as a strategic asset and increasingly necessary form of infrastructure for policy-making and research in many domains. We outline a framework, the ‘data mosaic’ approach, which combines socio-organizational and technical aspects. After demonstrating the value of linked data, we highlight key concepts and dangers for community-developed data infrastructures. We concretize the framework in the context of work on science and innovation generally. Next we consider how a new partnership to link federal survey data, university data, and a range of public and proprietary data represents a concrete step toward building and sustaining a valuable data mosaic. We discuss technical issues surrounding linked data but emphasize that linking data involves addressing the varied concerns of wide-ranging data holders, including privacy, confidentiality, and security, as well as ensuring that all parties receive value from participating. The core of successful data mosaic projects, we contend, is as much institutional and organizational as it is technical. As such, sustained efforts to fully engage and develop diverse, innovative communities are essential.
Award ID(s):
1760544 2100234 1937251
PAR ID:
10369098
Journal Name:
Harvard data science review
Volume:
4
Issue:
2
ISSN:
2644-2353
Page Range / eLocation ID:
https://hdsr.mitpress.mit.edu/pub/u073rjxs/release
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we present the Systems Engineering Initiative for Student Success (SEISS) framework, which we are developing to enable educational organizations to scan, evaluate, and transform their operations to achieve their diversity, equity, and inclusion goals in student recruitment, retention, and graduation. The underlying logic of the SEISS framework is that an organization such as a college of engineering is a sociotechnical system (STS) consisting of a social subsystem and a technical subsystem. The social subsystem consists of people and their roles, and is a model of who talks to whom about what. The technical subsystem consists of all the activities, programs, policies, and operations that help the organization achieve its goals. In a sociotechnical system, the social and technical subsystems are interdependent in their functioning, and they must be jointly optimized from an organizational design perspective. The SEISS framework, which views a college or a similar organizational unit as a sociotechnical system, gives the organizational designer a unique systems lens with which to view, analyze, and design the operations and to organize the capacities and resources in the college. The systems lens views an organizational unit, its subsystems, components, and corresponding capacities not in isolation but as entities that interact with each other. With support from an NSF IUSE grant, we have been developing the SEISS framework and have piloted it in a predominantly white college of engineering to identify existing and potential technical and social system capacities for underrepresented minority (URM) students to succeed in the college. Preliminary results from our qualitative analyses of URM student interviews reveal the utility of the SEISS framework and the STS lens in unearthing the barriers and enablers for these students in the social and technical subsystems in the college. We also model the interactions between the social and technical subsystem elements in the SEISS framework, revealing latent opportunities for leveraging the connections between the social and technical subsystem capacities and resources.
  2. In May 2020, the New York City (NYC) Mayor’s Office of Climate Resiliency (MOCR) began convening bi-weekly discussions, called the Rapid Research and Assessment (RRA) Series, between City staff and external experts in science, policy, design, engineering, communications, and planning. The goal was to rapidly develop authoritative, actionable information to help integrate resiliency into the City’s COVID response efforts. The situation in NYC is not uncommon. Extreme events often require government officials, practitioners, and citizens to call upon multiple forms of scientific and technical assistance from rapid data collection to expert elicitation, each spanning more or less involved engagement. We compare the RRA to similar rapid assessment efforts and reflect on the nature of the RRA and similar efforts to exchange and co-produce knowledge. The RRA took up topics on social cohesion, risk communication, resilient and healthy buildings, and engagement, in many cases strengthening confidence in what was already known but also refining the existing knowledge in ways that can be helpful as the pandemic unfolds. Researchers also learned from each other ways to be supportive of the City of New York and MOCR in the future. The RRA network will continue to deepen, continue to co-produce actionable climate knowledge, and continue to value organizational sensemaking as a usable climate service, particularly in highly uncertain times. Given the complex, rare, and, in many cases, unfamiliar context of COVID-19, we argue that organizational sensemaking is a usable climate service.
  3. Recent CSCW research on the collaborative design and development of research infrastructures for the natural sciences has increasingly focused on the challenges of open data sharing. This qualitative study describes and analyzes how multidisciplinary, geographically distributed ocean scientists are integrating highly diverse data as part of an effort to develop a new research infrastructure to advance science. This paper identifies different kinds of coordination that are necessary to align processes of data collection, production, and analysis. Some of the hard work to integrate data is undertaken before data integration can even become a technical problem. After data integration becomes a technical problem, social and organizational means continue to be critical for resolving differences in assumptions, methods, practices, and priorities. This work calls attention to the diversity of coordinative, social, and organizational practices and concerns that are needed to integrate data and also how, in highly innovative work, the process of integrating data also helps to define scientific problem spaces themselves. 
  4. Cache systems are widely used to speed up data retrieval. Modern HPC, data analytics, and AI/ML workloads generate vast, multi-dimensional datasets, and those data are accessed via complex queries. However, the probability that different queries request exactly the same data is low, so a traditional key-value cache yields only limited performance improvement. In this paper, we present Mosaic-Cache, a proactive and general caching framework that enables applications to efficiently reuse partially overlapping data through novel overlap-aware cache interfaces for fast content-level reuse. The core components include a metadata manager leveraging customizable indexing for fast overlap lookups, an adaptive fetch planner for dynamic cache-to-storage decisions, and an async merger to reduce cache fragmentation and redundancy. Evaluations on real-world HPC datasets show that Mosaic-Cache improves overall performance by up to 4.1× over a traditional key-value-based cache while adding minimal overhead in worst-case scenarios.
  5. We study the relationship between Web users and service providers, taking a sociotechnical approach and focusing particularly (but not exclusively) on privacy and security of personal data. Much conventional Web-security practice seeks to protect benevolent parties, both individuals and organizations, against purely malevolent adversaries in an effort to prevent catastrophic events such as data breaches, ransomware attacks, and denial of service. By contrast, we highlight the dynamics among the parties that much conventional security technology seeks to protect. We regard most interactions between users and providers as implicit negotiations that, like the interactions between buyers and sellers in a marketplace, have both adversarial and cooperative aspects. Our goal is to rebalance these negotiations in order to give more power to users; toward that end we advocate the adoption of two techniques, one technical and one organizational. Technically, we introduce the Platform for Untrusted Resource Evaluation (PURE), a content-labeling framework that empowers users to make informed decisions about service providers, reduces the ability of providers to induce behaviors that benefit them more than users, and requires minimal time and effort to use. On the organizational side, we concur with Gordon-Tapiero et al. [19] that a collective approach is necessary to rebalance the power dynamics between users and providers; in particular, we suggest that the data co-op, an organizational form suggested by Ligett and Nissim [25] and Pentland and Hardjono [28], is a natural setting in which to deploy PURE and similar tools.