This content will become publicly available on March 31, 2026

Title: What is the Value of Data? A Theory and Systematization
Data powers economies, shapes societies, and fuels decision-making, yet its value remains poorly understood. Despite its centrality, we lack a unified framework for defining, measuring, and reasoning about data’s worth. This article develops a theory and systematization of the value of data, explaining why, how, and when data generates value. We distinguish data from documents, separate objective value from subjective judgments, and identify key dimensions of data’s worth. Our framework reconciles disparate notions of information, knowledge, and utility, offering insights that validate known principles while uncovering new opportunities to extract value from data. More than a taxonomy, this work provides a conceptual foundation for integrating perspectives from computer science, economics, and beyond. This foundation clarifies data’s role in technology, markets, and governance, advancing our ability to systematically understand and harness its value.
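The framework above is conceptual, but a familiar decision-theoretic benchmark (not the article's own formalism) makes the utility-based notion of data value concrete: the value of an observation is how much the best achievable expected utility improves once that observation is available. A minimal sketch with a made-up prior, utility table, and observation model:

    # Toy value-of-information calculation (illustrative assumption, not the
    # article's formalism): value of the data = expected utility of the best
    # action chosen with the observation minus that of the best action without it.
    import numpy as np

    prior = np.array([0.5, 0.5])                     # P(state), made up
    utility = np.array([[ 1.0, -1.0],                # u[action, state], made up
                        [-1.0,  1.0],
                        [ 0.2,  0.2]])
    likelihood = np.array([[0.8, 0.2],               # P(obs | state), rows = obs
                           [0.2, 0.8]])

    # Best expected utility acting on the prior alone (no data).
    value_without = (utility @ prior).max()

    # Best expected utility after observing the data, averaged over observations.
    p_obs = likelihood @ prior                       # P(obs)
    posterior = likelihood * prior / p_obs[:, None]  # P(state | obs), rows = obs
    value_with = sum(p_obs[o] * (utility @ posterior[o]).max() for o in range(len(p_obs)))

    print("value of observing the data:", value_with - value_without)   # 0.4 here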
Award ID(s): 2340034
PAR ID: 10618456
Author(s) / Creator(s):
Publisher / Repository: ACM Digital Library
Date Published:
Journal Name: ACM / IMS Journal of Data Science
Volume: 2
Issue: 1
ISSN: 2831-3194
Page Range / eLocation ID: 1 to 25
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Data valuation is essential for quantifying data’s worth, aiding in assessing data quality and determining fair compensation. While existing data valuation methods have proven effective in evaluating Euclidean data, they face limitations when applied to increasingly popular graph-structured data. In particular, graph data valuation introduces unique challenges, primarily stemming from the intricate dependencies among nodes and the growth in value estimation costs. To address the challenging problem of graph data valuation, we put forth an innovative solution, the Precedence-Constrained Winter (PC-Winter) Value, which accounts for the complex graph structure. Furthermore, we develop a variety of strategies to address the computational challenges and enable efficient approximation of PC-Winter. Extensive experiments demonstrate the effectiveness of PC-Winter across diverse datasets and tasks.
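The abstract does not spell out the PC-Winter estimator, but the general idea behind precedence-constrained, permutation-based valuation can be sketched: sample node orderings that respect a dependency structure and average each node's marginal contribution to a utility function. The dependency graph, utility function, and sampling scheme below are hypothetical stand-ins, not the paper's method or its approximation strategies.

    # Schematic precedence-constrained, permutation-based valuation (a rough
    # Shapley/Winter-style sketch; the actual PC-Winter estimator and its
    # efficient approximations are defined in the paper, not reproduced here).
    import random

    def random_precedence_order(preds):
        """Sample a linear order in which every node appears after its predecessors."""
        remaining, placed, order = set(preds), set(), []
        while remaining:
            ready = [v for v in remaining if preds[v] <= placed]   # all predecessors placed
            v = random.choice(ready)
            order.append(v)
            placed.add(v)
            remaining.remove(v)
        return order

    def precedence_permutation_values(preds, utility, n_samples=200):
        """Average marginal contribution of each node over sampled valid orders."""
        values = {v: 0.0 for v in preds}
        for _ in range(n_samples):
            coalition, prev = set(), utility(set())
            for v in random_precedence_order(preds):
                coalition.add(v)
                current = utility(coalition)
                values[v] += (current - prev) / n_samples
                prev = current
        return values

    # Hypothetical example: a tiny dependency DAG and a stand-in "model accuracy".
    preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    utility = lambda S: len(S) + (0.5 if {"b", "c"} <= S else 0.0)
    print(precedence_permutation_values(preds, utility))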
  2. Abstract: We propose a conceptual framework for STEM education that is centered around justice for minoritized groups. Justice‐centered STEM education engages all students in multiple STEM subjects, including data science and computer science, to explain and design solutions to societal challenges disproportionately impacting minoritized groups. We articulate the affordances of justice‐centered STEM education for one minoritized student group that has been traditionally denied meaningful STEM learning: multilingual learners (MLs). Justice‐centered STEM education with MLs leverages the assets they bring to STEM learning, including their transnational experiences and knowledge as well as their rich repertoire of meaning‐making resources. In this position paper, we propose our conceptual framework to chart a new research agenda on justice‐centered STEM education to address societal challenges with all students, especially MLs. Our conceptual framework incorporates four interrelated components by leveraging the convergence of multiple STEM disciplines to promote justice‐centered STEM education with MLs: (a) societal challenges in science education, (b) justice‐centered data science education, (c) justice‐centered computer science education, and (d) justice‐centered engineering education. The article illustrates our conceptual framework using the case of the COVID‐19 pandemic, which has presented an unprecedented societal challenge but also an unprecedented opportunity to cultivate MLs' assets toward promoting justice in STEM education. Finally, we describe how our conceptual framework establishes the foundation for a new research agenda that addresses increasingly complex, prevalent, and intractable societal challenges disproportionately impacting minoritized groups. We also consider broader issues pertinent to our conceptual framework, including the social and emotional impacts of societal challenges; the growth of science denial and misinformation; and factors associated with politics, ideology, and religion. Justice‐centered STEM education contributes to solving societal challenges that K‐12 students currently face while preparing them to shape a more just society.
  3. Interest in communicative visualization has been growing in recent years. However, despite this growth, a solid theoretical foundation has not been established. In this paper I examine the role that conceptual metaphor theory may play in such a foundation. I present a brief background on conceptual metaphor theory, including a discussion of image schemas, conceptual metaphors, and embodied cognition. I speculate on the role of conceptual metaphor in explaining and (re)designing communicative visualizations by providing and discussing a small set of examples as anecdotal evidence of its possible value. Finally, I discuss the implications of conceptual metaphor theory for communicative visualization design and present some ideas for future research on this topic.
  4. Lawrence, Neil (Ed.)
    Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes a distinctive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework. This framework empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It seamlessly integrates established single PH summaries into multidimensional counterparts like EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent data’s multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP’s utility and effectiveness in graph classification tasks. Results reveal that EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets.
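EMP's summaries are built from persistent homology, which is more than a short snippet can show, but the core idea of turning two scale parameters into a matrix-shaped summary can be illustrated with a much simpler stand-in: vary a neighborhood radius and a density threshold and record how many connected components survive. The descriptor, thresholds, and Betti-0 count below are illustrative simplifications, not the paper's EMP Landscapes, Silhouettes, Images, or Surfaces.

    # Toy two-parameter "topological surface" (an illustrative simplification;
    # EMP's summaries are built from persistent homology, not the plain
    # connected-component count used here).
    import numpy as np

    def betti0(points, radius):
        """Number of connected components of the radius-neighborhood graph (union-find)."""
        parent = list(range(len(points)))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                if np.linalg.norm(points[i] - points[j]) <= radius:
                    parent[find(i)] = find(j)
        return len({find(i) for i in range(len(points))})

    def two_parameter_surface(points, descriptor, radii, thresholds):
        """Matrix M[t, r]: component count among points whose descriptor exceeds threshold t."""
        M = np.zeros((len(thresholds), len(radii)), dtype=int)
        for ti, t in enumerate(thresholds):
            kept = points[descriptor >= t]
            for ri, r in enumerate(radii):
                M[ti, ri] = betti0(kept, r)
        return M

    # Hypothetical data: two well-separated clusters and a crude density descriptor.
    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
    density = np.exp(-((pts[:, None] - pts[None, :]) ** 2).sum(-1)).mean(axis=1)
    print(two_parameter_surface(pts, density,
                                radii=[0.5, 1.0, 2.0, 4.0],
                                thresholds=np.quantile(density, [0.0, 0.5, 0.9])))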
  5. Abstract: Computational methods from reinforcement learning have shown promise in inferring treatment strategies for hypotension management and other clinical decision-making challenges. Unfortunately, the resulting models are often difficult for clinicians to interpret, making clinical inspection and validation of these computationally derived strategies challenging in advance of deployment. In this work, we develop a general framework for identifying succinct sets of clinical contexts in which clinicians make very different treatment choices, tracing the effects of those choices, and inferring a set of recommendations for those specific contexts. By focusing on these few key decision points, our framework produces succinct, interpretable treatment strategies that can each be easily visualized and verified by clinical experts. This interrogation process allows clinicians to leverage the model’s use of historical data in tandem with their own expertise to determine which recommendations are worth investigating further, e.g., at the bedside. We demonstrate the value of this approach via application to hypotension management in the ICU, an area with critical implications for patient outcomes that lacks data-driven individualized treatment strategies; that said, our framework has broad implications for how to use computational methods to assist with decision-making challenges across a wide range of clinical domains.
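A minimal sketch of only the first step of such a framework (finding contexts in which clinicians' recorded choices diverge) might rank contexts by the entropy of the observed action distribution, as below. The records, context buckets, and treatment labels are hypothetical, and the outcome-tracing and recommendation-inference stages described in the abstract are not reproduced.

    # Toy sketch of only the first stage described above: rank clinical contexts
    # by how much the recorded clinician actions disagree (entropy of the action
    # distribution). Records and labels are hypothetical stand-ins.
    import numpy as np
    from collections import Counter, defaultdict

    def disagreement_contexts(records, top_k=2):
        """records: iterable of (context, action) pairs. Return the top-k contexts by entropy."""
        by_context = defaultdict(list)
        for context, action in records:
            by_context[context].append(action)
        scored = []
        for context, actions in by_context.items():
            counts = np.array(list(Counter(actions).values()), dtype=float)
            p = counts / counts.sum()
            entropy = float(-(p * np.log(p)).sum())        # 0 when everyone agrees
            scored.append((entropy, context, Counter(actions).most_common(1)[0][0]))
        scored.sort(reverse=True)
        return scored[:top_k]

    # Hypothetical ICU-style records: (blood-pressure context, chosen treatment).
    records = [("MAP<60", "fluids"), ("MAP<60", "vasopressor"), ("MAP<60", "fluids"),
               ("MAP 60-65", "observe"), ("MAP 60-65", "observe"),
               ("MAP>65", "observe"), ("MAP>65", "observe"), ("MAP>65", "observe")]
    for entropy, context, common_action in disagreement_contexts(records):
        print(f"{context}: disagreement={entropy:.2f}, most common action={common_action}")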