The growing adoption of data analytics platforms and machine learning-based solutions for decision-makers creates a significant demand for datasets, which explains the appearance of data markets. In a well-functioning data market, sellers share data in exchange for money, and buyers pay for datasets that help them solve problems. The market raises sufficient money to compensate sellers and incentivize them to keep sharing datasets. This low-friction matching of sellers and buyers distributes the value of data among participants. But designing online data markets is challenging because they must account for the strategic behavior of participants. In this paper, we introduce techniques to protect data markets from strategic participants, even when the asset traded is data. We combine those techniques into a pricing algorithm specifically designed to trade data. Our evaluation includes a user study and extensive simulations, which together demonstrate how participants strategize and how effective our techniques are.
Data-Sharing Markets: Model, Protocol, and Algorithms to Incentivize the Formation of Data-Sharing Consortia
Organizations that would mutually benefit from pooling their data are nonetheless wary of sharing. Sharing data is costly in time and effort and, at the same time, its benefits are unclear. Without a clear cost-benefit analysis, participants default to not sharing. As a consequence, many opportunities to create valuable data-sharing consortia never materialize and the value of data remains locked. We introduce a new sharing model, market protocol, and algorithms to incentivize the creation of data-sharing markets. The combined contributions of this paper, which we call DSC, incentivize the creation of data-sharing markets that unleash the value of data for their participants. The sharing model introduces two incentives: one that guarantees that participating is better than not doing so, and another that compensates participants according to how valuable their data is. Because operating a consortium is costly, we are also concerned with ensuring that its operation is sustainable: we design a protocol that ensures that valuable data-sharing consortia form when doing so is sustainable. We introduce algorithms to elicit the value of data from the participants, which is used, first, to cover the costs of operating the consortium and, second, to compensate data contributions. For the latter, we challenge the use of the Shapley value to allocate revenue. We offer analytical and empirical evidence for this and introduce an alternative method that compensates participants better and leads to the formation of more data-sharing consortia.
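For readers unfamiliar with the Shapley value mentioned above, the following is a minimal sketch of how it splits revenue among data contributors. It illustrates only the baseline that the abstract says it challenges, not the alternative method the paper introduces. The participant names, their record sets, and the `toy_value` function (value of a coalition = number of distinct records it covers) are hypothetical and chosen purely for illustration.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: each player's marginal contribution,
    averaged over every order in which players could join."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical data: which record IDs each participant contributes.
RECORDS = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4}}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Assumed value function: number of distinct records pooled."""
    covered = set()
    for p in coalition:
        covered |= RECORDS[p]
    return float(len(covered))


if __name__ == "__main__":
    shares = shapley_values(list(RECORDS), toy_value)
    for name, share in shares.items():
        print(name, round(share, 3))
```

The shares always sum to the value of the full coalition, and a participant whose records overlap with others receives less than its standalone value. Enumerating all orderings is factorial in the number of participants, so this exact form is only practical for very small consortia.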
- Award ID(s): 2040718
- PAR ID: 10427795
- Date Published:
- Journal Name: Proceedings ACM SIGMOD International Conference on Management of Data
- ISSN: 0730-8078
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively controlling what data to release is difficult, the few data-sharing consortia that exist are often built around data-sharing agreements resulting from long and tedious one-off negotiations. We introduce Data Station, a data escrow designed to enable the formation of data-sharing consortia. Data owners share data with the escrow knowing it will not be released without their consent. Data users delegate their computation to the escrow. The data escrow relies on delegated computation to execute queries without releasing the data first. Data Station leverages hardware enclaves to generate trust among participants, and exploits the centralization of data and computation to generate an audit log. We evaluate Data Station on machine learning and data-sharing applications while running on an untrusted intermediary. In addition to important qualitative advantages, we show that Data Station: i) outperforms federated learning baselines in accuracy and runtime for the machine learning application; ii) is orders of magnitude faster than alternative secure data-sharing frameworks; and iii) introduces small overhead on the critical path.
-
The Convention on Biological Diversity and the Nagoya Protocol on Access and Benefit-Sharing provide an international legal framework that aims to prevent misappropriation of the genetic resources of a country and ensure the fair and equitable sharing of benefits arising from their use. The legislation was negotiated at the behest of lower-income, biodiverse countries to ensure that benefits derived from research and development of genetic resources from within their jurisdictions were equitably returned and could thereby incentivize conservation and sustainable use of biodiversity. Despite good intentions, however, rapid adoption of access and benefit-sharing measures at the national level, often without participatory strategic planning, has hampered noncommercial, international collaborative genetic research with counterproductive consequences for biodiversity conservation and sustainable use. We outline how current implementation of the Convention on Biological Diversity and the Nagoya Protocol affects noncommercial research, such as that conducted in many disciplines in biology, including mammalogy. We use a case study from Brazil, an early adopter, to illustrate some current challenges and highlight downstream consequences for emerging pathogen research and public health. Most emerging pathogens colonize or jump to humans from nonhuman mammals, but noncommercial research in zoonotic diseases is complicated by potential commercial applications. Last, we identify proactive ways for the mammalogical community to engage with the Convention on Biological Diversity and the Nagoya Protocol, through sharing of nonmonetary benefits and working with local natural history collections. Leveraging international scientific societies to collectively communicate the needs of biodiversity science to policy makers will be critical to ensuring that appropriate accommodations are negotiated for noncommercial research.
-
With its focus on value creation and value capture, open innovation research explicitly or implicitly examines the competitive impacts of collaboration. However, to date such research has not considered the effects of a blockbuster industry structure upon open innovation. Here, we examine a particular form of multilateral collaboration, the open R&D consortium, in which the results from collaboration are allowed to spill over to members and nonmembers alike. We do so in the context of the pharmaceutical industry, a stable but fragmented industry defined by the ongoing search for blockbuster hits protected by strong appropriability. Using a novel data set, we identify 141 such consortia that involve two or more of the 30 largest pharma firms. We show that firms financially support such consortia, in part, because their value creation activities benefit members without disrupting the value capture or other aspects of the incumbent industry structure. We discuss the implications of these findings for research on multilateral collaboration in blockbuster industries, and open innovation more generally.
-
Many of the datasets that could contribute to solutions for current public problems are proprietary and reside outside of government agencies. Accelerating data sharing and collaboration between those who hold valuable data and those able to deliver solutions is key to generating public value from private data. There is still a limited body of literature, however, that addresses data sharing and collaboration between private and public organizations. Using a case study of food traceability from local farms to institutions, this paper contributes to this emerging field by identifying challenges and incentives in data sharing among different types of organizations. In particular, our goal is to study how small farms and institutional buyers can be incentivized to share their data in a way that contributes to food safety, public health, and other societal goals. Our findings demonstrate that the key incentives are initiatives that show the benefits of a whole-chain food traceability system, have clear policies and regulations, and offer opportunities to participate in training activities.