Title: Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing
Data too sensitive to be "open" for analysis and re-purposing typically remains "closed" as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests. Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage, and c) removal of biases that could reinforce discriminatory policies, all while maintaining fidelity to the original data. We find that using synthetic data in conjunction with strong legal protections over raw data strikes a balance between transparency, proprietorship, privacy, and research objectives. This legal-technical framework can form the basis for data trusts in a variety of contexts.
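The abstract does not specify how the customized synthetic datasets are produced. Purely as a minimal sketch of one common ingredient in privacy-preserving synthesis pipelines (not the paper's method), the toy function below releases a synthetic categorical column by sampling from Laplace-noised counts; all names and parameters here are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_synthetic(records, epsilon, n_out, seed=0):
    """Release a synthetic categorical column drawn from
    epsilon-differentially-private noisy counts of the original."""
    random.seed(seed)
    counts = {}
    for r in records:
        counts[r] = counts.get(r, 0) + 1
    # A histogram has sensitivity 1, so the Laplace scale is 1/epsilon.
    noisy = {k: max(0.0, c + laplace_noise(1.0 / epsilon))
             for k, c in counts.items()}
    total = sum(noisy.values()) or 1.0
    keys = list(noisy)
    weights = [noisy[k] / total for k in keys]
    # Sample synthetic records from the privatized distribution.
    return random.choices(keys, weights=weights, k=n_out)
```

The synthetic output preserves approximate category frequencies (fidelity) while the noise bounds what any one record can reveal (privacy); the trust's additional goals of removing competitive signals and biases would require further, domain-specific transformations.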
Award ID(s):
1740996
NSF-PAR ID:
10111608
Author(s) / Creator(s):
Date Published:
Journal Name:
FAT*
Page Range / eLocation ID:
191 to 200
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse today. The opaque nature of the algorithms these platforms use to curate content raises societal questions. Prior studies have used black-box methods led by experts or collaborative audits driven by everyday users to show that these algorithms can lead to biased or discriminatory outcomes. However, existing auditing methods face fundamental limitations because they function independently of the platforms. Concerns about potentially harmful outcomes have prompted proposed legislation in both the U.S. and the E.U. to mandate a new form of auditing in which vetted external researchers get privileged access to social media platforms. Unfortunately, to date there have been no concrete technical proposals to provide such auditing, because auditing at scale risks disclosure of users' private data and platforms' proprietary algorithms. We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation. The first contribution of our work is to enumerate the challenges and limitations of existing auditing methods in implementing these policies at scale. Second, we suggest that limited, privileged access to relevance estimators is the key to enabling generalizable platform-supported auditing of social media platforms by external researchers. Third, we show that platform-supported auditing need not risk user privacy or disclosure of platforms' business interests by proposing an auditing framework that protects against these risks. For a particular fairness metric, we show that ensuring privacy imposes only a small constant-factor increase (6.34× as an upper bound, and 4× for typical parameters) in the number of samples required for accurate auditing.
Our technical contributions, combined with ongoing legal and policy efforts, can enable public oversight into how social media platforms affect individuals and society by moving past the privacy-vs-transparency hurdle. 
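The paper's actual auditing framework and its 6.34× bound are specific to that work. Purely as a generic illustration of how a privacy layer trades sample efficiency for protection, the sketch below estimates a gap in relevance rates between two groups through randomized response, a simpler mechanism chosen here for exposition; every name and parameter is a hypothetical stand-in.

```python
import random

def randomized_response(bit: int, p_truth: float) -> int:
    """Report the true bit with probability p_truth, else a fair coin flip."""
    if random.random() < p_truth:
        return bit
    return random.randint(0, 1)

def debiased_rate(reports, p_truth):
    """Unbias the observed rate: obs = p_truth * true + (1 - p_truth) * 0.5."""
    obs = sum(reports) / len(reports)
    return (obs - (1 - p_truth) * 0.5) / p_truth

def audit_exposure_gap(relevant_a, relevant_b, p_truth=0.75, seed=1):
    """Estimate the difference in positive-relevance rates between
    two groups from privately reported bits."""
    random.seed(seed)
    rep_a = [randomized_response(b, p_truth) for b in relevant_a]
    rep_b = [randomized_response(b, p_truth) for b in relevant_b]
    return debiased_rate(rep_a, p_truth) - debiased_rate(rep_b, p_truth)
```

Debiasing inflates the estimator's variance by a constant factor that depends only on p_truth, which is why privacy protection costs a constant-factor increase in samples for a target accuracy rather than a change in asymptotic rate.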
  2. The Association of Public and Land-grant Universities (APLU) and the Association of American Universities (AAU), with support from the National Science Foundation, convened the Accelerating Public Access to Research Data Workshop on October 29-30, 2018. The purpose of the workshop was to provide a venue for learning, sharing, and planning (campus roadmaps) to support research universities as they create and implement institutional and cross-institutional strategies and systems to provide public access to research data. It also provided a forum for participants to hear from federal agencies concerning their current activities and plans regarding data access. To date, institutional efforts to provide public access to research data have lacked coordination. Additionally, a long-term multi-institutional strategy for data access has been slow to develop due to the complexities of data management and the decentralized nature of the research enterprise. Access to data presents a particularly difficult challenge given the technical knowledge required and the variation in data creation and use across disciplines. While providing the public with access to taxpayer-funded research data is challenging, it will ultimately speed the pace of scientific advancement and innovation and strengthen research integrity. The workshop and report, together with prior and subsequent engagement by APLU and AAU, will help to accelerate public access to research data.
  3. Organized surveillance, especially by governments, poses a major challenge to individual privacy, due to the resources governments have at their disposal and the possibility of overreach. Given the impact of invasive monitoring, in most democratic countries government surveillance is, in theory, monitored and subject to public oversight to guard against violations. In practice, there is a fine balance between safeguarding individuals' privacy rights and not diluting the efficacy of national security investigations, as exemplified by reports on government surveillance programs that have caused public controversy and have been challenged by civil and privacy rights organizations. Surveillance is generally conducted through a mechanism in which federal agencies obtain a warrant from a federal or state judge (e.g., the US FISA court, Supreme Court in Canada) to subpoena a company or service provider (e.g., Google, Microsoft) for their customers' data. The courts provide annual statistics on the requests (accepted, rejected), while the companies provide annual transparency reports for public auditing. However, in practice, the statistical information provided by the courts and companies is high-level, generic, released after the fact, and inadequate for auditing the operations. Often this is attributed to the lack of scalable mechanisms for reporting and transparent auditing. In this paper, we present SAMPL, a novel auditing framework which leverages cryptographic mechanisms, such as zero-knowledge proofs, Pedersen commitments, Merkle trees, and public ledgers, to create a scalable mechanism for auditing electronic surveillance processes involving multiple actors. SAMPL is the first framework that can identify the actors (e.g., agencies and companies) that violate the purview of the court orders.
We experimentally demonstrate the scalability of SAMPL in handling concurrent monitoring processes without undermining their secrecy and auditability.
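SAMPL's full construction layers several cryptographic primitives; as a standalone illustration of just one of them, here is a toy Pedersen commitment over a tiny group. The parameters below are for exposition only and offer no real security; deployments use large prime-order groups.

```python
# Toy Pedersen commitment over the order-11 subgroup of Z_23^*.
# Illustrative parameters only -- real systems use large groups.
P, Q = 23, 11   # p = 2q + 1, both prime
G, H = 4, 9     # generators of the order-q subgroup; log_G(H) must be unknown

def commit(m: int, r: int) -> int:
    """C = g^m * h^r mod p: hiding (random r masks m) and binding
    (opening to a different m would require knowing log_G(H))."""
    return (pow(G, m % Q, P) * pow(H, r % Q, P)) % P

def verify(c: int, m: int, r: int) -> bool:
    """Opening: the committer reveals (m, r); anyone recomputes c."""
    return c == commit(m, r)
```

The commitments are additively homomorphic, commit(m1, r1) · commit(m2, r2) mod p equals commit(m1 + m2, r1 + r2), which is the kind of property that lets an auditor check aggregate claims about surveillance requests without seeing individual values.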
  4. Government agencies collect and manage a wide range of ever-growing datasets. While such data has the potential to support research and evidence-based policy making, there are concerns that the dissemination of such data could infringe upon the privacy of the individuals (or organizations) from whom such data was collected. To appraise the current state of data sharing, as well as to learn about opportunities for stimulating such sharing at a faster pace, a virtual workshop was held on May 21st and 26th, 2021, sponsored by the National Science Foundation and the National Institute of Standards and Technology, where a multinational collection of researchers and practitioners were brought together to discuss their experiences and learn about recently developed technologies for managing privacy while sharing data. The workshop specifically focused on challenges and successes in government data sharing at various levels. The first day focused on successful examples of new technology applied to sharing of public data, including formal privacy techniques, synthetic data, and cryptographic approaches. Day two emphasized brainstorming sessions on some of the challenges and directions to address them.
  5. This article presents a new framework for realizing the value of linked data, understood as a strategic asset and an increasingly necessary form of infrastructure for policy-making and research in many domains. We outline a framework, the 'data mosaic' approach, which combines socio-organizational and technical aspects. After demonstrating the value of linked data, we highlight key concepts and dangers for community-developed data infrastructures. We concretize the framework in the context of work on science and innovation generally. Next, we consider how a new partnership to link federal survey data, university data, and a range of public and proprietary data represents a concrete step toward building and sustaining a valuable data mosaic. We discuss technical issues surrounding linked data but emphasize that linking data involves addressing the varied concerns of wide-ranging data holders, including privacy, confidentiality, and security, as well as ensuring that all parties receive value from participating. The core of successful data mosaic projects, we contend, is as much institutional and organizational as it is technical. As such, sustained efforts to fully engage and develop diverse, innovative communities are essential.