Title: Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing
Data too sensitive to be "open" for analysis and repurposing typically remains "closed" as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, by researchers to evaluate methods, and by the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests. Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless an agreement specifically states otherwise, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage, and c) removal of biases that could reinforce discriminatory policies, all while maintaining fidelity to the original data. We find that using synthetic data in conjunction with strong legal protections over raw data strikes a balance among transparency, proprietorship, privacy, and research objectives. This legal-technical framework can form the basis for data trusts in a variety of contexts.
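The abstract does not say how the trust generates its synthetic datasets. As one hedged illustration of what "synthetic data with a strong privacy guarantee" can mean, the sketch below uses a standard perturbed-histogram approach, which is not necessarily the authors' method: count records over a fixed, publicly known domain, add Laplace noise with scale 1/ε to each count, clip negatives to zero, and sample synthetic records from the noisy histogram. The helper names `laplace_noise` and `dp_synthetic_sample` are hypothetical, and the sketch covers only goal a) (privacy), not the removal of competitive signals or biases.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_sample(
    records: Sequence[Hashable],
    domain: Sequence[Hashable],
    epsilon: float,
    n_out: int,
    seed: Optional[int] = None,
) -> List[Hashable]:
    """Hypothetical helper: epsilon-DP synthetic records via a noisy histogram.

    Assumes `domain` is fixed and public (not derived from `records`) and that
    neighboring datasets differ by adding or removing one record, so one record
    changes at most one count by 1 (L1 sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    counts = Counter(records)
    # Laplace noise with scale 1/epsilon on every cell, then clip at zero.
    noisy = [max(0.0, counts[v] + laplace_noise(1.0 / epsilon, rng)) for v in domain]
    # Sampling from the noisy histogram is post-processing, so it keeps epsilon-DP.
    weights = noisy if sum(noisy) > 0 else [1.0] * len(domain)
    return rng.choices(list(domain), weights=weights, k=n_out)
```

For example, `dp_synthetic_sample(["a", "b", "a"], domain=["a", "b", "c"], epsilon=1.0, n_out=10, seed=0)` returns ten synthetic values drawn from the noisy counts. A smaller ε adds more noise and lowers fidelity to the original data, which is the privacy-fidelity tension the abstract describes; real multi-attribute data would need a model of the joint distribution rather than one histogram over every combination.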
Award ID(s): 1740996
PAR ID: 10111608
Journal Name: FAT*
Page Range / eLocation ID: 191 to 200
Sponsoring Org: National Science Foundation
More Like this
  1. Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse today. The opaque nature of the algorithms these platforms use to curate content raises societal questions. Prior studies have used black-box methods led by experts or collaborative audits driven by everyday users to show that these algorithms can lead to biased or discriminatory outcomes. However, existing auditing methods face fundamental limitations because they function independently of the platforms. Concerns about potentially harmful outcomes have prompted proposed legislation in both the U.S. and the E.U. to mandate a new form of auditing in which vetted external researchers get privileged access to social media platforms. Unfortunately, to date there have been no concrete technical proposals to provide such auditing, because auditing at scale risks disclosure of users' private data and platforms' proprietary algorithms. We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation. The first contribution of our work is to enumerate the challenges and limitations of existing auditing methods for implementing these policies at scale. Second, we suggest that limited, privileged access to relevance estimators is the key to enabling generalizable platform-supported auditing of social media platforms by external researchers. Third, we show that platform-supported auditing need not risk user privacy or disclosure of platforms' business interests by proposing an auditing framework that protects against these risks. For a particular fairness metric, we show that ensuring privacy imposes only a small constant-factor increase (6.34× as an upper bound, and 4× for typical parameters) in the number of samples required for accurate auditing. 
Our technical contributions, combined with ongoing legal and policy efforts, can enable public oversight into how social media platforms affect individuals and society by moving past the privacy-vs-transparency hurdle. 
  2. The recent development and use of generative AI (GenAI) has signaled a significant shift in research activities such as brainstorming, proposal writing, dissemination, and even reviewing. This has raised questions about how to balance the seemingly productive uses of GenAI against ethical concerns such as authorship and copyright issues, use of biased training data, lack of transparency, and impact on user privacy. To address these concerns, many Higher Education Institutions (HEIs) have released institutional guidance for researchers. To better understand this guidance, we report findings from a thematic analysis of guidelines from thirty HEIs in the United States that are classified as R1, or “very high research activity.” We found that the guidance provided to researchers: (1) asks them to refer to external sources of information such as funding agencies and publishers to keep updated, and to use institutional resources for training and education; (2) asks them to understand and learn about specific GenAI attributes that shape research, such as predictive modeling, knowledge cutoff date, data provenance, and model limitations, and to educate themselves about ethical concerns such as authorship, attribution, privacy, and intellectual property issues; and (3) includes instructions on how to acknowledge sources, disclose the use of GenAI, and communicate effectively about their GenAI use, and alerts researchers to long-term implications such as overreliance on GenAI, legal consequences, and risks to their institutions from GenAI use. Overall, the guidance places the onus of compliance on individual researchers, making them accountable for any lapses and thereby increasing their responsibility. 
  3. The Association of Public and Land-grant Universities (APLU) and the Association of American Universities (AAU), with support from the National Science Foundation, convened the Accelerating Public Access to Research Data Workshop on October 29-30, 2018. The purpose of the workshop was to provide a venue for learning, sharing, and planning (campus roadmaps) to support research universities as they create and implement institutional and cross-institutional strategies and systems to provide public access to research data. It also provided a forum for participants to hear from federal agencies about their current activities and plans regarding data access. To date, institutional efforts to provide public access to research data have lacked coordination. Additionally, a long-term multi-institutional strategy for data access has been slow to develop due to the complexities of data management and the decentralized nature of the research enterprise. Access to data presents a particularly difficult challenge given the technical knowledge required and the variation in data creation and use across disciplines. While providing the public with access to taxpayer-funded research data is challenging, it will ultimately speed the pace of scientific advancement and innovation and strengthen research integrity. The workshop and report, together with prior and subsequent engagement by APLU and AAU, will help to accelerate public access to research data. 
  4. People search websites, a category of data brokers, collect, catalog, monetize, and often publicly display individuals' personally identifiable information (PII). We present a study of user privacy rights on 20 such websites, assessing the usability of their data access and data removal mechanisms. We combine insights from these two processes to determine connections between sites, such as shared access mechanisms or removal effects. We find that data access requests are mostly unsuccessful; instead, sites cite a variety of legal exceptions or misinterpret the nature of the requests. By purchasing reports, we find that only one set of connected sites provided access to the same report they sell to customers. We use a multi-step removal process to investigate removal effects between suspected connected sites. In general, data removal is more streamlined than data access, but not very transparent; questions remain about the scope of removal and the reappearance of information. Confirming and expanding the connections observed in earlier phases, we find that four main groups are behind 14 of the sites studied, indicating the need to further catalog these connections to simplify removal. 
  5. When Google or the US Census Bureau publishes detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody while public information is supplied. To date, economists have not focused on the privacy loss inherent in data publication. In their stead, these issues have been advanced almost exclusively by computer scientists, who are primarily interested in the technical problems associated with protecting privacy. Economists should join the discussion, first to determine where to balance privacy protection against data quality, which is a social choice problem. Furthermore, economists must ensure that new privacy models preserve the validity of public data for economic research. 
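The first and fifth related abstracts both turn on a privacy-accuracy tradeoff: a formal privacy guarantee costs statistical precision, which can be bought back with more data. As a hedged back-of-the-envelope illustration (a simple model, not the auditing framework or the 6.34× bound from the first abstract), suppose an auditor estimates a proportion p from n records and releases the count with Laplace noise of scale 1/ε. The released proportion then has variance p(1 - p)/n + 2/(n²ε²), and the smallest n meeting a target standard error is the positive root of a quadratic. The helper name `samples_needed` is hypothetical.

```python
import math
from typing import Optional


def samples_needed(p: float, target_se: float, epsilon: Optional[float] = None) -> int:
    """Hypothetical helper: smallest n so the estimate of p has standard error <= target_se.

    Without privacy, Var = p(1 - p)/n. With Laplace(1/epsilon) noise added to the
    count (sensitivity 1), Var = p(1 - p)/n + 2/(n^2 * epsilon^2).
    """
    if not 0.0 <= p <= 1.0 or target_se <= 0:
        raise ValueError("need 0 <= p <= 1 and target_se > 0")
    v = target_se ** 2
    a = p * (1.0 - p)  # sampling variance contributed per record
    if epsilon is None:
        return math.ceil(a / v)
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    b = 2.0 / epsilon ** 2  # variance of the Laplace noise added to the count
    # Smallest positive root of v*n^2 - a*n - b = 0.
    return math.ceil((a + math.sqrt(a * a + 4.0 * v * b)) / (2.0 * v))


if __name__ == "__main__":
    # Compare the non-private requirement with two privacy budgets.
    for eps in (None, 1.0, 0.1):
        print(eps, samples_needed(0.5, 0.01, eps))
```

In this model the required sample size rises as ε shrinks and approaches the non-private requirement as ε grows, so choosing ε is exactly the kind of "where to balance privacy protection against data quality" decision the fifth abstract frames as a social choice problem.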